Thursday, March 24, 2016

QUICKTIPS 2 - DELETE or CREATE+INSERT?

Sponsored by PRISE Ltd.
www.prisetools.com

Delete or Insert? Which one is faster?

The question 

If I got to delete rows from a table, practically I have two options:

  • Issue a DELETE command with the desired WHERE / JOIN condition
  • Create an empty table with the same structure and INSERT the remaining records using the negated WHERE condition and the rename tables
The first one is simple and straightforward, the second is complicated and will require more perm space. So it is a question?

Yes, it is. Because of the backgrounds of the execution:
  • DELETE
    It will modify each blocks that contain record(s) to be deleted. It has to write Transient Journal (transaction log) to enable transaction related functionality. In addition those data blocks which fill-up-rate falls below a threshold (eg. 50%) they will be merged. All these may cause really lot of I/O overhead. In some cases the deletion of the rows will happen in "random-like" order therefore the I/O access will be random, which is painfully slow.
  • INSERT
    In this case it is quite simple: we insert into an empty table, which causes almost zero Transient Journal data, and the system has to write only that amount of data blocks, what can store the records that will remain, once each block. An additional advantage: the I/O will always be "linear" instead of "random-like" therefore it can be multi-block operation, which is pretty fast.
Shall we forget about the delete?

The answer

No, absolutely not. DELETE is good enough if the amount of the records that must be deleted is not that much.
We have two factors that will help us in the decision:

  • How many records to be deleted?
  • How many records will remain after deletion?

I do not have well defined universal thresholds where it is worth to choose this or that, it depends.
I used to use the CREATE+INSERT+RENAME method if

  • The table is big enough (> > 0,1-0,5Gbytes or 10-50k rows / AMP)
    and
    I got to delete more than 20..40% of the records

  • OR  
  • The table has high hash collision (HC) rate on the PI
    and
    The table is MULTISET*
    and
    The delete is a JOIN-ed one
So it depends, got to measure. PRISE Tuning Assistant can help you to build up your own rule of thumb, applicable on your environment, by showing the resource consumption in pretty details.


* If the HC rate is high and the table is SET then the INSERT will be very slow because of the duplication check method

Sponsored by PRISE Ltd.
www.prisetools.com

Thursday, March 10, 2016

QUICKTIPS - 1. DELETE+INSERT - together or separetely?

Sponsored by PRISE Ltd.
www.prisetools.com

Shall I run DELETE and INSERT separately?

We often have permanent or temporary tables in our data warehouse which must be purged before they are populated again. We can do the operation within a transaction or separately. It is not the same...

Options - (example)

The examples assume Teradata (BTET)  mode, which is Auto-commit by default

  1. Together: both statements run in the same transaction

    1. Explicit transaction

      begin transaction;

      delete from TABLE1;

      insert into TABLE1 SELECT .....;

      end transaction;

    2. Multi-statement

      delete from TABLE1
      ;insert into TABLE1 SELECT .....;

  2. Separately: the statements run in separate transaction

    delete from TABLE1;

    insert into TABLE1 SELECT .....;


(Please note if you run it in Teradata Administrator aka. WinDDI, this will be run in one explicit transaction, since the application will place a BT; before the script and an ET; after.)

What is the difference?

The result

If everything is OK, we will not experience any difference at the end. The same data will be found in TABLE1 after we finished each ways. The difference shows if the INSERT step fails eg. for running out of perm space:

  • In A. cases the all transaction will be rolled back, therefore we will see all the data in the table we had before the delete.
  • In B. case we will delete the data from the tab, independently the success of the INSERT phase. After a failed INSERT the table will be empty


We have another additional transient difference:

  • In A. case we will not "see" an empty table from another session in any point of time: until a point the table seems as a "before" version, and when the transaction finished it will switch to an "after" image for other sessions in no time.
  • In B. case when the DELETE finished, the table seems to be empty up to the moment of COMMITing the INSERT statement. So if the INSERT lasts 2 hours, others will face and empty TABLE1 for 2 hours.

So I do not recommend to use option B. if the "empty table" state is not valid.

The performance

The A. and B. options will be executed significantly different manner. The DELETE operation will know that all the data will be dropped, so internally it just administers the data blocks to be free instead of clearing their content record-by-record, BUT only if the statement is the only statement in the transaction! In this case only the data block administration must be logged (to be able to roll back for some reason) instead of every block changes. This results far less Transient Journal activity besides less block I/Os.

Let's see some measurements.
I've put 10M records into a table, and then executed the same DELETE + INSERT statement on it. The results are the following:
  1. CPU:    9  sec I/O: 39K
  2. CPU  >0.1sec I/O:   4K
It is worth to mind the difference if you got to do it with billions of records.

Sponsored by PRISE Ltd.
www.prisetools.com