Flag field update and insert vs all columns update vs delete and insert

  • Hi All

    I have a data warehousing scenario related to Updates and SCD.

    In our ETL procedure, we currently update all the columns in a Row for an existing Record which has undergone a change . This change can have happened to atleast 1 or atmost all of the columns.

    We are discussing the below options to reduce / eliminate updates :

    1. Mark a bit flag for the existing row to mark it inactive and insert a new row with the changed data.

    -- is there a difference in performance if we update a single bit column vs updating all columns ?

    2. Delete the existing row and insert a new row.

    Which one of the above 2 will give a better performance ?

    Since its DW , we are talking about millions of records in the table.

    Thanks

    Koustav

  • Hi,

    What you're currently doing is called a Type 1 update. This is the most easy of all. Although, instead of updating all columns you can look for only the columns that have changed and update them. Something like the following merge statement:

    Merge trgTable trg

    using srcTable src

    ON trg.JoinCol = src.JoinCol

    WHEN NOT matched BY target THEN INSERT (ColA, ColB) VALUES (ColA, ColB)

    WHEN matched AND (trg.ColA <> src.ColA OR trg.ColB <> src.ColB)

    THEN UPDATE SET trg.ColA = src.ColA, trg.ColB = src.ColB

    Your second option: deleting all rows and reloading is practically impossible for larger database. You have reload the dimensions as well as fact tables.This is called full load and takes a lot of time.

    Your first option is called Type 2 change. Irrespective of performance, this is ideal for data warehouse situation.

    Below links might help you.

    http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/62063/

    http://svangasql.wordpress.com/2011/12/20/difference-between-full-load-and-incremental-load/

    I think it all depends on your business requirements. If you're not required to maintain history, continue to do what you are currently doing, may be use the query i gave above. If you have to maintain history, you don't have an option but to use your first option.

    Sorry if the answer is too basic knowledge for you!

  • Thanks for your reply Sam . However I do want to know if there is a performance difference between updating all columns and updating 1 single column in a table using SQl Server .

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply