Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Flag field update and insert vs all columns update vs delete and insert Expand / Collapse
Author
Message
Posted Monday, October 15, 2012 12:10 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, November 6, 2014 9:29 PM
Points: 72, Visits: 184
Hi All

I have a data warehousing scenario related to Updates and SCD.

In our ETL procedure, we currently update all the columns in a Row for an existing Record which has undergone a change . This change can have happened to atleast 1 or atmost all of the columns.

We are discussing the below options to reduce / eliminate updates :

1. Mark a bit flag for the existing row to mark it inactive and insert a new row with the changed data.
-- is there a difference in performance if we update a single bit column vs updating all columns ?

2. Delete the existing row and insert a new row.

Which one of the above 2 will give a better performance ?
Since its DW , we are talking about millions of records in the table.


Thanks
Koustav
Post #1372869
Posted Monday, October 15, 2012 2:09 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: 2 days ago @ 1:37 PM
Points: 43, Visits: 336
Hi,

What you're currently doing is called a Type 1 update. This is the most easy of all. Although, instead of updating all columns you can look for only the columns that have changed and update them. Something like the following merge statement:

Merge trgTable trg
using srcTable src
ON trg.JoinCol = src.JoinCol
WHEN NOT matched BY target THEN INSERT (ColA, ColB) VALUES (ColA, ColB)
WHEN matched AND (trg.ColA <> src.ColA OR trg.ColB <> src.ColB)
THEN UPDATE SET trg.ColA = src.ColA, trg.ColB = src.ColB

Your second option: deleting all rows and reloading is practically impossible for larger database. You have reload the dimensions as well as fact tables.This is called full load and takes a lot of time.

Your first option is called Type 2 change. Irrespective of performance, this is ideal for data warehouse situation.

Below links might help you.
http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/62063/
http://svangasql.wordpress.com/2011/12/20/difference-between-full-load-and-incremental-load/

I think it all depends on your business requirements. If you're not required to maintain history, continue to do what you are currently doing, may be use the query i gave above. If you have to maintain history, you don't have an option but to use your first option.

Sorry if the answer is too basic knowledge for you!


~Sam.

http://svangasql.wordpress.com
Post #1372931
Posted Monday, October 15, 2012 9:04 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, November 6, 2014 9:29 PM
Points: 72, Visits: 184
Thanks for your reply Sam . However I do want to know if there is a performance difference between updating all columns and updating 1 single column in a table using SQl Server .
Post #1373017
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse