Update target table when there are any changes in source table columns

  • Good Morning,

    I have a question regarding an UPDATE statement between two tables (Stage_Table, Final_Table).

    • Stage_Table is refreshed daily (truncate and load from a feed).

      Final_Table contains the full data set, including new records every day and changes over time.

    • Final_Table has two columns that form the primary key (Col1, Col2).

    Both tables have the same columns, for example col1, col2, col3, ..., col20.

    So my question is: I need to update the column values of existing records in Final_Table every day from Stage_Table.

    Final_Table column values should only be updated under these two rules:

    If the stage table has NULL for a column and the final table has a value, then no update happens to that column/field.

    I also need to check whether there are any changes in the stage (intake) table columns col3 to col10.

    -- For the above two scenarios I have used the query below. Is there a better way to do it? Please advise.



    UPDATE FT
    SET
        -- keep the existing value when the staging column is NULL or empty
        FT.col3 = CASE WHEN S.col3 IS NOT NULL AND S.col3 <> '' THEN S.col3 ELSE FT.col3 END,
        FT.col4 = S.col4,
        FT.col5 = S.col5,
        FT.col6 = S.col6,
        FT.col7 = S.col7,
        FT.col8 = S.col8,
        FT.col9 = S.col9,
        FT.col10 = S.col10
    FROM Final_Table FT
    JOIN Stage_Table S
        ON  FT.col1 = S.col1
        AND FT.col2 = S.col2
    WHERE FT.col3 <> S.col3   -- note: <> is never true when either side is NULL
       OR FT.col4 <> S.col4
       OR FT.col5 <> S.col5
       OR FT.col6 <> S.col6
       OR FT.col7 <> S.col7
       OR FT.col8 <> S.col8
       OR FT.col9 <> S.col9
       OR FT.col10 <> S.col10

    Thank you



  • Why not go crazy and use a WHERE clause to find the records that are different and only update those?
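    One concrete way to write such a WHERE clause in SQL Server is EXISTS over EXCEPT, which is NULL-safe (two NULLs compare as equal, unlike `<>`). This is only a sketch; the table and column names follow the original post:

    ```sql
    UPDATE FT
    SET FT.col3 = S.col3, FT.col4 = S.col4, FT.col5 = S.col5,
        FT.col6 = S.col6, FT.col7 = S.col7, FT.col8 = S.col8,
        FT.col9 = S.col9, FT.col10 = S.col10
    FROM Final_Table FT
    JOIN Stage_Table S
        ON  FT.col1 = S.col1
        AND FT.col2 = S.col2
    -- the EXCEPT returns a row only when the two column lists differ,
    -- and it treats NULL = NULL as a match, so no ISNULL wrapping is needed
    WHERE EXISTS (SELECT S.col3,  S.col4,  S.col5,  S.col6,
                         S.col7,  S.col8,  S.col9,  S.col10
                  EXCEPT
                  SELECT FT.col3, FT.col4, FT.col5, FT.col6,
                         FT.col7, FT.col8, FT.col9, FT.col10);
    ```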

  • This sounds like a prime example to use the HASHBYTES function where you hash all of the columns in each table and join on the key and where it is different, update the destination table. Sorry, I don't have a specific example of how to do it but hopefully a quick Google could help you out.
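    A hedged sketch of the HASHBYTES idea, using the table and column names from the original post. CONCAT_WS requires SQL Server 2017+; on older versions you would need explicit CONCAT/ISNULL handling:

    ```sql
    -- Compare one hash per row instead of eight column pairs.
    -- CONCAT_WS turns NULLs into empty strings and inserts a delimiter,
    -- so 'ab' + 'c' cannot collide with 'a' + 'bc'.
    UPDATE FT
    SET FT.col3  = S.col3,
        FT.col4  = S.col4,
        FT.col5  = S.col5,
        FT.col6  = S.col6,
        FT.col7  = S.col7,
        FT.col8  = S.col8,
        FT.col9  = S.col9,
        FT.col10 = S.col10
    FROM Final_Table FT
    JOIN Stage_Table S
        ON  FT.col1 = S.col1
        AND FT.col2 = S.col2
    WHERE HASHBYTES('SHA2_256',
            CONCAT_WS('|', S.col3,  S.col4,  S.col5,  S.col6,
                           S.col7,  S.col8,  S.col9,  S.col10))
       <> HASHBYTES('SHA2_256',
            CONCAT_WS('|', FT.col3, FT.col4, FT.col5, FT.col6,
                           FT.col7, FT.col8, FT.col9, FT.col10));
    ```

    One caveat: because CONCAT_WS maps NULL to an empty string, a NULL and an empty string hash the same, which happens to line up with the original poster's "treat NULL/empty as no change" rule.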

  • Possibly a dumb question, but do you need those WHERE clauses?  If the data is the same, setting the value "10" to "10" is not going to hurt too badly if the table isn't HUGE and if there are not a lot of indexes.  The only case you need to be careful of is your first condition - if staging has NULL then you want to keep the final value, but that can be handled by CASE conditions.

    Now, if that is billions of rows OR a lot of nonclustered indexes, you will benefit from only updating a subset of the data in which case those WHERE clauses are beneficial.  Or if you have a calculated column or trigger keeping track of a modified date, then you will want to make sure you only update rows that have changed.

    I think the other thing to know is: what problem are you trying to solve?  If you are trying to reduce the time that the Final_Table is unusable, I would take the approach of reducing the data set in the Staging Table that gets written to the Final_Table.  That is, do all of the work in the Staging Table so you have a smaller data set to push across to the final table.

    You could add a column to the Staging Table to track data status, plus a data-load datestamp.  First, remove the truncate step.  Then, when you pull the data from the feed into the staging table, the status would be "unprocessed".  Once the data has been pulled in, you look through the staging table ONLY for data that changed from the previous run, which can be determined using the data-load datestamp field.  From there you update the status to "data changed" or "no change", depending on the logic that determines what should be pulled across to the final table.  You could even handle the NULL cases by updating the value in the staging table so that if it is NULL, it becomes the previous value.  Then you can remove the previous load's data, as it is no longer needed (unless you want it for reviewing the process, but the staging table will grow pretty quickly if you don't remove historical data), and push across any rows where the data has changed.

    The advantage of this approach is that the Final Table will see reduced blocking, and your UPDATE statement's WHERE clause can be reduced to a single condition - where status is "data changed".  The disadvantage is that the overall process may be slower, you will have more blocking on the staging table, and you will use more disk long term.

    I would make the Status field a TINYINT datatype, and you may want more statuses in the future if logic needs to be applied differently when moving to the final table.  But my approach with ETL is to do all the work in the staging table so I have only a small bit of logic and as small of a data set as possible to work with to push across to the final table.  It MAY put more strain on the SQL instance (or SSIS), but it means more uptime for my final tables.

    Now, if you have a downtime window, and you are not exceeding that window, there may not be much benefit in working to improve the process.  If you do have a lot of nonclustered indexes, you may benefit from disabling and rebuilding those after your data loads.

    Just my 2 cents!
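    A minimal sketch of the status-column idea described above, assuming hypothetical column names LoadStatus and LoadDate (they are not from the original posts), and showing only two of the compared columns:

    ```sql
    -- One-time change: track status and load time in the staging table.
    -- 0 = unprocessed, 1 = changed, 2 = no change (TINYINT, as suggested above)
    ALTER TABLE Stage_Table
        ADD LoadStatus TINYINT   NOT NULL DEFAULT 0,
            LoadDate   DATETIME2 NOT NULL DEFAULT SYSDATETIME();

    -- After each feed load: classify new rows against the final table.
    UPDATE S
    SET S.LoadStatus =
            CASE WHEN EXISTS (SELECT S.col3, S.col4    -- extend to col10 as needed
                              EXCEPT
                              SELECT FT.col3, FT.col4)
                 THEN 1 ELSE 2 END
    FROM Stage_Table S
    JOIN Final_Table FT
        ON  FT.col1 = S.col1
        AND FT.col2 = S.col2
    WHERE S.LoadStatus = 0;

    -- The push to Final_Table now needs only one WHERE condition.
    UPDATE FT
    SET FT.col3 = S.col3, FT.col4 = S.col4    -- extend to col10 as needed
    FROM Final_Table FT
    JOIN Stage_Table S
        ON  FT.col1 = S.col1
        AND FT.col2 = S.col2
    WHERE S.LoadStatus = 1;
    ```

    Rows with no match in Final_Table keep status 0 here; those would be the new records to INSERT rather than UPDATE.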

  • Hello David.


    Can you please provide an example of this?


    Thank you


  • Hello Brian,


    This table has about 200K rows and 20 columns. Daily changes are around 100 new records, plus updates to roughly 100 to 200 existing records in a few columns.


    There is only one primary key on the table, nothing else in terms of indexes.


    I do have an UpdatedDate column in Final_Table that gets updated when a real update (non-NULL changes between the stage and final tables) happens.


    Can you please advise a query for it?


    Thank you


  • Hello Linden.


    Sorry, can you please provide a sample of what you mean by using a WHERE clause?


    Thank you

