Update target table when there are changes in source table columns

  • Good morning,

    I have a question regarding an UPDATE statement between two tables (Stage_Table, Final_Table).

    • Stage_Table is refreshed daily (truncate and load from a feed).

      Final_Table contains the full data set, including each day's new records and the changes made over time.

    • In Final_Table, two columns (Col1, Col2) make up the primary key.

    Both tables have the same columns, for example col1, col2, col3, ..., col20.

    So my question is: I need to update existing records' column values in Final_Table every day from Stage_Table, subject to the following two scenarios:

    If the stage column is NULL and the final table has a value, then no update should happen to that column/field.

    How do I check whether there are any changes in the stage table columns col3 to col10?

    -- For the above two scenarios I have used the query below. Is there any better way to do it? Please advise.

    UPDATE FT
    SET FT.col3  = CASE WHEN S.col3 IS NOT NULL AND S.col3 <> '' THEN S.col3 ELSE FT.col3 END,
        FT.col4  = S.col4,
        FT.col5  = S.col5,
        FT.col6  = S.col6,
        FT.col7  = S.col7,
        FT.col8  = S.col8,
        FT.col9  = S.col9,
        FT.col10 = S.col10
    FROM FINAL_TABLE FT
    JOIN STAGE_TABLE S
      ON FT.col1 = S.col1
     AND FT.col2 = S.col2
    WHERE FT.col3  <> S.col3
       OR FT.col4  <> S.col4
       OR FT.col5  <> S.col5
       OR FT.col6  <> S.col6
       OR FT.col7  <> S.col7
       OR FT.col8  <> S.col8
       OR FT.col9  <> S.col9
       OR FT.col10 <> S.col10

    Thank you

    Asita


  • Why not go crazy and use a WHERE clause to find the records that are different and only update those?

  • This sounds like a prime example for the HASHBYTES function: hash all of the comparison columns in each table, join on the key, and update the destination table wherever the hashes differ. Sorry, I don't have a specific example of how to do it, but hopefully a quick Google search can help you out.
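    For illustration, a rough, untested sketch of that idea (table and column names taken from the question; HASHBYTES and CONCAT_WS are SQL Server built-ins, CONCAT_WS requiring 2017+):

    -- Hash the comparable columns on each side and update where the hashes differ.
    -- CONCAT_WS skips NULL arguments, so wrap with ISNULL (string columns assumed)
    -- to keep column positions distinguishable and avoid accidental collisions.
    UPDATE FT
    SET    FT.col3 = S.col3, FT.col4 = S.col4, FT.col5 = S.col5, FT.col6 = S.col6,
           FT.col7 = S.col7, FT.col8 = S.col8, FT.col9 = S.col9, FT.col10 = S.col10
    FROM   FINAL_TABLE FT
    JOIN   STAGE_TABLE S
      ON   FT.col1 = S.col1 AND FT.col2 = S.col2
    WHERE  HASHBYTES('SHA2_256', CONCAT_WS('|',
               ISNULL(S.col3, ''), ISNULL(S.col4, ''), ISNULL(S.col5, ''), ISNULL(S.col6, ''),
               ISNULL(S.col7, ''), ISNULL(S.col8, ''), ISNULL(S.col9, ''), ISNULL(S.col10, '')))
        <> HASHBYTES('SHA2_256', CONCAT_WS('|',
               ISNULL(FT.col3, ''), ISNULL(FT.col4, ''), ISNULL(FT.col5, ''), ISNULL(FT.col6, ''),
               ISNULL(FT.col7, ''), ISNULL(FT.col8, ''), ISNULL(FT.col9, ''), ISNULL(FT.col10, '')));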

  • Possibly a dumb question, but do you need those WHERE clauses? If the data is the same, setting the value "10" to "10" is not going to hurt too badly if the table isn't HUGE and there are not a lot of indexes. The one case you do want to be careful of is your first condition - if staging has NULL then you want to keep the final value - but that can be handled by CASE conditions.

    Now, if the table has billions of rows OR a lot of nonclustered indexes, you will benefit from only updating a subset of the data, in which case those WHERE clauses are beneficial. The same goes if you have a computed column or trigger keeping track of a modified date - then you will want to make sure you only update rows that actually changed.

    I think the other thing to know is: what problem are you trying to solve? If you are trying to reduce the time that the Final_Table is unusable, the approach I would take is to reduce the data set in the staging table that gets written to the final table - that is, do all of the work in the staging table so you have a smaller data set to push across.

    You could add a column to the staging table to track data status, plus a data-load datestamp. First, remove the truncate step. Then, when you pull the data from the feed into the staging table, the status starts as "unprocessed". Once the data has been pulled in, you look through the staging table ONLY for data that changed from the previous run, which can be determined using the data-load datestamp field. From there you update the status to "data changed" or "no change", depending on the logic that determines what should be pulled across to the final table. You could even handle the NULL cases by updating the value in the staging table, so that if it is NULL it becomes the previous value. Then you can remove the previous load's data, as it is no longer needed (unless you want to keep it to review the process, but the staging table will grow pretty quickly if you don't remove historical data), and push across any rows where the data has changed.

    The advantage of this approach is that the final table sees reduced blocking, and your UPDATE statement's WHERE clause shrinks to a single condition - where status is "data changed". The disadvantage is that the overall process may be slower, you will have more blocking on the staging table, and you will use more disk long term.

    I would make the status field a TINYINT datatype, since you may want more statuses in the future if logic needs to be applied differently when moving data to the final table. My general approach with ETL is to do all the work in the staging table, so I have only a small bit of logic and as small a data set as possible to push across to the final table. It MAY put more strain on the SQL instance (or SSIS), but it means more uptime for my final tables.
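    For illustration only, a minimal sketch of that status-column idea against this thread's tables (LoadStatus and LoadDate are hypothetical columns; 0 = unprocessed, 1 = data changed, 2 = no change):

    -- Hypothetical tracking columns added to the staging table.
    ALTER TABLE STAGE_TABLE ADD LoadStatus TINYINT   NOT NULL DEFAULT 0,
                                LoadDate   DATETIME2 NOT NULL DEFAULT SYSDATETIME();

    -- Flag each unprocessed row: EXCEPT compares the column lists NULL-safely.
    UPDATE S
    SET    LoadStatus = CASE WHEN EXISTS (SELECT S.col3, S.col4 -- ... through S.col10
                                          EXCEPT
                                          SELECT FT.col3, FT.col4) -- ... through FT.col10
                             THEN 1 ELSE 2 END
    FROM   STAGE_TABLE S
    JOIN   FINAL_TABLE FT ON FT.col1 = S.col1 AND FT.col2 = S.col2
    WHERE  S.LoadStatus = 0;

    -- The final update then needs a single WHERE condition.
    UPDATE FT
    SET    FT.col3 = S.col3, FT.col4 = S.col4 -- ... through col10
    FROM   FINAL_TABLE FT
    JOIN   STAGE_TABLE S ON FT.col1 = S.col1 AND FT.col2 = S.col2
    WHERE  S.LoadStatus = 1;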

    Now, if you have a downtime window, and you are not exceeding that window, there may not be much benefit in working to improve the process.  If you do have a lot of nonclustered indexes, you may benefit from disabling and rebuilding those after your data loads.
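    If you did have such indexes, the disable/rebuild step is just a pair of ALTER INDEX statements (the index name here is hypothetical):

    -- Disable a nonclustered index before the load, rebuild it after.
    ALTER INDEX IX_FinalTable_col3 ON FINAL_TABLE DISABLE;
    -- ... run the data load here ...
    ALTER INDEX IX_FinalTable_col3 ON FINAL_TABLE REBUILD;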

    Just my 2 cents!

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Hello David,

    Can you please provide an example of this?

    Thank you

    Asita

  • Hello Brian,

    This table has about 200K rows and 20 columns. Daily changes are maybe around 100 new records, plus changes to about 100 to 200 records in a few columns.

    There is only one primary key on the table, nothing else in terms of indexes.

    I do have an UpdatedDate column in the Final_Table which gets updated when a real update (non-null changes between the stage and final tables) happens.

    Can you please advise a query for it?

    Thank you

    Asita

  • Hello Linden,

    Sorry, can you please provide a sample of what you mean by "use a WHERE clause"?

    Thank you

    Asita

  • The pattern we use for a similar process is this.

    MERGE INTO final_table
    USING (SELECT key_column_1
    , key_column_2
    , other_column_1
    ...
    , other_column_n
    FROM stage_table
    EXCEPT
    SELECT key_column_1
    , key_column_2
    , other_column_1
    ...
    , other_column_n
    FROM final_table
    ) as changes
    ON final_table.key_column_1 = changes.key_column_1
    AND final_table.key_column_2 = changes.key_column_2
    WHEN MATCHED THEN
    UPDATE
    SET other_column_1 = changes.other_column_1
    ...
    , other_column_n = changes.other_column_n
    WHEN NOT MATCHED THEN
    INSERT (key_column_1
    , key_column_2
    , other_column_1
    ...
    , other_column_n)
    VALUES (changes.key_column_1
    , changes.key_column_2
    , changes.other_column_1
    ...
    , changes.other_column_n);

    If you prefer, you can use a similar pattern for UPDATE/INSERT instead of MERGE.
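    For reference, a rough sketch of that UPDATE/INSERT variant under the same assumptions (placeholder names, with the remaining columns elided in comments):

    -- 1) Update rows whose key already exists but whose values differ.
    UPDATE f
    SET    f.other_column_1 = c.other_column_1 -- ... , f.other_column_n = c.other_column_n
    FROM   final_table f
    JOIN  (SELECT key_column_1, key_column_2, other_column_1 -- ... , other_column_n
           FROM stage_table
           EXCEPT
           SELECT key_column_1, key_column_2, other_column_1 -- ... , other_column_n
           FROM final_table) AS c
      ON   f.key_column_1 = c.key_column_1
     AND   f.key_column_2 = c.key_column_2;

    -- 2) Insert rows whose key is not in final_table at all.
    INSERT INTO final_table (key_column_1, key_column_2, other_column_1) -- ... , other_column_n
    SELECT s.key_column_1, s.key_column_2, s.other_column_1 -- ... , s.other_column_n
    FROM   stage_table s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   final_table f
                       WHERE  f.key_column_1 = s.key_column_1
                         AND  f.key_column_2 = s.key_column_2);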

  • Chris Wooding wrote:

    The pattern we use for a similar process is this.

    MERGE INTO final_table
    USING (SELECT key_column_1
    , key_column_2
    , other_column_1
    ...
    , other_column_n
    FROM stage_table
    EXCEPT
    SELECT key_column_1
    , key_column_2
    , other_column_1
    ...
    , other_column_n
    FROM final_table
    ) as changes
    ON final_table.key_column_1 = changes.key_column_1
    AND final_table.key_column_2 = changes.key_column_2
    WHEN MATCHED THEN
    UPDATE
    SET other_column_1 = changes.other_column_1
    ...
    , other_column_n = changes.other_column_n
    WHEN NOT MATCHED THEN
    INSERT (key_column_1
    , key_column_2
    , other_column_1
    ...
    , other_column_n)
    VALUES (changes.key_column_1
    , changes.key_column_2
    , changes.other_column_1
    ...
    , changes.other_column_n);

    If you prefer, you can use a similar pattern for UPDATE/INSERT instead of MERGE.

    I think you are missing the FROM statement in the second query - which needs to be the final table.  This avoids the need for using something like HASHBYTES to compare each row - and generates the subset of rows that will either need to be inserted or updated.


    • This reply was modified 2 years, 9 months ago by  Jeffrey Williams. Reason: Added FROM clause to be consistent with Chris's post

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks Jeffrey, I've corrected my sample code.

  • Chris Wooding wrote:

    Thanks Jeffrey, I've corrected my sample code.

    BTW - I like this solution and think it works much better than a traditional UPSERT. I need to *remember* that we can use EXCEPT to identify the subset of rows to be inserted/updated. A similar method can be used in a [NOT] EXISTS with INTERSECT or EXCEPT to identify the rows:

    SELECT ...
    FROM   db1.tbl a
    JOIN   db2.tbl b ON a.keycol = b.keycol
    WHERE  NOT EXISTS (SELECT a.col1, ...
                       INTERSECT
                       SELECT b.col1, ...)
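    Applied to this thread's tables, that pattern to list the rows needing an update might look like this (a sketch; INTERSECT handles NULLs, unlike the <> comparisons in the original query):

    -- Rows where any of col3..col10 differ, treating NULL = NULL as equal.
    SELECT S.*
    FROM   STAGE_TABLE S
    JOIN   FINAL_TABLE FT ON FT.col1 = S.col1 AND FT.col2 = S.col2
    WHERE  NOT EXISTS (SELECT S.col3, S.col4, S.col5, S.col6, S.col7, S.col8, S.col9, S.col10
                       INTERSECT
                       SELECT FT.col3, FT.col4, FT.col5, FT.col6, FT.col7, FT.col8, FT.col9, FT.col10);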


    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs
