Best method for Update then insert with single transaction

  • Guys-

    I have a table with 23 billion rows so Merge is out, as it scans the entire thing and has become a hog. So I'm going back in time to 2005 and before for this example.

    The question that I have is, as the statement loops over the update, and the @@rowcount is hit, how can I use those values it tried updating with the insert statement section without rewriting a secondary query? In other words, I am performing an update first and if that is not performed I want to then insert that same row. The below is what seems to be an accepted answer from before Merge. The temp table can have close to 23 million records and the primary keys are SDP_ID and Meter_Read_Dttm. Any help or pointers on how to do this efficiently are appreciated.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;

    UPDATE Primary_Table SET SDP_EffBegDate=temp.SDP_EffBegDate,

    Meter_Read_Status_Id=temp.Meter_Read_Status_Id,

    Src_Del_Flag=temp.Src_Del_Flag,

    ETL_Batch_Id=temp.ETL_Batch_Id,

    Meter_Read=temp.Meter_Read

    FROM Temp_Table temp, Primary_Table AS Hourly

    WHERE hourly.SDP_Id = temp.SDP_Id and Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm and Hourly.Meter_Read_Dttm >= '2016-01-01 00:00:00';

    IF @@ROWCOUNT = 0

    BEGIN

    INSERT Meter_Read_KWH_Hourly (

    SDP_Id,

    SDP_EffBegDate,

    Meter_Read_Status_Id,

    ETL_Batch_Id,

    Src_Del_Flag,

    Meter_Read_Dttm,

    Meter_Read) (Select clause or variables or?);

    END

    COMMIT TRANSACTION;

  • fergfamster (7/25/2016)


    Guys-

    I have a table with 23 billion rows so Merge is out, as it scans the entire thing and has become a hog. So I'm going back in time to 2005 and before for this example.

    The question that I have is, as the statement loops over the update, and the @@rowcount is hit, how can I use those values it tried updating with the insert statement section without rewriting a secondary query? In other words, I am performing an update first and if that is not performed I want to then insert that same row. The below is what seems to be an accepted answer from before Merge. The temp table can have close to 23 million records and the primary keys are SDP_ID and Meter_Read_Dttm. Any help or pointers on how to do this efficiently are appreciated.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;

    UPDATE Primary_Table SET SDP_EffBegDate=temp.SDP_EffBegDate,

    Meter_Read_Status_Id=temp.Meter_Read_Status_Id,

    Src_Del_Flag=temp.Src_Del_Flag,

    ETL_Batch_Id=temp.ETL_Batch_Id,

    Meter_Read=temp.Meter_Read

    FROM Temp_Table temp, Primary_Table AS Hourly

    WHERE hourly.SDP_Id = temp.SDP_Id and Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm and Hourly.Meter_Read_Dttm >= '2016-01-01 00:00:00';

    IF @@ROWCOUNT = 0

    BEGIN

    INSERT Meter_Read_KWH_Hourly (

    SDP_Id,

    SDP_EffBegDate,

    Meter_Read_Status_Id,

    ETL_Batch_Id,

    Src_Del_Flag,

    Meter_Read_Dttm,

    Meter_Read) (Select clause or variables or?);

    END

    COMMIT TRANSACTION;

    This is basically what I would do, but there appears to be some issues with your UPDATE.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;

    UPDATE Primary_Table SET

    SDP_EffBegDate = temp.SDP_EffBegDate,

    Meter_Read_Status_Id = temp.Meter_Read_Status_Id,

    Src_Del_Flag = temp.Src_Del_Flag,

    ETL_Batch_Id = temp.ETL_Batch_Id,

    Meter_Read = temp.Meter_Read

    FROM

    Temp_Table temp

    INNER JOIN Primary_Table AS Hourly

    on (hourly.SDP_Id = temp.SDP_Id and

    Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm and

    Hourly.Meter_Read_Dttm >= '2016-01-01 00:00:00');

    --IF @@ROWCOUNT = 0 -- What if there are values that need to be updated as well as inserted from Temp_Table?

    INSERT Meter_Read_KWH_Hourly (

    SDP_Id,

    SDP_EffBegDate,

    Meter_Read_Status_Id,

    ETL_Batch_Id,

    Src_Del_Flag,

    Meter_Read_Dttm,

    Meter_Read)

    Select

    temp.SDP_Id,

    temp.SDP_EffBegDate,

    temp.Meter_Read_Status_Id,

    temp.ETL_Batch_Id,

    temp.Src_Del_Flag,

    temp.Meter_Read_Dttm,

    temp.Meter_Read

    FROM

    Temp_Table temp

    WHERE

    NOT EXISTS(SELECT 1 FROM Primary_Table Hourly

    WHERE

    hourly.SDP_Id = temp.SDP_Id and

    Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm);

    COMMIT TRANSACTION;

    Figured out my issue with your UPDATE statement, I couldn't see the equals. Please use white space, the computer may not care but it makes things easier to read by humans.

  • http://source.entelect.co.za/why-is-this-upsert-code-broken

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Lynn, Ill try harder to use whitespace the next time I post.... You dont think using the @@rowcount is useful? Better to just do both statements?

  • Thanks Gail, useful article!

  • fergfamster (7/25/2016)


    You dont think using the @@rowcount is useful? Better to just do both statements?

    No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.

    The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates deadlocks if there are multiple sessions running this concurrently.

    Edit: corrected last line

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Guys and Gals! Appreciate the help!

  • GilaMonster (7/25/2016)


    fergfamster (7/25/2016)


    You dont think using the @@rowcount is useful? Better to just do both statements?

    No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.

    The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates if there are multiple sessions running this concurrently.

    Even with this? Just want to be sure.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

  • Lynn Pettis (7/25/2016)


    GilaMonster (7/25/2016)


    fergfamster (7/25/2016)


    You dont think using the @@rowcount is useful? Better to just do both statements?

    No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.

    The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates if there are multiple sessions running this concurrently.

    Even with this? Just want to be sure.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    Sorry, not duplicates. Deadlocks.

    Multiple sessions take a shared range lock, multiple sessions go to convert the lock to X for the insert, can't be converted due to the shared range locks, instant deadlock.

    I'd recommend use the locking hints and not the elevated isolation level, and it only needs the transaction if it's there for rollbacks on errors (and then with a try catch block).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/25/2016)


    Lynn Pettis (7/25/2016)


    GilaMonster (7/25/2016)


    fergfamster (7/25/2016)


    You dont think using the @@rowcount is useful? Better to just do both statements?

    No, because what if there's 6 rows that need inserting and 4 updating? To use rowcount, you have to loop over those 10 rows. With Lynn's way you just run both statements, the update affects the 4 existing rows and the insert affects the 6 rows that weren't in the table at the start.

    The exists needs lock hints though (as in my article), otherwise there's still a chance of duplicates if there are multiple sessions running this concurrently.

    Even with this? Just want to be sure.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    Sorry, not duplicates. Deadlocks.

    Multiple sessions take a shared range lock, multiple sessions go to convert the lock to X for the insert, can't be converted due to the shared range locks, instant deadlock.

    I'd recommend use the locking hints and not the elevated isolation level, and it only needs the transaction if it's there for rollbacks on errors (and then with a try catch block).

    I have started code for deadlocks in some of the code I have written lately. Unfortunately, when it comes to some of the developers code that is embedded in the code, not much luck as I don't get to see it before it goes to production. If there are problems in production I usually can't see it there either as the production software runs on secured networks.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply