Merge Statement using CTE

  • Guys,

    I have the following MERGE that I created to help speed things up, using a MIN(date) predicate so that fewer rows need to be scanned. However, it still takes 20 minutes or more to complete. Any ideas on how I can enforce the date logic so the merge only works against the temp table rows instead of scanning the entire set in the primary table? We are talking about 12 to 20 million rows merged daily.

    WITH Mergetst AS (
        SELECT *, MIN(Temp.Meter_Read_Dttm) AS MinDate
        FROM Meter_Read_KWH_Hourly_Temp AS Temp
        GROUP BY SDP_Id,
            SDP_EffBegDate,
            Meter_Read_Status_Id,
            ETL_Batch_Id,
            Src_Del_Flag,
            Meter_Read_Dttm,
            Meter_Read
    )
    MERGE INTO Meter_Read_KWH_Hourly AS Hourly
    USING Mergetst
        ON Hourly.SDP_Id = Mergetst.SDP_Id
       AND Hourly.Meter_Read_Dttm = Mergetst.Meter_Read_Dttm
       AND Hourly.Meter_Read_Dttm >= Mergetst.MinDate
    WHEN MATCHED THEN
        UPDATE SET
            Hourly.SDP_EffBegDate = Mergetst.SDP_EffBegDate,
            Hourly.Meter_Read_Status_Id = Mergetst.Meter_Read_Status_Id,
            Hourly.Src_Del_Flag = Mergetst.Src_Del_Flag,
            Hourly.ETL_Batch_Id = Mergetst.ETL_Batch_Id,
            Hourly.Meter_Read = Mergetst.Meter_Read
    WHEN NOT MATCHED THEN
        INSERT (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
        VALUES (Mergetst.SDP_Id, Mergetst.SDP_EffBegDate, Mergetst.Meter_Read_Status_Id, Mergetst.ETL_Batch_Id, Mergetst.Src_Del_Flag, Mergetst.Meter_Read_Dttm, Mergetst.Meter_Read);

  • Read this for better help: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Any suggestion at this point would be a wild guess.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks, I don't really want to post 15 files for you guys to review just to get an answer. I'll figure it out.

  • fergfamster (7/21/2016)


    Thanks, I don't really want to post 15 files for you guys to review just to get an answer. I'll figure it out.

    15 files? Why? It's just the DDL for the tables and indexes, and the execution plan.

    Do you realize that you're not aggregating anything? Meter_Read_Dttm, which is the column in the MIN() function, is included in the GROUP BY.
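    If the intent was to pick up the earliest reading time in the whole staging table, a windowed MIN would compute it without collapsing anything. Just a sketch, assuming that was the intent:

    WITH Mergetst AS (
        SELECT Temp.*,
               MIN(Temp.Meter_Read_Dttm) OVER () AS MinDate --one value for the whole staging table
        FROM Meter_Read_KWH_Hourly_Temp AS Temp
    )
    --...MERGE as before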

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • fergfamster (7/21/2016)


    Thanks, I don't really want to post 15 files for you guys to review just to get an answer. I'll figure it out.

    Because I have very little experience with MERGE statements and MERGE operations in general, I submit this for your (and my) edification...

    Please note I have only tried to recreate the DDL based on your grouping statement.

    CREATE TABLE Meter_Read_KWH_Hourly_Temp (
        SDP_ID INT,
        SDP_EffBegDate DATETIME,
        Meter_Read_Status_ID INT,
        ETL_Batch_ID INT,
        Src_Del_Flag BIT,
        Meter_Read_Dttm DATETIME,
        Meter_Read INT
    );
    GO

    CREATE TABLE Meter_Read_KWH_Hourly (
        SDP_ID INT,
        SDP_EffBegDate DATETIME,
        Meter_Read_Status_ID INT,
        ETL_Batch_ID INT,
        Src_Del_Flag BIT,
        Meter_Read_Dttm DATETIME,
        Meter_Read INT
    );
    GO

    I have also added a loop to populate some data. It took our beefy machine 7 minutes to generate 1M rows; you can cut that down if you wish, but based on your statement of 12-20 million rows, I figured 1M rows was a good test bed.

    Now what I did was run it with the datetime variable @TheDate = '7/7/2016 01:00:00' and generate 1M rows; then I set the time to '7/7/2016 02:00:00' and ran it again.

    SET NOCOUNT ON;

    DECLARE @Count INT;
    DECLARE @TheDate DATETIME;
    DECLARE @SDP_EffBegDate DATETIME;
    DECLARE @Meter_Read_Status_ID INT;
    DECLARE @ETL_Batch_Id INT;
    DECLARE @Src_Del_Flag BIT;
    DECLARE @Meter_Read_Dttm DATETIME;
    DECLARE @Meter_Read INT;

    --for when you need to start over
    --DELETE Meter_Read_KWH_Hourly_Temp;
    --DELETE Meter_Read_KWH_Hourly;

    --check our data load progress
    --SELECT COUNT(1) FROM Meter_Read_KWH_Hourly_Temp;
    --SELECT COUNT(1) FROM Meter_Read_KWH_Hourly;

    SET @Count = 1; --'tis obvious...
    SET @TheDate = '07/07/2016 02:00:00'; --I am using a variable because I need to populate previous time data

    WHILE @Count < 1000000
    BEGIN
        --set our values to insert
        SET @SDP_EffBegDate = CAST(CAST(MONTH(@TheDate) AS VARCHAR(3)) + '/1/' + CAST(YEAR(@TheDate) AS VARCHAR(4)) AS DATETIME); --an arbitrary date, presumably before the meter reading date
        SET @Meter_Read_Status_ID = ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) % 10000; --change the last number to limit the random range, 0-9999 in this case
        SET @ETL_Batch_Id = @Count + 1000; --again, an arbitrary number based on the batch being processed; @Count + 1000 will do
        SET @Src_Del_Flag = ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) % 2; --simple bit field, let's randomize it!
        SET @Meter_Read_Dttm = @TheDate; --the actual date/time of the meter reading; @TheDate rather than GETDATE(), so the two runs land an hour apart as described above
        SET @Meter_Read = ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) % 10000000; --the actual meter reading, let's go to 10 mil!

        INSERT INTO Meter_Read_KWH_Hourly_Temp (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
        VALUES (@Count, @SDP_EffBegDate, @Meter_Read_Status_Id, @ETL_Batch_Id, @Src_Del_Flag, @Meter_Read_Dttm, @Meter_Read);

        SET @Count = @Count + 1;
    END;
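    As an aside, if the 7-minute load time bothers you, a set-based insert along these lines builds the same 1M rows in seconds. This is only a sketch: it reuses @TheDate from the script above, and sys.all_columns cross-joined to itself is just one convenient way to manufacture enough rows.

    INSERT INTO Meter_Read_KWH_Hourly_Temp (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
    SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),                                                          --SDP_Id
        CAST(CAST(MONTH(@TheDate) AS VARCHAR(3)) + '/1/' + CAST(YEAR(@TheDate) AS VARCHAR(4)) AS DATETIME), --SDP_EffBegDate
        ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) % 10000,                                               --Meter_Read_Status_Id
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + 1000,                                                  --ETL_Batch_Id
        ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) % 2,                                                   --Src_Del_Flag
        @TheDate,                                                                                           --Meter_Read_Dttm
        ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) % 10000000                                             --Meter_Read
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b; --cross join purely as a large row source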

    Then I ran your MERGE script; it took our beefy dev box 14 seconds to run against 2M rows:

    ;WITH Mergetst AS (
        SELECT *, MIN(Temp.Meter_Read_Dttm) AS MinDate
        FROM Meter_Read_KWH_Hourly_Temp AS Temp
        GROUP BY SDP_Id,
            SDP_EffBegDate,
            Meter_Read_Status_Id,
            ETL_Batch_Id,
            Src_Del_Flag,
            Meter_Read_Dttm,
            Meter_Read
    )
    MERGE INTO Meter_Read_KWH_Hourly AS Hourly
    USING Mergetst
        ON Hourly.SDP_Id = Mergetst.SDP_Id
       AND Hourly.Meter_Read_Dttm = Mergetst.Meter_Read_Dttm
       AND Hourly.Meter_Read_Dttm >= Mergetst.MinDate
    WHEN MATCHED THEN
        UPDATE SET
            Hourly.SDP_EffBegDate = Mergetst.SDP_EffBegDate,
            Hourly.Meter_Read_Status_Id = Mergetst.Meter_Read_Status_Id,
            Hourly.Src_Del_Flag = Mergetst.Src_Del_Flag,
            Hourly.ETL_Batch_Id = Mergetst.ETL_Batch_Id,
            Hourly.Meter_Read = Mergetst.Meter_Read
    WHEN NOT MATCHED THEN
        INSERT (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
        VALUES (Mergetst.SDP_Id, Mergetst.SDP_EffBegDate, Mergetst.Meter_Read_Status_Id, Mergetst.ETL_Batch_Id, Mergetst.Src_Del_Flag, Mergetst.Meter_Read_Dttm, Mergetst.Meter_Read);

    Execution Plan attached

  • Thank you for the effort and trials! The temp table is 23 million rows, but the primary table is 23 billion, which is why I thought the date filter would make it faster. However, what I have found is that MERGE breaks all of this up into separate upsert operations and still scans the entire thing regardless. So what took your side 14 seconds is still taking mine 20-30 minutes, even with enhanced resources. I am now playing with the following to compare the two approaches, just to get a better overall test plan. If you have any other ideas that might be better, I would love to hear them. I also tried MERGE with OUTPUT, with little luck.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;

    UPDATE Hourly
    SET SDP_EffBegDate = temp.SDP_EffBegDate,
        Meter_Read_Status_Id = temp.Meter_Read_Status_Id,
        Src_Del_Flag = temp.Src_Del_Flag,
        ETL_Batch_Id = temp.ETL_Batch_Id,
        Meter_Read = temp.Meter_Read
    FROM Meter_Read_KWH_Hourly AS Hourly
    JOIN Meter_Read_KWH_Hourly_Temp AS temp
        ON Hourly.SDP_Id = temp.SDP_Id
       AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm
    WHERE Hourly.Meter_Read_Dttm >= '2015-05-28 07:00:00';

    IF @@ROWCOUNT = 0
    BEGIN
        INSERT INTO Meter_Read_KWH_Hourly (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
        SELECT temp.SDP_Id, temp.SDP_EffBegDate, temp.Meter_Read_Status_Id, temp.ETL_Batch_Id, temp.Src_Del_Flag, temp.Meter_Read_Dttm, temp.Meter_Read
        FROM Meter_Read_KWH_Hourly_Temp AS temp; --the original VALUES clause was cut off; completed here from the column list
    END

    COMMIT TRANSACTION;
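    I am also planning to test one more MERGE variant. The guidance on optimizing MERGE performance is not to filter the target in the ON clause, but to reference a filtered view (or, equivalently, a single-table updatable CTE) over the target, so the optimizer can eliminate most of the 23 billion rows up front. A sketch of that idea, with the cutoff hard-coded from my example above:

    DECLARE @cutoff DATETIME = '2015-05-28 07:00:00'; --should be no later than MIN(Meter_Read_Dttm) in the temp table

    WITH HourlyRecent AS (
        --single-table, updatable CTE over the target, carrying the date filter;
        --caveat: source rows older than @cutoff would look unmatched and be inserted again
        SELECT SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read
        FROM Meter_Read_KWH_Hourly
        WHERE Meter_Read_Dttm >= @cutoff
    )
    MERGE INTO HourlyRecent AS Hourly
    USING Meter_Read_KWH_Hourly_Temp AS Temp
        ON Hourly.SDP_Id = Temp.SDP_Id
       AND Hourly.Meter_Read_Dttm = Temp.Meter_Read_Dttm
    WHEN MATCHED THEN
        UPDATE SET
            Hourly.SDP_EffBegDate = Temp.SDP_EffBegDate,
            Hourly.Meter_Read_Status_Id = Temp.Meter_Read_Status_Id,
            Hourly.Src_Del_Flag = Temp.Src_Del_Flag,
            Hourly.ETL_Batch_Id = Temp.ETL_Batch_Id,
            Hourly.Meter_Read = Temp.Meter_Read
    WHEN NOT MATCHED THEN
        INSERT (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
        VALUES (Temp.SDP_Id, Temp.SDP_EffBegDate, Temp.Meter_Read_Status_Id, Temp.ETL_Batch_Id, Temp.Src_Del_Flag, Temp.Meter_Read_Dttm, Temp.Meter_Read);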

  • fergfamster (7/21/2016)


    Thanks, I don't really want to post 15 files for you guys to review just to get an answer. I'll figure it out.

    Seems you haven't figured it out yet, so I suggest you post what was asked for.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • fergfamster (7/22/2016)


    Thank you for the effort and trials! The temp table is 23 million rows, but the primary table is 23 billion, which is why I thought the date filter would make it faster. However, what I have found is that MERGE breaks all of this up into separate upsert operations and still scans the entire thing regardless. So what took your side 14 seconds is still taking mine 20-30 minutes, even with enhanced resources. I am now playing with the following to compare the two approaches, just to get a better overall test plan. If you have any other ideas that might be better, I would love to hear them. I also tried MERGE with OUTPUT, with little luck.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;

    UPDATE Hourly
    SET SDP_EffBegDate = temp.SDP_EffBegDate,
        Meter_Read_Status_Id = temp.Meter_Read_Status_Id,
        Src_Del_Flag = temp.Src_Del_Flag,
        ETL_Batch_Id = temp.ETL_Batch_Id,
        Meter_Read = temp.Meter_Read
    FROM Meter_Read_KWH_Hourly AS Hourly
    JOIN Meter_Read_KWH_Hourly_Temp AS temp
        ON Hourly.SDP_Id = temp.SDP_Id
       AND Hourly.Meter_Read_Dttm = temp.Meter_Read_Dttm
    WHERE Hourly.Meter_Read_Dttm >= '2015-05-28 07:00:00';

    IF @@ROWCOUNT = 0
    BEGIN
        INSERT INTO Meter_Read_KWH_Hourly (SDP_Id, SDP_EffBegDate, Meter_Read_Status_Id, ETL_Batch_Id, Src_Del_Flag, Meter_Read_Dttm, Meter_Read)
        SELECT temp.SDP_Id, temp.SDP_EffBegDate, temp.Meter_Read_Status_Id, temp.ETL_Batch_Id, temp.Src_Del_Flag, temp.Meter_Read_Dttm, temp.Meter_Read
        FROM Meter_Read_KWH_Hourly_Temp AS temp; --the original VALUES clause was cut off; completed here from the column list
    END

    COMMIT TRANSACTION;

    Perhaps you should look at partitioning the table(s) in question. If you really only need to touch the previous day's data, partitioning by day or even hour would let you break that 23-billion-row table up for your merge operation. If I get the time, I will try to come up with a construct based on what I already have, but if you could post the DDL, it would be very helpful. A rough sketch of what I mean is below.
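    Purely hypothetical, since we don't have your DDL; the boundary values, object names, and index keys are placeholders:

    --one partition per day; in practice you would generate the boundaries and split a new one each day
    CREATE PARTITION FUNCTION pf_MeterReadByDay (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2016-07-20', '2016-07-21', '2016-07-22');

    CREATE PARTITION SCHEME ps_MeterReadByDay
    AS PARTITION pf_MeterReadByDay ALL TO ([PRIMARY]);

    --cluster the big table on the scheme, keyed on the reading time,
    --so a date predicate only touches the relevant day's partition(s)
    CREATE CLUSTERED INDEX CIX_Meter_Read_KWH_Hourly
        ON Meter_Read_KWH_Hourly (Meter_Read_Dttm, SDP_Id)
        ON ps_MeterReadByDay (Meter_Read_Dttm);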
