Merge statement with partial part of the table.

  • Hi,

    I have a very big table and I need to do a merge.

    if the primary key and the record source match then I need to update one column, otherwise I need to insert.

    I really dont need to do the whole table, but only the part when the data is bigger than a specific day.

    I tried, but it is taking too long, because I think it scans the whole table.

    --DECLARE @LastSeenDate datetime = @LoadDate - 2  (date) 

    MERGE
    HubHistoryData AS Target --
    USING
    (
    SELECT
    DISTINCT
    HistoryDataHashKey --
    , LoadDate
    , RecordSource
    , LoadDate
    , [HistoryDataPrimaryKey]
    FROM
    [HISTORY_DATA] --
    WHERE
    LoadDate = @LoadDate
    ) AS Source
    (
    HistoryDataHashKey --
    , LoadDate
    , RecordSource
    , LastSeenDate
    , [HistoryDataPrimaryKey]
    )
    ON
    (
    Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey] AND --
    Target.RecordSource = Source.RecordSource AND
    Target.LastSeenDate > @LastSeenDate -- testing date


    )
    WHEN MATCHED THEN
    UPDATE SET LastSeenDate = Source.LastSeenDate
    WHEN NOT MATCHED THEN
    INSERT (HistoryDataHashKey --
    , LoadDate
    , RecordSource
    , LastSeenDate
    , [HistoryDataPrimaryKey] )

    VALUES (Source.HistoryDataHashKey --
    ,Source.LoadDate
    ,Source.RecordSource
    ,Source.LastSeenDate
    ,Source.[HistoryDataPrimaryKey]) -

     

  • You might find performance improvements if you use this method instead.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • why would you need extra field in the join?  Shouldn't the primary key and Record Source be enough?

    Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey] AND --

    Target.RecordSource = Source.RecordSource AND

    then

    when matched and Target.LastSeenDate > @LastSeenDate then update..

     

    What does the execution plan look like??  How often does this run?  How big are the blocks of data between runs?

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • PhiI am trying the method on the link, but I am not sure how to bring the where without scanning the whole table every time

    UPDATE HubHistoryData WITH (UPDLOCK, SERIALIZABLE) SET LastSeenDate = @LoadDate 
    WHERE = @key;

    Mike, I tried that one first, and it takes the same time which is almost 4 hours, that is what trying to avoid.

  • Rather than assuming what the optimiser is doing, please check the execution plans to be sure.

    What indexes and keys do you have on HubHistoryData?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • If possible I always revert to standard UPDATE and INSERT instead of using a MERGE, this will give better performance. This is also the recommendation from Microsoft: "Performance Tip: The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it doesn't exist, or updating a row if it matches. When simply updating one table based on the rows of another table, improve the performance and scalability with basic INSERT, UPDATE, and DELETE statements."

    https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15

    So something like this should perform better:

    UPDATE Target
    SET Target.LastSeenDate = Source.LastSeenDate
    FROM HubHistoryData Target
    INNER JOIN [HISTORY_DATA] Source
    ON Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey]
    AND Target.RecordSource = Source.RecordSource
    AND Target.LastSeenDate < Source.LastSeenDate
    AND Source.LastSeenDate > @LastSeenDate;

    INSERT INTO HubHistoryData
    (
    HistoryDataHashKey,
    LoadDate,
    RecordSource,
    LastSeenDate,
    [HistoryDataPrimaryKey]
    )
    SELECT Source.HistoryDataHashKey,
    Source.LoadDate,
    Source.RecordSource,
    Source.LastSeenDate,
    Source.[HistoryDataPrimaryKey]
    FROM [HISTORY_DATA] Source
    WHERE NOT EXISTS (SELECT *
    FROM HubHistoryData Target
    WHERE Target.[HistoryDataPrimaryKey] = Source.[HistoryDataPrimaryKey]
    AND Target.RecordSource = Source.RecordSource
    AND Target.LastSeenDate = Source.LastSeenDate)
    AND Source.LastSeenDate > @LastSeenDate;

    I wasn't sure what you were doing with the comparison to @LastSeenDate. I think you need to check the Source.LastSeenDate row is greater than @LastSeenDate, so I've put that in the code instead of what you had.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply