Is there a clever way to do an UPDATE/MERGE and insert rows when there are duplicate matches?

  • Hi - I'm trying to do something like below and wondering if there's a way I can do it with the MERGE function or some other clever type of update.

    I have two tables similar to below:

    TargetTable

    -----------

    Row Key SecurityID

    1 KKL43431

    2 KKLFK992

    3 KKL11011

    SourceTable

    -----------

    SecurityIDKey

    401KKL43431

    987KKL43431

    533KKLFK992

    123KKL11011

    I want to update the target table with SecurityID, joining on the key value. As you can see there is more than one possible match for a key in the source table. The end result I need is this:

    TargetTable

    -----------

    RowKeySecurityID

    1KKL43431401

    1KKL43431987

    2KKLFK992533

    3KKL11011123

    So, I would like to insert rows for the cases where there are duplicates. If I do UPDATE FROM, I will have the same number of rows, and SQL Server will pick one for me. I was playing with MERGE a bit but couldn't figure out how to achieve this behavior.

    The only thing I can think of doing is to join the two tables together and insert into a new table. Are there any other cleve ways to accomplish this?

    Thanks,

    H

    Edited to add: Apologies for the bad formatting. My tabs and spaces didn't show properly and I didn't have a chance to figure out how to get that working yet. I think it's pretty clear though.

  • This worked for me:

    declare @TargetTable table ([Key] char(8), SecurityID int)

    declare @SourceTable table ([Key] char(8), SecurityID int)

    insert into @TargetTable([Key]) values

    ('KKL43431'),

    ('KKLFK992'),

    ('KKL11011')

    insert into @SourceTable([Key], SecurityID) values

    ('KKL43431',401),

    ('KKL43431',987),

    ('KKLFK992',533),

    ('KKL11011',123)

    update t

    set t.SecurityID = s.SecurityID

    from @TargetTable t

    join @SourceTable s

    on t.[Key] = s.[Key]

    insert into @TargetTable ([Key], SecurityID)

    select [Key], SecurityID

    from @SourceTable

    except

    select [Key], SecurityID

    from @TargetTable

    select * from @TargetTable

    Results:

    Key SecurityID

    KKL43431 401

    KKLFK992533

    KKL11011123

    KKL43431987

    BTW, try not to use reserved words as column names. "Key" is such a word in SQL Server.

  • Thanks for the quick reply. So basically this is a two-step update, first an update then inserting the duplicates. I will experiment with that and see if it works for me. One thing I forgot to mention is that the target table may have 10 million+ records - and the source table is a few hundred thousand. So I would also need something that is somewhat performant. There is nothing waiting on this query (GUI, etc) but I would rather have something that takes 1 or 2 minutes vs. something that could take 10 min to complete.

    So I'm wondering if joining and inserting to a new table is going to be faster than doing an update and then an insert. I will have to experiment.

    P.S. The column and table names (and schema) is not my actual tables - I just simplified it for the example but should have chosen different column names to avoid confusion.

    Thanks,

    H

  • be sure your source and target tables are indexed on the key column

  • hominamad (7/3/2014)


    Thanks for the quick reply. So basically this is a two-step update, first an update then inserting the duplicates. I will experiment with that and see if it works for me. One thing I forgot to mention is that the target table may have 10 million+ records - and the source table is a few hundred thousand. So I would also need something that is somewhat performant. There is nothing waiting on this query (GUI, etc) but I would rather have something that takes 1 or 2 minutes vs. something that could take 10 min to complete.

    So I'm wondering if joining and inserting to a new table is going to be faster than doing an update and then an insert. I will have to experiment.

    P.S. The column and table names (and schema) is not my actual tables - I just simplified it for the example but should have chosen different column names to avoid confusion.

    Thanks,

    H

    Here's the MERGE you asked for - and I'd guess that gbritton's solution would be several times faster.

    ;

    WITH TargetTable AS (SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [Key] ORDER BY SecurityID) FROM #TargetTable)

    MERGE TargetTable AS target

    USING (

    SELECT t.[row], t.[Key], s.SecurityID,

    rn = ROW_NUMBER() OVER(PARTITION BY t.[Key] ORDER BY s.SecurityID)

    FROM #TargetTable t

    INNER JOIN #SourceTable s

    ON s.[Key] = t.[Key]

    ) AS source

    ON target.[Key] = source.[Key] AND target.rn = source.rn

    WHEN MATCHED THEN

    UPDATE SET SecurityID = source.SecurityID

    WHEN NOT MATCHED THEN

    INSERT ([row], SecurityID, [Key])

    VALUES (source.[row], source.SecurityID, source.[Key]);

    SELECT * FROM #TargetTable ORDER BY [row]

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM - that MERGE is brilliant. I will try out all these solutions and probably go with whatever is fastest. MERGE statement seems very powerful and flexible as it let's you do things like this even though it's probably not exactly the type of thing it was designed for.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply