Merge with duplicate rows

  • Hello all, I am running merge stored procedures against various staging and final tables in my database. I have one table that is giving me issues. When running the following merge statement I receive the common merge error "The MERGE statement attempted to UPDATE or DELETE the same row more than once..."

    Here is my code:

    /****** Object: StoredProcedure [dbo].[sp_Merge_AssetClassUDF] Script Date: 11/06/2012 16:15:48 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[sp_Merge_AssetClassUDF]
        @PullDate Date
    AS BEGIN
        MERGE [AssetClassUDF] AS target
        USING [RawData_AssetClassUDF] AS source
            ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
            AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
        WHEN MATCHED THEN
            UPDATE SET
                [ASSET_UDF_VALUE] = source.[ASSET_UDF_VALUE],
                [ASSET_UDF_DESC] = source.[ASSET_UDF_DESC],
                [ASSETDETAIL_Id] = source.[ASSETDETAIL_Id],
                Pulldate = @PullDate
        WHEN NOT MATCHED THEN
            INSERT (
                [ASSET_UDF_VALUE],
                [ASSET_UDF_DESC],
                [ASSETDETAIL_Id],
                Pulldate)
            VALUES (
                source.[ASSET_UDF_VALUE],
                source.[ASSET_UDF_DESC],
                source.[ASSETDETAIL_Id],
                @PullDate);
    END

    ASSET_UDF_VALUE = user defined; many blank entries and some duplicates.
    ASSET_UDF_DESC = user defined; many duplicate values.
    ASSETDETAIL_Id = provided by the front-end application; many duplicate values.
    PullDate = filled in by the stored procedure, which passes in a GETDATE() value.

    I think the above code breaks because the values in ASSET_UDF_DESC and ASSETDETAIL_Id are not unique. (I have lots more of these procedures that work, but in those I have unique values and a suitable source column to join on in my MERGE statement.)

    I've tried lots of combinations of changing the target columns, but no luck!

    Next is the output of a query against the data to give you an idea of what the table looks like. (Any pointers on how to copy and paste query results in here so they keep their formatting?)

    ASSET_UDF_VALUE  ASSET_UDF_DESC                ASSETDETAIL_Id  Pulldate
                     Filter 1 - Size/Description   16              2012-10-31
                     Filter 1 - Qty/Total          16              2012-10-31
                     Filter 2 - Size/Description   16              2012-10-31
                     Filter 2 - Qty/Total          16              2012-10-31
                     Belt Size/Description         16              2012-10-31
                     Refrigerant Type              16              2012-10-31
    2/20X20X2        Filter 1 Size/Description     16              2012-10-31
                     Filter 2 Size/Description     16              2012-10-31
                     Refrigerant Capacity          16              2012-10-31
                     Nominal Tons                  16              2012-10-31
                     Filter 1 - Size/Description   32              2012-10-31
                     Filter 1 - Qty/Total          32              2012-10-31
                     Filter 2 - Size/Description   32              2012-10-31
                     Filter 2 - Qty/Total          32              2012-10-31
                     Belt Size/Description         32              2012-10-31
                     Refrigerant Type              32              2012-10-31
    2/20X20X2        Filter 1 Size/Description     32              2012-10-31
                     Filter 2 Size/Description     32              2012-10-31
    1/A32            Belt Size/Description         32              2012-10-31
                     Refrigerant Capacity          32              2012-10-31
                     Nominal Tons                  32              2012-10-31
                     Filter 1 - Size/Description   48              2012-10-31
                     Filter 1 - Qty/Total          48              2012-10-31
                     Filter 2 - Size/Description   48              2012-10-31
                     Filter 2 - Qty/Total          48              2012-10-31
                     Belt Size/Description         48              2012-10-31
                     Refrigerant Type              48              2012-10-31
                     Refrigerant Capacity          48              2012-10-31
                     Filter 1 Size/Description     48              2012-10-31
                     Filter 2 Size/Description     48              2012-10-31

    It's hard to see, but for asset ID 16 there are 10 or so duplicates. Each asset has an ASSET_UDF_DESC entry, and only some of the ASSET_UDF_VALUEs have entries. As changes are made in the application, values will be added as well as new IDs.

    Please let me know if there is anything you can recommend.

  • SQL-Squid,

    You're correct in what you said: the issue is that your source has duplicates, so based on the join below it will always return multiple source rows per target row, hence the MERGE statement returns that error.

    MERGE [AssetClassUDF] AS target
    USING [RawData_AssetClassUDF] AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC

    Unfortunately with MERGE, as the error says, you can't update/insert the same record twice... so you only have 2 options:

    1. Use a GROUP BY in your MERGE source so that duplicates are treated as one value; this should fix your problem (if this is what you want)

    or

    2. Clean the SOURCE so it doesn't contain duplicates.

    I personally don't like option 2 - cleaning the source file - as this is how it gets delivered, and it's always a good reference point when you want to compare RAW -> STAGING -> LIVE.

    Tava, you can insert the same record twice with a MERGE statement; here's a test rig.

    CREATE TABLE #test_merge_target (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    CREATE TABLE #test_merge_source (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    INSERT INTO #test_merge_source
    VALUES ('Acol','test data')
          ,('Acol','test data')
          ,('bcol','test data')

    MERGE INTO #test_merge_target AS target
    USING #test_merge_source AS Source
        ON target.Col1 = Source.Col1
    WHEN MATCHED THEN
        UPDATE SET target.Col2 = Source.Col2
    WHEN NOT MATCHED THEN
        INSERT (Col1, Col2)
        VALUES (Source.Col1, Source.Col2);

    SELECT * FROM #test_merge_target
    GO
    DROP TABLE #test_merge_source
    DROP TABLE #test_merge_target

    However, you cannot UPDATE or DELETE the same row twice.
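    (A quick illustrative aside on that rig: the first run only inserts, but running the same MERGE a second time, before the DROP TABLE statements, fails with the OP's error, because each 'Acol' row now sitting in the target is matched by both duplicate 'Acol' source rows.)

    -- Illustrative second run only - same tables and statement as the rig above.
    -- Each target 'Acol' row is matched by two source rows, so the UPDATE branch
    -- would touch the same row twice and the statement fails with:
    -- "The MERGE statement attempted to UPDATE or DELETE the same row more than once..."
    MERGE INTO #test_merge_target AS target
    USING #test_merge_source AS Source
        ON target.Col1 = Source.Col1
    WHEN MATCHED THEN
        UPDATE SET target.Col2 = Source.Col2
    WHEN NOT MATCHED THEN
        INSERT (Col1, Col2)
        VALUES (Source.Col1, Source.Col2);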

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (11/7/2012)


    Tava, you can insert the same record twice with a MERGE statement; here's a test rig.

    CREATE TABLE #test_merge_target (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    CREATE TABLE #test_merge_source (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    INSERT INTO #test_merge_source
    VALUES ('Acol','test data')
          ,('Acol','test data')
          ,('bcol','test data')

    MERGE INTO #test_merge_target AS target
    USING #test_merge_source AS Source
        ON target.Col1 = Source.Col1
    WHEN MATCHED THEN
        UPDATE SET target.Col2 = Source.Col2
    WHEN NOT MATCHED THEN
        INSERT (Col1, Col2)
        VALUES (Source.Col1, Source.Col2);

    SELECT * FROM #test_merge_target
    GO
    DROP TABLE #test_merge_source
    DROP TABLE #test_merge_target

    However, you cannot UPDATE or DELETE the same row twice.

    One way to avoid this is to use a unique condition for the row set; i.e. if a unique column exists, then add that to the NOT MATCHED condition ...

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

    Agreed, demon, a unique constraint on the target table is the only way to prevent this happening.

    I've had a look at the data the OP put up and I can't see any duplicates in the set based on the DESC & ID columns.

    One thing the OP could do is a DISTINCT/GROUP BY on all columns in a CTE and use that as the source rather than the raw table.
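    (A rough, untested sketch of that CTE idea, using the OP's table names. Note that DISTINCT across all columns only removes rows that are identical in every column, so a blank and a non-blank ASSET_UDF_VALUE for the same ID/DESC pair would both survive and could still trip the MERGE.)

    -- Sketch: de-duplicated CTE as the MERGE source (Pulldate handling omitted for brevity).
    ;WITH source AS (
        SELECT DISTINCT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
        FROM [RawData_AssetClassUDF]
    )
    MERGE [AssetClassUDF] AS target
    USING source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
    WHEN MATCHED THEN
        UPDATE SET target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE
    WHEN NOT MATCHED THEN
        INSERT (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id)
        VALUES (source.ASSET_UDF_VALUE, source.ASSET_UDF_DESC, source.ASSETDETAIL_Id);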

    Personally, I'd throw this back up to whoever generated the file and ask them to explain why there are duplicate rows, and remove them if possible, or look at altering the extract to make the rows more unique.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

    -- Have a look at the dupesets in the source table - do you want them, or not?
    -- Then decide what to do. No point in finding a clever way to
    -- insert the dupes if you don't want them.
    SELECT *
    FROM (
        SELECT
            *,
            Dupecount = COUNT(*) OVER(PARTITION BY ASSETDETAIL_Id, ASSET_UDF_DESC)
        FROM [RawData_AssetClassUDF]
    ) d
    WHERE Dupecount > 1
    ORDER BY ASSETDETAIL_Id, ASSET_UDF_DESC

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

    Thanks all. I will have a go with GROUP BY in my merge and see if I can get the results I am after.

    The source data is a feed that I cannot control, and the duplicates are in fact needed by design. (They won't make modifications to the source application for my needs.)

    Removing duplicates based on ID will eliminate rows I need. Inserts work as others have posted; it's the update that gets me. As I said, I will see if I can figure out the GROUP BY clause and post back my results.

    Thanks again everyone!

    Here is an attempt to use a CTE, and I am lost again; I'm not sure this is going to work as I expected. The code below throws the same merge error as before.

    I changed the GROUP BY columns and the ON "target" columns around a few times and still can't get this to work...

    Maybe it's something simple I am missing?

    ALTER PROCEDURE [dbo].[sp_Merge_AssetClassUDF_V2]
        @PullDate Date
    AS BEGIN
        MERGE [AssetClassUDF] AS target
        USING (
            SELECT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
            FROM AssetClassUDF
            GROUP BY ASSET_UDF_DESC, ASSET_UDF_VALUE)
            AS AssetClassUDF_CTE (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id)
        ON
            (target.ASSETDETAIL_Id = AssetClassUDF_CTE.ASSETDETAIL_Id
             AND target.ASSET_UDF_DESC = AssetClassUDF_CTE.ASSET_UDF_DESC)
        WHEN MATCHED THEN
            UPDATE SET
                [ASSET_UDF_VALUE] = AssetClassUDF_CTE.[ASSET_UDF_VALUE],
                [ASSET_UDF_DESC] = AssetClassUDF_CTE.[ASSET_UDF_DESC],
                [ASSETDETAIL_Id] = AssetClassUDF_CTE.[ASSETDETAIL_Id],
                Pulldate = @PullDate
        WHEN NOT MATCHED THEN
            INSERT (
                [ASSET_UDF_VALUE],
                [ASSET_UDF_DESC],
                [ASSETDETAIL_Id],
                Pulldate)
            VALUES (
                AssetClassUDF_CTE.[ASSET_UDF_VALUE],
                AssetClassUDF_CTE.[ASSET_UDF_DESC],
                AssetClassUDF_CTE.[ASSETDETAIL_Id],
                @PullDate);
    END

    Would I be better off running separate updates and inserts and using a CTE to delete duplicates? Basically avoiding MERGE's logic altogether?

    At the end of the day I need to update ASSET_UDF_VALUE when it changes for an existing asset ID (an update) and insert new rows when new data is added from the feed (a simple insert).
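    (For illustration only, here is one rough sketch of that separate update-then-insert idea, using the table names posted in this thread. It assumes ASSETDETAIL_Id plus ASSET_UDF_DESC identify a UDF slot, that @PullDate is in scope as in the procedure above, and that taking MAX(ASSET_UDF_VALUE) per slot - so a non-blank value beats a blank one - is acceptable; none of that is confirmed by the feed's design, so treat it as a starting point rather than the answer.)

    -- 1) Update existing (Id, DESC) slots from the de-duplicated feed.
    ;WITH src AS (
        SELECT ASSETDETAIL_Id,
               ASSET_UDF_DESC,
               MAX(ASSET_UDF_VALUE) AS ASSET_UDF_VALUE  -- assumption: keep the non-blank value
        FROM [RawData_AssetClassUDF]
        GROUP BY ASSETDETAIL_Id, ASSET_UDF_DESC
    )
    UPDATE t
       SET t.ASSET_UDF_VALUE = s.ASSET_UDF_VALUE,
           t.Pulldate = @PullDate
    FROM [AssetClassUDF] AS t
    JOIN src AS s
      ON s.ASSETDETAIL_Id = t.ASSETDETAIL_Id
     AND s.ASSET_UDF_DESC = t.ASSET_UDF_DESC;

    -- 2) Insert feed rows whose (Id, DESC) pair the target doesn't have yet.
    INSERT INTO [AssetClassUDF] (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id, Pulldate)
    SELECT s.ASSET_UDF_VALUE, s.ASSET_UDF_DESC, s.ASSETDETAIL_Id, @PullDate
    FROM (
        SELECT DISTINCT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
        FROM [RawData_AssetClassUDF]
    ) AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM [AssetClassUDF] AS t
        WHERE t.ASSETDETAIL_Id = s.ASSETDETAIL_Id
          AND t.ASSET_UDF_DESC = s.ASSET_UDF_DESC
    );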

  • Here is the data to test with:

    CREATE TABLE [dbo].[AssetClassUDF](
        [ASSET_UDF_VALUE] [nvarchar](255) NULL,
        [ASSET_UDF_DESC] [nvarchar](255) NULL,
        [ASSETDETAIL_Id] [numeric](20, 0) NULL,
        [Pulldate] [date] NULL
    ) ON [PRIMARY]
    GO

    INSERT INTO ASSETCLASSUDF
        (ASSET_UDF_VALUE,ASSET_UDF_DESC,ASSETDETAIL_Id,Pulldate)
    SELECT '','Filter 1 - Size/Description','16','2012-10-31' Union All
    SELECT '','Filter 1 - Qty/Total ','16','2012-10-31' Union All
    SELECT '','Filter 2 - Size/Description ','16','2012-10-31' Union All
    SELECT '','Filter 2 - Qty/Total','16','2012-10-31' Union All
    SELECT '','Belt Size/Description','16','2012-10-31' Union All
    SELECT '','Refrigerant Type','16','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','16','2012-10-31' Union All
    SELECT '2/20X20X2','Filter 1 Size/Description','16','2012-10-31' Union All
    SELECT '','Filter 2 Size/Description','16','2012-10-31' Union All
    SELECT '','Belt Size/Description','16','2012-10-31' Union All
    SELECT '','Refrigerant Type','16','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','16','2012-10-31' Union All
    SELECT '','Nominal Tons','16','2012-10-31' Union All
    SELECT '','Filter 1 - Size/Description','32','2012-10-31' Union All
    SELECT '','Filter 1 - Qty/Total ','32','2012-10-31' Union All
    SELECT '','Filter 2 - Size/Description ','32','2012-10-31' Union All
    SELECT '','Filter 2 - Qty/Total','32','2012-10-31' Union All
    SELECT '','Belt Size/Description','32','2012-10-31' Union All
    SELECT '','Refrigerant Type','32','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','32','2012-10-31' Union All
    SELECT '2/20X20X2','Filter 1 Size/Description','32','2012-10-31' Union All
    SELECT '','Filter 2 Size/Description','32','2012-10-31' Union All
    SELECT '1/A32','Belt Size/Description','32','2012-10-31' Union All
    SELECT '','Refrigerant Type','32','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','32','2012-10-31' Union All
    SELECT '','Nominal Tons','32','2012-10-31' Union All
    SELECT '','Filter 1 - Size/Description','48','2012-10-31' Union All
    SELECT '','Filter 1 - Qty/Total ','48','2012-10-31' Union All
    SELECT '','Filter 2 - Size/Description ','48','2012-10-31' Union All
    SELECT '','Filter 2 - Qty/Total','48','2012-10-31' Union All
    SELECT '','Belt Size/Description','48','2012-10-31' Union All
    SELECT '','Refrigerant Type','48','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','48','2012-10-31' Union All
    SELECT '','Filter 1 Size/Description','48','2012-10-31' Union All
    SELECT '','Filter 2 Size/Description','48','2012-10-31' Union All
    SELECT '','Belt Size/Description','48','2012-10-31' Union All
    SELECT '','Refrigerant Type','48','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','48','2012-10-31' Union All
    SELECT '','Nominal Tons','48','2012-10-31' Union All
    SELECT '','Filter 1 - Size/Description','64','2012-10-31'

    Thanks to the Forum Etiquette guide here: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Jason-299789 (11/7/2012)


    Tava, you can insert the same record twice with a MERGE statement; however, you cannot UPDATE or DELETE the same row twice.

    Yes, sorry, I meant update/delete, not update/insert...

  • Tava (11/6/2012)


    SQL-Squid,

    You're correct in what you said: the issue is that your source has duplicates, so based on the join below it will always return multiple source rows per target row, hence the MERGE statement returns that error.

    MERGE [AssetClassUDF] AS target
    USING [RawData_AssetClassUDF] AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC

    Unfortunately with MERGE, as the error says, you can't update/insert the same record twice... so you only have 2 options:

    1. Use a GROUP BY in your MERGE source so that duplicates are treated as one value; this should fix your problem (if this is what you want)

    or

    2. Clean the SOURCE so it doesn't contain duplicates.

    I personally don't like option 2 - cleaning the source file - as this is how it gets delivered, and it's always a good reference point when you want to compare RAW -> STAGING -> LIVE.

    Tava,

    Please give an example (if you have time) of how to use the GROUP BY in the merge. I have been trying to figure this out and I am still lost.

    Right now, I am running an update then an insert, plus a CTE to remove dups. It's messy and I don't feel it's the best way to handle this. I also have a trigger on delete that archives these updates to an archive table to keep a full history. With the update & insert routine, I am now having to run the delete-dups CTE against the archive table as well. I'm not getting the warm fuzzy feeling that this is a correct way to implement all this.

    I was told I can request that the app designer include a primary key in the feed I receive. If that happens, the merge will be a lot better, but if not I need to have something in place that works.

  • SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    You're correct in what you said: the issue is that your source has duplicates, so based on the join below it will always return multiple source rows per target row, hence the MERGE statement returns that error.

    MERGE [AssetClassUDF] AS target
    USING [RawData_AssetClassUDF] AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC

    Unfortunately with MERGE, as the error says, you can't update/insert the same record twice... so you only have 2 options:

    1. Use a GROUP BY in your MERGE source so that duplicates are treated as one value; this should fix your problem (if this is what you want)

    or

    2. Clean the SOURCE so it doesn't contain duplicates.

    I personally don't like option 2 - cleaning the source file - as this is how it gets delivered, and it's always a good reference point when you want to compare RAW -> STAGING -> LIVE.

    Tava,

    Please give an example (if you have time) of how to use the GROUP BY in the merge. I have been trying to figure this out and I am still lost.

    Right now, I am running an update then an insert, plus a CTE to remove dups. It's messy and I don't feel it's the best way to handle this. I also have a trigger on delete that archives these updates to an archive table to keep a full history. With the update & insert routine, I am now having to run the delete-dups CTE against the archive table as well. I'm not getting the warm fuzzy feeling that this is a correct way to implement all this.

    I was told I can request that the app designer include a primary key in the feed I receive. If that happens, the merge will be a lot better, but if not I need to have something in place that works.

    I've tried using the code you provided and everything you have done is correct... the GROUP BY is right; it's just the MERGE condition that is not exactly fitting your requirements - if you add that additional check then it will insert a new record because it won't match... The source file is a mess, unfortunately, and if you could get the primary key that would resolve it - in saying that, separate update/delete statements appear to be the way to go if that's not possible.... You're best to wait for a more experienced developer, as I've only started using MERGE myself - my source was terrible too, but I got them to fix that up with a PK for me.

    MERGE [AssetClassUDF] AS target
    USING
    (
        SELECT
            ASSET_UDF_VALUE,
            ASSET_UDF_DESC,
            ASSETDETAIL_ID
        FROM
            [RawData_AssetClassUDF]
        GROUP BY
            ASSET_UDF_VALUE,
            ASSET_UDF_DESC,
            ASSETDETAIL_ID
    ) AS source (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_ID)
    ON
        target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND
        target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
        -- NEEDS THIS CONDITION BUT ITS NOT WHAT YOU REQUIRE OVERALL
        --AND
        -- target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE
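    (A possible variant of that GROUP BY approach, offered as an untested sketch: group only on the two join columns and aggregate the value, so the source can never return more than one row per (Id, DESC) pair and no target row is matched twice. Whether MAX is the right way to pick the surviving value is an assumption; with the sample data it simply means a non-blank value wins over a blank one.)

    -- Sketch only: collapse the source to one row per join key before the MERGE.
    -- Assumes @PullDate is in scope, as in the OP's procedure.
    MERGE [AssetClassUDF] AS target
    USING
    (
        SELECT
            ASSETDETAIL_Id,
            ASSET_UDF_DESC,
            MAX(ASSET_UDF_VALUE) AS ASSET_UDF_VALUE  -- assumption: keep the non-blank value
        FROM [RawData_AssetClassUDF]
        GROUP BY ASSETDETAIL_Id, ASSET_UDF_DESC
    ) AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
    WHEN MATCHED THEN
        UPDATE SET
            target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE,
            target.Pulldate = @PullDate
    WHEN NOT MATCHED THEN
        INSERT (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id, Pulldate)
        VALUES (source.ASSET_UDF_VALUE, source.ASSET_UDF_DESC, source.ASSETDETAIL_Id, @PullDate);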

  • SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    I was told I can request the app designer to include a primary key in the feed I receive, if that does happen, the merge will be a lot better, but if not I need to have something in place to work.

    To me it's beneficial to fix the main problem, and that's the source file... If it's required for what you need to do, you just have to ask and explain its importance, otherwise there will be these issues ...

    Adding a PK to their extract will be simple for them, unless for some reason it's not allowed.

    Obviously if the source file can't be fixed, then MERGE might not be suited for you (my opinion) and the method you currently have is the only way to go.

  • Tava (11/7/2012)


    SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    I was told I can request the app designer to include a primary key in the feed I receive, if that does happen, the merge will be a lot better, but if not I need to have something in place to work.

    To me it's beneficial to fix the main problem, and that's the source file... If it's required for what you need to do, you just have to ask and explain its importance, otherwise there will be these issues ...

    Adding a PK to their extract will be simple for them, unless for some reason it's not allowed.

    Obviously if the source file can't be fixed, then MERGE might not be suited for you (my opinion) and the method you currently have is the only way to go.

    Uniqueness is required for the MERGE update/delete to work;

    you are better off with a separate update...

    Or, if the duplicate row count is fairly small, you can filter those rows out of the merge statement and update them separately afterwards... or clean them up in the target before the update, depending on the design and requirements.
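    (One way to read that filtering suggestion, again only as an untested sketch using the thread's table names: MERGE only the feed rows whose (Id, DESC) key appears once, and handle the duplicated keys in a separate, hand-written step.)

    -- Sketch: keep the MERGE for the keys that are unique in the feed.
    MERGE [AssetClassUDF] AS target
    USING (
        SELECT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
        FROM (
            SELECT *,
                   cnt = COUNT(*) OVER (PARTITION BY ASSETDETAIL_Id, ASSET_UDF_DESC)
            FROM [RawData_AssetClassUDF]
        ) AS counted
        WHERE cnt = 1                       -- only keys that appear once in the feed
    ) AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
    WHEN MATCHED THEN
        UPDATE SET target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE
    WHEN NOT MATCHED THEN
        INSERT (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id)
        VALUES (source.ASSET_UDF_VALUE, source.ASSET_UDF_DESC, source.ASSETDETAIL_Id);
    -- Rows with cnt > 1 would then be updated/inserted by separate statements,
    -- in whatever way the application's design requires.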

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

  • demonfox (11/7/2012)


    Tava (11/7/2012)


    SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    I was told I can request the app designer to include a primary key in the feed I receive, if that does happen, the merge will be a lot better, but if not I need to have something in place to work.

    To me it's beneficial to fix the main problem, and that's the source file... If it's required for what you need to do, you just have to ask and explain its importance, otherwise there will be these issues ...

    Adding a PK to their extract will be simple for them, unless for some reason it's not allowed.

    Obviously if the source file can't be fixed, then MERGE might not be suited for you (my opinion) and the method you currently have is the only way to go.

    Uniqueness is required for the MERGE update/delete to work;

    you are better off with a separate update...

    Or, if the duplicate row count is fairly small, you can filter those rows out of the merge statement and update them separately afterwards... or clean them up in the target before the update, depending on the design and requirements.

    I agree; trying to make a MERGE work for requirements its design doesn't fit isn't worth it... stick to the separate insert/update/delete.
