Merge with duplicate rows

  • Hello all, I am running merge stored procedures against various staging and final tables in my database. I have one table that is giving me issues. When running the following merge statement I receive the common merge error "The MERGE statement attempted to UPDATE or DELETE the same row more than once..."

    Here is my code:

    /****** Object: StoredProcedure [dbo].[sp_Merge_AssetClassUDF] Script Date: 11/06/2012 16:15:48 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[sp_Merge_AssetClassUDF]
        @PullDate Date
    AS BEGIN
        MERGE [AssetClassUDF] AS target
        USING [RawData_AssetClassUDF] AS source
            ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
            AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
        WHEN MATCHED THEN
            UPDATE SET
                [ASSET_UDF_VALUE] = source.[ASSET_UDF_VALUE],
                [ASSET_UDF_DESC] = source.[ASSET_UDF_DESC],
                [ASSETDETAIL_Id] = source.[ASSETDETAIL_Id],
                Pulldate = @PullDate
        WHEN NOT MATCHED THEN
            INSERT (
                [ASSET_UDF_VALUE],
                [ASSET_UDF_DESC],
                [ASSETDETAIL_Id],
                Pulldate)
            VALUES (
                source.[ASSET_UDF_VALUE],
                source.[ASSET_UDF_DESC],
                source.[ASSETDETAIL_Id],
                @PullDate);
    END

    ASSET_UDF_VALUE = user defined; many blank entries and some duplicates.
    ASSET_UDF_DESC = user defined; many duplicate values.
    ASSETDETAIL_Id = provided by the front-end application; many duplicate values.
    PullDate = filled in by the stored procedure, which passes in a GETDATE() value.

    I think the above code breaks because the values in ASSET_UDF_DESC and ASSETDETAIL_Id are not unique. (I have lots more of these procedures that work, but in those I have unique values and a suitable source column to join on in my MERGE statement.)

    I've tried lots of combinations of changing the target columns, but no luck!

    Next is the output of a query against the data to give you an idea of what the table looks like. (Any pointers on how to copy and paste query results in here so they keep their formatting?)

    ASSET_UDF_VALUE  ASSET_UDF_DESC                ASSETDETAIL_Id  Pulldate
                     Filter 1 - Size/Description   16              2012-10-31
                     Filter 1 - Qty/Total          16              2012-10-31
                     Filter 2 - Size/Description   16              2012-10-31
                     Filter 2 - Qty/Total          16              2012-10-31
                     Belt Size/Description         16              2012-10-31
                     Refrigerant Type              16              2012-10-31
    2/20X20X2        Filter 1 Size/Description     16              2012-10-31
                     Filter 2 Size/Description     16              2012-10-31
                     Refrigerant Capacity          16              2012-10-31
                     Nominal Tons                  16              2012-10-31
                     Filter 1 - Size/Description   32              2012-10-31
                     Filter 1 - Qty/Total          32              2012-10-31
                     Filter 2 - Size/Description   32              2012-10-31
                     Filter 2 - Qty/Total          32              2012-10-31
                     Belt Size/Description         32              2012-10-31
                     Refrigerant Type              32              2012-10-31
    2/20X20X2        Filter 1 Size/Description     32              2012-10-31
                     Filter 2 Size/Description     32              2012-10-31
    1/A32            Belt Size/Description         32              2012-10-31
                     Refrigerant Capacity          32              2012-10-31
                     Nominal Tons                  32              2012-10-31
                     Filter 1 - Size/Description   48              2012-10-31
                     Filter 1 - Qty/Total          48              2012-10-31
                     Filter 2 - Size/Description   48              2012-10-31
                     Filter 2 - Qty/Total          48              2012-10-31
                     Belt Size/Description         48              2012-10-31
                     Refrigerant Type              48              2012-10-31
                     Refrigerant Capacity          48              2012-10-31
                     Filter 1 Size/Description     48              2012-10-31
                     Filter 2 Size/Description     48              2012-10-31

    It's hard to see, but for asset ID 16 there are 10 or so duplicates. Each asset has an ASSET_UDF_DESC entry, and only some of the ASSET_UDF_VALUEs have entries. As changes are made in the application, values will be added as well as new IDs.

    Please let me know if there is anything you can recommend.

  • SQL-Squid,

    You're correct in what you said: the issue is that your source has duplicates, so based on the join below it will always return multiple source rows per target row, hence the MERGE statement returns that error.

    MERGE [AssetClassUDF] AS target
    USING [RawData_AssetClassUDF] AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC

    Unfortunately with MERGE, as the error says, you can't update/insert the same record twice... so you only have 2 options:

    1. Use a GROUP BY in your MERGE source so that duplicates are treated as one value; this should fix your problem (if this is what you want)

    or

    2. Clean the SOURCE so it doesn't contain duplicates.

    I personally don't like option 2 - cleaning the source file - as this is how it gets delivered, and it's always a good reference point when you want to compare RAW -> STAGING -> LIVE.

    Tava, you can insert the same record twice with a MERGE statement; here's a test rig.

    CREATE TABLE #test_merge_target (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    CREATE TABLE #test_merge_source (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    INSERT INTO #test_merge_source
    VALUES ('Acol','test data')
          ,('Acol','test data')
          ,('bcol','test data')

    MERGE INTO #test_merge_target AS target
    USING #test_merge_source AS Source
        ON target.Col1 = Source.Col1
    WHEN MATCHED THEN
        UPDATE SET target.Col2 = Source.Col2
    WHEN NOT MATCHED THEN
        INSERT (Col1, Col2)
        VALUES (Source.Col1, Source.Col2);

    SELECT * FROM #test_merge_target
    GO
    DROP TABLE #test_merge_source
    DROP TABLE #test_merge_target

    However, you cannot UPDATE or DELETE the same row twice.
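    (A quick illustrative aside on that rig: the first run only inserts, but running the same MERGE a second time, before the DROP TABLE statements, fails with the OP's error, because each 'Acol' row now sitting in the target is matched by both duplicate 'Acol' source rows.)

    -- Illustrative second run only - same tables and statement as the rig above.
    -- Each target 'Acol' row is matched by two source rows, so the UPDATE branch
    -- would touch the same row twice and the statement fails with:
    -- "The MERGE statement attempted to UPDATE or DELETE the same row more than once..."
    MERGE INTO #test_merge_target AS target
    USING #test_merge_source AS Source
        ON target.Col1 = Source.Col1
    WHEN MATCHED THEN
        UPDATE SET target.Col2 = Source.Col2
    WHEN NOT MATCHED THEN
        INSERT (Col1, Col2)
        VALUES (Source.Col1, Source.Col2);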

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (11/7/2012)


    Tava, you can insert the same record twice with a MERGE statement; here's a test rig.

    CREATE TABLE #test_merge_target (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    CREATE TABLE #test_merge_source (
        Col1 varchar(10)
        ,Col2 varchar(10)
    )

    INSERT INTO #test_merge_source
    VALUES ('Acol','test data')
          ,('Acol','test data')
          ,('bcol','test data')

    MERGE INTO #test_merge_target AS target
    USING #test_merge_source AS Source
        ON target.Col1 = Source.Col1
    WHEN MATCHED THEN
        UPDATE SET target.Col2 = Source.Col2
    WHEN NOT MATCHED THEN
        INSERT (Col1, Col2)
        VALUES (Source.Col1, Source.Col2);

    SELECT * FROM #test_merge_target
    GO
    DROP TABLE #test_merge_source
    DROP TABLE #test_merge_target

    However, you cannot UPDATE or DELETE the same row twice.

    One way to avoid this is to use a unique condition for the row set; i.e. if a unique column exists, then add that to the NOT MATCHED condition ...

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

    Agreed, demon, a unique constraint on the target table is the only way to prevent this happening.

    I've had a look at the data the OP put up and I can't see any duplicates in the set based on the DESC & ID columns.

    One thing the OP could do is a DISTINCT/GROUP BY on all columns in a CTE and use that as the source rather than the raw table.
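    (A rough, untested sketch of that CTE idea, using the OP's table names. Note that DISTINCT across all columns only removes rows that are identical in every column, so a blank and a non-blank ASSET_UDF_VALUE for the same ID/DESC pair would both survive and could still trip the MERGE.)

    -- Sketch: de-duplicated CTE as the MERGE source (Pulldate handling omitted for brevity).
    ;WITH source AS (
        SELECT DISTINCT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
        FROM [RawData_AssetClassUDF]
    )
    MERGE [AssetClassUDF] AS target
    USING source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
    WHEN MATCHED THEN
        UPDATE SET target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE
    WHEN NOT MATCHED THEN
        INSERT (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id)
        VALUES (source.ASSET_UDF_VALUE, source.ASSET_UDF_DESC, source.ASSETDETAIL_Id);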

    Personally, I'd throw this back up to whoever generated the file and ask them to explain why there are duplicate rows, and remove them if possible, or look at altering the extract to make the rows more unique.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

    -- Have a look at the dupesets in the source table - do you want them, or not?
    -- Then decide what to do. No point in finding a clever way to
    -- insert the dupes if you don't want them.
    SELECT *
    FROM (
        SELECT
            *,
            Dupecount = COUNT(*) OVER(PARTITION BY ASSETDETAIL_Id, ASSET_UDF_DESC)
        FROM [RawData_AssetClassUDF]
    ) d
    WHERE Dupecount > 1
    ORDER BY ASSETDETAIL_Id, ASSET_UDF_DESC

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

    Thanks all. I will have a go with GROUP BY in my merge and see if I can get the results I am after.

    The source data is a feed that I cannot control, and the duplicates are in fact needed by design. (They won't make modifications to the source application for my needs.)

    Removing duplicates based on ID will eliminate rows I need. Inserts work as others have posted; it's the update that gets me. As I said, I will see if I can figure out the GROUP BY clause and post back my results.

    Thanks again everyone!

    Here is an attempt to use a CTE, and I am lost again; I'm not sure this is going to work as I expected. The code below throws the same merge error as before.

    I changed the GROUP BY columns and the ON "target" columns around a few times and still can't get this to work...

    Maybe it's something simple I am missing?

    ALTER PROCEDURE [dbo].[sp_Merge_AssetClassUDF_V2]
        @PullDate Date
    AS BEGIN
        MERGE [AssetClassUDF] AS target
        USING (
            SELECT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
            FROM AssetClassUDF
            GROUP BY ASSET_UDF_DESC, ASSET_UDF_VALUE)
            AS AssetClassUDF_CTE (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id)
        ON
            (target.ASSETDETAIL_Id = AssetClassUDF_CTE.ASSETDETAIL_Id
             AND target.ASSET_UDF_DESC = AssetClassUDF_CTE.ASSET_UDF_DESC)
        WHEN MATCHED THEN
            UPDATE SET
                [ASSET_UDF_VALUE] = AssetClassUDF_CTE.[ASSET_UDF_VALUE],
                [ASSET_UDF_DESC] = AssetClassUDF_CTE.[ASSET_UDF_DESC],
                [ASSETDETAIL_Id] = AssetClassUDF_CTE.[ASSETDETAIL_Id],
                Pulldate = @PullDate
        WHEN NOT MATCHED THEN
            INSERT (
                [ASSET_UDF_VALUE],
                [ASSET_UDF_DESC],
                [ASSETDETAIL_Id],
                Pulldate)
            VALUES (
                AssetClassUDF_CTE.[ASSET_UDF_VALUE],
                AssetClassUDF_CTE.[ASSET_UDF_DESC],
                AssetClassUDF_CTE.[ASSETDETAIL_Id],
                @PullDate);
    END

    Would I be better off running separate updates and inserts and using a CTE to delete duplicates? Basically avoiding MERGE's logic altogether?

    At the end of the day I need to update ASSET_UDF_VALUE when it changes for an existing asset ID (an update) and insert new rows when new data is added from the feed (a simple insert).
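    (For illustration only, here is one rough sketch of that separate update-then-insert idea, using the table names posted in this thread. It assumes ASSETDETAIL_Id plus ASSET_UDF_DESC identify a UDF slot, that @PullDate is in scope as in the procedure above, and that taking MAX(ASSET_UDF_VALUE) per slot - so a non-blank value beats a blank one - is acceptable; none of that is confirmed by the feed's design, so treat it as a starting point rather than the answer.)

    -- 1) Update existing (Id, DESC) slots from the de-duplicated feed.
    ;WITH src AS (
        SELECT ASSETDETAIL_Id,
               ASSET_UDF_DESC,
               MAX(ASSET_UDF_VALUE) AS ASSET_UDF_VALUE  -- assumption: keep the non-blank value
        FROM [RawData_AssetClassUDF]
        GROUP BY ASSETDETAIL_Id, ASSET_UDF_DESC
    )
    UPDATE t
       SET t.ASSET_UDF_VALUE = s.ASSET_UDF_VALUE,
           t.Pulldate = @PullDate
    FROM [AssetClassUDF] AS t
    JOIN src AS s
      ON s.ASSETDETAIL_Id = t.ASSETDETAIL_Id
     AND s.ASSET_UDF_DESC = t.ASSET_UDF_DESC;

    -- 2) Insert feed rows whose (Id, DESC) pair the target doesn't have yet.
    INSERT INTO [AssetClassUDF] (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id, Pulldate)
    SELECT s.ASSET_UDF_VALUE, s.ASSET_UDF_DESC, s.ASSETDETAIL_Id, @PullDate
    FROM (
        SELECT DISTINCT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
        FROM [RawData_AssetClassUDF]
    ) AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM [AssetClassUDF] AS t
        WHERE t.ASSETDETAIL_Id = s.ASSETDETAIL_Id
          AND t.ASSET_UDF_DESC = s.ASSET_UDF_DESC
    );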

  • Here is the data to test with:

    CREATE TABLE [dbo].[AssetClassUDF](
        [ASSET_UDF_VALUE] [nvarchar](255) NULL,
        [ASSET_UDF_DESC] [nvarchar](255) NULL,
        [ASSETDETAIL_Id] [numeric](20, 0) NULL,
        [Pulldate] [date] NULL
    ) ON [PRIMARY]
    GO

    INSERT INTO ASSETCLASSUDF
        (ASSET_UDF_VALUE,ASSET_UDF_DESC,ASSETDETAIL_Id,Pulldate)
    SELECT '','Filter 1 - Size/Description','16','2012-10-31' Union All
    SELECT '','Filter 1 - Qty/Total ','16','2012-10-31' Union All
    SELECT '','Filter 2 - Size/Description ','16','2012-10-31' Union All
    SELECT '','Filter 2 - Qty/Total','16','2012-10-31' Union All
    SELECT '','Belt Size/Description','16','2012-10-31' Union All
    SELECT '','Refrigerant Type','16','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','16','2012-10-31' Union All
    SELECT '2/20X20X2','Filter 1 Size/Description','16','2012-10-31' Union All
    SELECT '','Filter 2 Size/Description','16','2012-10-31' Union All
    SELECT '','Belt Size/Description','16','2012-10-31' Union All
    SELECT '','Refrigerant Type','16','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','16','2012-10-31' Union All
    SELECT '','Nominal Tons','16','2012-10-31' Union All
    SELECT '','Filter 1 - Size/Description','32','2012-10-31' Union All
    SELECT '','Filter 1 - Qty/Total ','32','2012-10-31' Union All
    SELECT '','Filter 2 - Size/Description ','32','2012-10-31' Union All
    SELECT '','Filter 2 - Qty/Total','32','2012-10-31' Union All
    SELECT '','Belt Size/Description','32','2012-10-31' Union All
    SELECT '','Refrigerant Type','32','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','32','2012-10-31' Union All
    SELECT '2/20X20X2','Filter 1 Size/Description','32','2012-10-31' Union All
    SELECT '','Filter 2 Size/Description','32','2012-10-31' Union All
    SELECT '1/A32','Belt Size/Description','32','2012-10-31' Union All
    SELECT '','Refrigerant Type','32','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','32','2012-10-31' Union All
    SELECT '','Nominal Tons','32','2012-10-31' Union All
    SELECT '','Filter 1 - Size/Description','48','2012-10-31' Union All
    SELECT '','Filter 1 - Qty/Total ','48','2012-10-31' Union All
    SELECT '','Filter 2 - Size/Description ','48','2012-10-31' Union All
    SELECT '','Filter 2 - Qty/Total','48','2012-10-31' Union All
    SELECT '','Belt Size/Description','48','2012-10-31' Union All
    SELECT '','Refrigerant Type','48','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','48','2012-10-31' Union All
    SELECT '','Filter 1 Size/Description','48','2012-10-31' Union All
    SELECT '','Filter 2 Size/Description','48','2012-10-31' Union All
    SELECT '','Belt Size/Description','48','2012-10-31' Union All
    SELECT '','Refrigerant Type','48','2012-10-31' Union All
    SELECT '','Refrigerant Capacity','48','2012-10-31' Union All
    SELECT '','Nominal Tons','48','2012-10-31' Union All
    SELECT '','Filter 1 - Size/Description','64','2012-10-31'

    Thanks to the Forum Etiquette guide here: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Jason-299789 (11/7/2012)


    Tava, you can insert the same record twice with a MERGE statement; however, you cannot UPDATE or DELETE the same row twice.

    Yes, sorry, I meant update/delete, not update/insert...

  • Tava (11/6/2012)


    SQL-Squid,

    You're correct in what you said: the issue is that your source has duplicates, so based on the join below it will always return multiple source rows per target row, hence the MERGE statement returns that error.

    MERGE [AssetClassUDF] AS target
    USING [RawData_AssetClassUDF] AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC

    Unfortunately with MERGE, as the error says, you can't update/insert the same record twice... so you only have 2 options:

    1. Use a GROUP BY in your MERGE source so that duplicates are treated as one value; this should fix your problem (if this is what you want)

    or

    2. Clean the SOURCE so it doesn't contain duplicates.

    I personally don't like option 2 - cleaning the source file - as this is how it gets delivered, and it's always a good reference point when you want to compare RAW -> STAGING -> LIVE.

    Tava,

    Please give an example (if you have time) of how to use the GROUP BY in the merge. I have been trying to figure this out and I am still lost.

    Right now, I am running an update then an insert, plus a CTE to remove dups. It's messy and I don't feel it's the best way to handle this. I also have a trigger on delete that archives these updates to an archive table to keep a full history. With the update & insert routine, I am now having to run the delete-dups CTE against the archive table as well. I'm not getting the warm fuzzy feeling that this is a correct way to implement all this.

    I was told I can request that the app designer include a primary key in the feed I receive. If that happens, the merge will be a lot better, but if not I need to have something in place that works.

  • SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    You're correct in what you said: the issue is that your source has duplicates, so based on the join below it will always return multiple source rows per target row, hence the MERGE statement returns that error.

    MERGE [AssetClassUDF] AS target
    USING [RawData_AssetClassUDF] AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC

    Unfortunately with MERGE, as the error says, you can't update/insert the same record twice... so you only have 2 options:

    1. Use a GROUP BY in your MERGE source so that duplicates are treated as one value; this should fix your problem (if this is what you want)

    or

    2. Clean the SOURCE so it doesn't contain duplicates.

    I personally don't like option 2 - cleaning the source file - as this is how it gets delivered, and it's always a good reference point when you want to compare RAW -> STAGING -> LIVE.

    Tava,

    Please give an example (if you have time) of how to use the GROUP BY in the merge. I have been trying to figure this out and I am still lost.

    Right now, I am running an update then an insert, plus a CTE to remove dups. It's messy and I don't feel it's the best way to handle this. I also have a trigger on delete that archives these updates to an archive table to keep a full history. With the update & insert routine, I am now having to run the delete-dups CTE against the archive table as well. I'm not getting the warm fuzzy feeling that this is a correct way to implement all this.

    I was told I can request that the app designer include a primary key in the feed I receive. If that happens, the merge will be a lot better, but if not I need to have something in place that works.

    I've tried using the code you provided and everything you have done is correct... the GROUP BY is right; it's just the MERGE condition that is not exactly fitting your requirements - if you add that additional check then it will insert a new record because it won't match... The source file is a mess, unfortunately, and if you could get the primary key that would resolve it - in saying that, separate update/delete statements appear to be the way to go if that's not possible.... You're best to wait for a more experienced developer, as I've only started using MERGE myself - my source was terrible too, but I got them to fix that up with a PK for me.

    MERGE [AssetClassUDF] AS target
    USING
    (
        SELECT
            ASSET_UDF_VALUE,
            ASSET_UDF_DESC,
            ASSETDETAIL_ID
        FROM
            [RawData_AssetClassUDF]
        GROUP BY
            ASSET_UDF_VALUE,
            ASSET_UDF_DESC,
            ASSETDETAIL_ID
    ) AS source (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_ID)
    ON
        target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND
        target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
        -- NEEDS THIS CONDITION BUT ITS NOT WHAT YOU REQUIRE OVERALL
        --AND
        -- target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE
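    (A possible variant of that GROUP BY approach, offered as an untested sketch: group only on the two join columns and aggregate the value, so the source can never return more than one row per (Id, DESC) pair and no target row is matched twice. Whether MAX is the right way to pick the surviving value is an assumption; with the sample data it simply means a non-blank value wins over a blank one.)

    -- Sketch only: collapse the source to one row per join key before the MERGE.
    -- Assumes @PullDate is in scope, as in the OP's procedure.
    MERGE [AssetClassUDF] AS target
    USING
    (
        SELECT
            ASSETDETAIL_Id,
            ASSET_UDF_DESC,
            MAX(ASSET_UDF_VALUE) AS ASSET_UDF_VALUE  -- assumption: keep the non-blank value
        FROM [RawData_AssetClassUDF]
        GROUP BY ASSETDETAIL_Id, ASSET_UDF_DESC
    ) AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
    WHEN MATCHED THEN
        UPDATE SET
            target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE,
            target.Pulldate = @PullDate
    WHEN NOT MATCHED THEN
        INSERT (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id, Pulldate)
        VALUES (source.ASSET_UDF_VALUE, source.ASSET_UDF_DESC, source.ASSETDETAIL_Id, @PullDate);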

  • SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    I was told I can request the app designer to include a primary key in the feed I receive, if that does happen, the merge will be a lot better, but if not I need to have something in place to work.

    To me it's beneficial to fix the main problem, and that's the source file... If it's required for what you need to do, you just have to ask and explain its importance, otherwise there will be these issues ...

    Adding a PK to their extract will be simple for them, unless for some reason it's not allowed.

    Obviously if the source file can't be fixed, then MERGE might not be suited for you (my opinion) and the method you currently have is the only way to go.

  • Tava (11/7/2012)


    SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    I was told I can request the app designer to include a primary key in the feed I receive, if that does happen, the merge will be a lot better, but if not I need to have something in place to work.

    To me it's beneficial to fix the main problem, and that's the source file... If it's required for what you need to do, you just have to ask and explain its importance, otherwise there will be these issues ...

    Adding a PK to their extract will be simple for them, unless for some reason it's not allowed.

    Obviously if the source file can't be fixed, then MERGE might not be suited for you (my opinion) and the method you currently have is the only way to go.

    Uniqueness is required for the MERGE update/delete to work;

    you are better off with a separate update...

    Or, if the duplicate row count is fairly small, you can filter those rows out of the merge statement and update them separately afterwards... or clean them up in the target before the update, depending on the design and requirements.
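    (One way to read that filtering suggestion, again only as an untested sketch using the thread's table names: MERGE only the feed rows whose (Id, DESC) key appears once, and handle the duplicated keys in a separate, hand-written step.)

    -- Sketch: keep the MERGE for the keys that are unique in the feed.
    MERGE [AssetClassUDF] AS target
    USING (
        SELECT ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id
        FROM (
            SELECT *,
                   cnt = COUNT(*) OVER (PARTITION BY ASSETDETAIL_Id, ASSET_UDF_DESC)
            FROM [RawData_AssetClassUDF]
        ) AS counted
        WHERE cnt = 1                       -- only keys that appear once in the feed
    ) AS source
        ON target.ASSETDETAIL_Id = source.ASSETDETAIL_Id
        AND target.ASSET_UDF_DESC = source.ASSET_UDF_DESC
    WHEN MATCHED THEN
        UPDATE SET target.ASSET_UDF_VALUE = source.ASSET_UDF_VALUE
    WHEN NOT MATCHED THEN
        INSERT (ASSET_UDF_VALUE, ASSET_UDF_DESC, ASSETDETAIL_Id)
        VALUES (source.ASSET_UDF_VALUE, source.ASSET_UDF_DESC, source.ASSETDETAIL_Id);
    -- Rows with cnt > 1 would then be updated/inserted by separate statements,
    -- in whatever way the application's design requires.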

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

  • demonfox (11/7/2012)


    Tava (11/7/2012)


    SQL-Squid (11/7/2012)


    Tava (11/6/2012)


    SQL-Squid,

    I was told I can request the app designer to include a primary key in the feed I receive, if that does happen, the merge will be a lot better, but if not I need to have something in place to work.

    To me it's beneficial to fix the main problem, and that's the source file... If it's required for what you need to do, you just have to ask and explain its importance, otherwise there will be these issues ...

    Adding a PK to their extract will be simple for them, unless for some reason it's not allowed.

    Obviously if the source file can't be fixed, then MERGE might not be suited for you (my opinion) and the method you currently have is the only way to go.

    Uniqueness is required for the MERGE update/delete to work;

    you are better off with a separate update...

    Or, if the duplicate row count is fairly small, you can filter those rows out of the merge statement and update them separately afterwards... or clean them up in the target before the update, depending on the design and requirements.

    I agree; trying to make a MERGE work for requirements its design doesn't fit isn't worth it... stick to the separate insert/update/delete.
