SSIS Ignore Dupes

Question

Post reply

SSIS Ignore Dupes

bruce.b.allen

SSC-Addicted

Points: 438
More actions
January 13, 2011 at 8:39 am

#149651

I apologize in advance as I'm sure this has been answered before. I did some searching and couldn't quite get exactly what I was looking for (or am to stubborn to try cause I think I'm on the right path).
I currently have a Data Flow task that loads a file to a database. The basic flow is:
2 Source Files -> 2 Derived Columns -> Union All -> String Functions -> Data Conversion ->Destination SqlServer table.
The source file contains duplicate records from the destination file. What I would like to do is ignore the row if the Primary Key already exists.
What I've done is set the Configure Output Error in the Data Conversion Task to Ignore Failure on the Primarey Key (tx_number) in this case.
I still get the 'Violation of PRIMARY KEY constraint' error when doing this.
I feel like I'm pretty close and rather than spending another day I thought I'd post here because I've had success in this forum in the past.
Thanks in advance.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

steveb. SSC-Forever Points: 46733 More actions · Answer 1

you can add a sort transformation and check the box to remove duplicates based upon the sort key,

or you could put the data as is in a staging table and then read from this table using a query that removes duplicates

bruce.b.allen SSC-Addicted Points: 438 More actions · Answer 2

steveb. (1/13/2011)
you can add a sort transformation and check the box to remove duplicates based upon the sort key,
or you could put the data as is in a staging table and then read from this table using a query that removes duplicates

I should mention that the source file does not contain duplicates, rather it may contain records that are already in the destination table. I think sorting just ignores dupes in the source file. I'm trying to avoid using a staging table if possible becuase of the number of changes I'd have to make.

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 3

bruce.b.allen (1/13/2011)
steveb. (1/13/2011)
you can add a sort transformation and check the box to remove duplicates based upon the sort key,
or you could put the data as is in a staging table and then read from this table using a query that removes duplicates
I should mention that the source file does not contain duplicates, rather it may contain records that are already in the destination table. I think sorting just ignores dupes in the source file. I'm trying to avoid using a staging table if possible becuase of the number of changes I'd have to make.

In that case, you could use a simple lookup component to check if the PK already exists. When there is a match, the column of the destination PK will be filled in. If there is no duplicate, the destination PK will be NULL. After the lookup component, set up a conditional split that drops all rows where the destination PK is different from NULL.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Gustav_99 SSC Rookie Points: 33 More actions · Answer 4

steveb. (1/13/2011)
you can add a sort transformation and check the box to remove duplicates based upon the sort key,
or you could put the data as is in a staging table and then read from this table using a query that removes duplicates

This worked perfectly, Thanks!

Phil Parkin SSC Guru Points: 246695 More actions · Answer 5

Gustav_99 (12/3/2013)
steveb. (1/13/2011)
you can add a sort transformation and check the box to remove duplicates based upon the sort key,
or you could put the data as is in a staging table and then read from this table using a query that removes duplicates
This worked perfectly, Thanks!

Note that the second option is likely to be an order of magnitude faster than the first. The sort transformation is notoriously slow.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.