SSIS: load unique records and keep the duplicate records in another text file
Posted Thursday, March 28, 2013 1:59 AM
Grasshopper

Hi,

I have been given a task to load a text file into a database table, which is simple, but the file may contain duplicate records that are already present in the DB table. So I want to load the new records into the table and keep the records that were not loaded in another text file.

Example:
text file with source data:
ID,Name
1,john
2,maya
3,amir
4,neha

Database table:
ID(primary key),name(varchar(50))
101,sneha
102,michael
1,john
4,neha

Now I want the rows in the text file with ID = 2 and ID = 3 to be loaded into the database table, and a new text file to be created containing 1,john and 4,neha.

Can you please help me with this?
Post #1436297
Posted Thursday, March 28, 2013 2:31 AM


SSCertifiable

You've got two options:

(1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second text file; or
(2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.
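
Option 1 might look something like this in T-SQL. This is a minimal sketch: the table names dbo.Staging and dbo.Target are assumptions based on the example data in the original post, not anything the poster has defined.

```sql
-- Assumes the text file has already been bulk-loaded into dbo.Staging (ID int, Name varchar(50)).
-- Run the duplicate-extract SELECT *before* the INSERT; otherwise every
-- staged row would match the target afterwards.

-- Rows already present in the target: SSIS exports this result set
-- to the second text file.
SELECT s.ID, s.Name
FROM dbo.Staging AS s
WHERE EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.ID = s.ID);

-- Rows not yet present: load these into the target table.
INSERT INTO dbo.Target (ID, Name)
SELECT s.ID, s.Name
FROM dbo.Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.ID = s.ID);
```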

John
Post #1436311
Posted Thursday, March 28, 2013 2:59 AM


SSCarpal Tunnel

I'd do it like this:

1) Inside the dataflow, add a lookup. Configure the lookup to redirect on no match (rather than failing).
2) Use the lookup to check whether the input row already exists.
3) Send the lookup's match output to a text file.
4) Send the lookup's no-match output to the table as usual.
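
As a sketch, the lookup's reference query only needs the key column, which keeps the cache small (table and column names are assumed from the example data, not from the poster's actual schema):

```sql
-- Reference query for the Lookup transformation: only the join key
-- is needed to decide match vs. no-match.
SELECT ID FROM dbo.Target;
```

Join the data flow's ID column to this ID; the match output then goes to a Flat File Destination and the no-match output to the table destination.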




Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1436321
Posted Thursday, March 28, 2013 3:32 AM
SSC Rookie

Use the Lookup transformation for this type of task. This scenario is also called the Upsert pattern.

Thanks ,
Sumit Joshi (Lead Engineer - HCL Technologies)
Mobile: 0-9650899699, Skype: sumit.joshij
Blog: http://msbi2012.blogspot.in/
Post #1436331
Posted Thursday, March 28, 2013 3:48 AM


SSChampion

John Mitchell-245523 (3/28/2013)
You've got two options:

(1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second text file; or
(2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.

John


Stick with option 1
The MERGE JOIN needs sorted data, and since the source is a flat file, you cannot sort the data at the source. Hence, you need to sort in the data flow, which introduces a blocking component.

The typical scenario used is the one described by Phil, aka the Lookup component. At least, if the destination table isn't too large.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1436346
Posted Thursday, March 28, 2013 4:04 AM


SSCarpal Tunnel

sumit.joshij (3/28/2013)


This scenario is also called the Upsert pattern.




Not really - as no updates are required here, I would call it a conditional INSERT.



Post #1436360
Posted Thursday, March 28, 2013 4:08 AM


SSChampion

Phil Parkin (3/28/2013)


... I would call it a conditional INSERT.


Patent pending...




Post #1436363
Posted Thursday, March 28, 2013 4:19 AM


SSCarpal Tunnel

Koen Verbeeck (3/28/2013)


At least, if the destination table isn't too large.


If there can be duplicate IDs in the source data file, it means the lookup has to be used in non-cached mode, which is also a performance killer. But in cached mode it should run quite fast, subject to the above condition.
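
If you do end up staging the data instead, duplicates within the file itself can be removed before the conditional insert. A sketch, again with the assumed dbo.Staging name from the example; ROW_NUMBER picks an arbitrary survivor per ID:

```sql
-- Keep one row per ID from the staged file.
;WITH Deduped AS (
    SELECT ID, Name,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS rn
    FROM dbo.Staging
)
DELETE FROM Deduped   -- deleting through the CTE removes the extra copies
WHERE rn > 1;
```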



Post #1436366
Posted Thursday, March 28, 2013 4:25 AM
Grasshopper

Thanks, everyone!
Post #1436369
Posted Thursday, March 28, 2013 5:39 AM
SSC Rookie

Yes, you are right.
What I meant is that when both updates and inserts are involved, we call it the Upsert pattern.
Anyway, thanks.


Post #1436406