SSIS: load unique records and keep the duplicate records in another text file
Posted Thursday, March 28, 2013 1:59 AM
Grasshopper

Hi,

I have been given a task to load a text file into a database table, which is simple, but the file may contain duplicate records that are already present in the DB table. So I want to load the new records into the table and keep the records that were not loaded in another text file.

Example:
text file with source data:
ID,Name
1,john
2,maya
3,amir
4,neha

Database table:
ID(primary key),name(varchar(50))
101,sneha
102,michael
1,john
4,neha

Now I want the rows in the text file with ID = 2 and ID = 3 to be loaded into the database table, and a new text file to be created containing 1,john and 4,neha.

Can you please help me with this?
Post #1436297
Posted Thursday, March 28, 2013 2:31 AM


SSCertifiable

You've got two options:

(1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second text file; or
(2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.
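
Option 1 might look something like this in T-SQL. This is a minimal sketch: the table names dbo.Staging and dbo.Target are assumptions based on the example data in the original post, not anything the poster has defined.

```sql
-- Assumes the text file has already been bulk-loaded into dbo.Staging (ID int, Name varchar(50)).
-- Run the duplicate-extract SELECT *before* the INSERT; otherwise every
-- staged row would match the target afterwards.

-- Rows already present in the target: SSIS exports this result set
-- to the second text file.
SELECT s.ID, s.Name
FROM dbo.Staging AS s
WHERE EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.ID = s.ID);

-- Rows not yet present: load these into the target table.
INSERT INTO dbo.Target (ID, Name)
SELECT s.ID, s.Name
FROM dbo.Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.ID = s.ID);
```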

John
Post #1436311
Posted Thursday, March 28, 2013 2:59 AM


SSCarpal Tunnel

I'd do it like this:

1) Inside the dataflow, add a lookup. Configure the lookup to redirect on no match (rather than failing).
2) Use the lookup to check whether the input row already exists.
3) Send the lookup's match output to a text file.
4) Send the lookup's no-match output to the table as usual.
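
As a sketch, the lookup's reference query only needs the key column, which keeps the cache small (table and column names are assumed from the example data, not from the poster's actual schema):

```sql
-- Reference query for the Lookup transformation: only the join key
-- is needed to decide match vs. no-match.
SELECT ID FROM dbo.Target;
```

Join the data flow's ID column to this ID; the match output then goes to a Flat File Destination and the no-match output to the table destination.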




Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1436321
Posted Thursday, March 28, 2013 3:32 AM
SSC Rookie

Use the Lookup transformation for this type of task. This scenario is also called the Upsert pattern.

Thanks ,
Sumit Joshi (Lead Engineer - HCL Technologies)
Mobile: 0-9650899699, Skype: sumit.joshij
Blog: http://msbi2012.blogspot.in/
Post #1436331
Posted Thursday, March 28, 2013 3:48 AM


SSChampion

John Mitchell-245523 (3/28/2013)
You've got two options:

(1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second text file; or
(2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.

John


Stick with option 1
The MERGE JOIN needs sorted data, and since the source is a flat file, you cannot sort the data at the source. Hence, you need to sort in the data flow, which introduces a blocking component.

The typical scenario used is the one described by Phil, aka the Lookup component. At least, if the destination table isn't too large.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1436346
Posted Thursday, March 28, 2013 4:04 AM


SSCarpal Tunnel

sumit.joshij (3/28/2013)


This scenario is also called the Upsert pattern.




Not really - as no updates are required here, I would call it a conditional INSERT.



Post #1436360
Posted Thursday, March 28, 2013 4:08 AM


SSChampion

Phil Parkin (3/28/2013)


... I would call it a conditional INSERT.


Patent pending...




Post #1436363
Posted Thursday, March 28, 2013 4:19 AM


SSCarpal Tunnel

Koen Verbeeck (3/28/2013)


At least, if the destination table isn't too large.


If there can be duplicate IDs in the source data file, it means the lookup has to be used in non-cached mode, which is also a performance killer. But in cached mode it should run quite fast, subject to the above condition.
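
If you do end up staging the data instead, duplicates within the file itself can be removed before the conditional insert. A sketch, again with the assumed dbo.Staging name from the example; ROW_NUMBER picks an arbitrary survivor per ID:

```sql
-- Keep one row per ID from the staged file.
;WITH Deduped AS (
    SELECT ID, Name,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS rn
    FROM dbo.Staging
)
DELETE FROM Deduped   -- deleting through the CTE removes the extra copies
WHERE rn > 1;
```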



Post #1436366
Posted Thursday, March 28, 2013 4:25 AM
Grasshopper

Thanks, everyone!
Post #1436369
Posted Thursday, March 28, 2013 5:39 AM
SSC Rookie

Yes, you are right.
What I meant is that when both updates and inserts are involved, we call it the Upsert pattern.
Anyway, thanks.


Post #1436406