SSIS unique records load and keep the duplicate records in another text file

  • Hi ,

    I have been given an task to load a text file to a database which is simple but the file may contain the duplicate records which are already present in the DB table. So i want to load the correct records and keep other records which are not loaded to table in another text file.

    Example:

    text file with source data:

    ID,Name

    1,john

    2,maya

    3,amir

    4,neha

    Database table:

    ID(primary key),name(varchar(50))

    101,sneha

    102,michael

    1,john

    4,neha

    Now i want the rows in text file with ID=2,3 get loaded to database table and new text file is created with data 1,john and 4,neha...

    Can you please help me in this .

  • You've got two options:

    (1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second test file; or

    (2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.

    John

  • I'd do it like this:

    1) Inside the dataflow, add a lookup. Configure the lookup to redirect on no match (rather than failing).

    2) Use the lookup to check whether the input row already exists.

    3) Send the lookup's match output to a text file

    4) Send the lookup's no-match output to the table as usual.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Use the Lookup transformation to do such type of task. This scenerio is also called as Upsert Pattern.

    Thanks ,

    Sumit Joshi (Lead Engineer - HCL Technologies)

    Moblie:- 0-9650899699, Skype: sumit.joshij

    Blog: http://msbi2012.blogspot.in/

  • John Mitchell-245523 (3/28/2013)


    You've got two options:

    (1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second test file; or

    (2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.

    John

    Stick with option 1 🙂

    The MERGE JOIN needs sorted data and since the source is a flat file you cannot sort the data at the source. Hence, you need to do it in the dataflow, leading to a blocking component.

    The typical scenario used is the one described by Phil, aka the Lookup component. At least, if the destination table isn't too large.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • sumit.joshij (3/28/2013)


    This scenerio is also called as Upsert Pattern.

    /

    Not really - as no updates are required here, I would call it a conditional INSERT.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (3/28/2013)


    ... I would call it a conditional INSERT.

    Patent pending...

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (3/28/2013)


    At least, if the destination table isn't too large.

    If there can be duplicate IDs in the source data file, it means the lookup has to be used in non-cached mode, which is also a performance killer. But in cached mode it should run quite fast, subject to the above condition.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Thanks every one!!! ...

  • Yes you are right.

    But I mean to say here that, in case of update and insert we will call it as "Upsert Pattern".

    Any ways Thanks. 🙂

    Thanks ,

    Sumit Joshi (Lead Engineer - HCL Technologies)

    Moblie:- 0-9650899699, Skype: sumit.joshij

    Blog: http://msbi2012.blogspot.in/

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply