SSIS Import only new values from a CSV without using a staging table

  • Hello,

    How can I import a CSV file in SSIS (2016) while, at the same time, comparing a value against the destination table to check whether each record is new?

    The CSV file is a continuous record-appending file.

    I want to avoid a staging table if possible.

    Thank you,

    Vinay

    Yep... you can avoid a staging table if you'd like but... if something goes wrong, there's going to be a long rollback and your target table might be close to useless during such a rollback. Are you sure you want to take such a chance with unknown, non-prevalidated data?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you Jeff,

    The target table is to be used for only one process and won't be used by any other users at all. It's like a collection bucket.

    That's why I don't want to insert records into it that have already been processed (duplicates).

    I think having two tables for such a small task is overkill, hence I need to validate the records during the SSIS stage and insert only the new records.

    There is nothing in the Data transform stage that I can see to compare against a table.

    Vinay

  • What do you mean by 'continuous record-appending file'? Is this a file which just keeps on growing forever? I'd be looking quite vigorously for a way to avoid that, because you don't want to be needlessly processing the same data over and over again.

    If you are looking for a way in SSIS to do INSERTs of new rows (but no updates or deletes), and assuming you have a suitable unique key in place, in both the file and the target table, this is easy to do. Use a Conditional Split within your data flow, matching on the unique key. Send the 'unmatched' output to your target table and ignore the 'matched' output.
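
    In set-based terms, what that data flow does boils down to something like the sketch below. The names (dbo.TargetTable, Username, SomeColumn, FileRows) are placeholders for illustration only, since I don't know your actual schema:

    -- Illustrative sketch: insert only the file rows whose unique key
    -- is not already present in the target table.
    INSERT INTO dbo.TargetTable (Username, SomeColumn)
    SELECT f.Username, f.SomeColumn
    FROM FileRows AS f
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.TargetTable AS t
                      WHERE t.Username = f.Username);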

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Thank you Phil,

     

    The CSV file is out of my control. It will be a growing file, yes, but limited to a few records added once a month. It won't grow too big (not this year or next).

    I will try Conditional Split. Is there a way to then compare the incoming file source with the target table for unique values?

    Vinay

  • datsun wrote:

    Thank you Jeff,

    The target table is to be used for only one process and won't be used by any other users at all. It's like a collection bucket.

    That's why I don't want to insert records into it that have already been processed (duplicates).

    I think having two tables for such a small task is overkill, hence I need to validate the records during the SSIS stage and insert only the new records.

    There is nothing in the Data transform stage that I can see to compare against a table.

    Vinay

    Are you saying that the source file always contains all the rows (old and new, with new rows frequently appended), that you have to keep all of the old rows in the table forever, and that's why you just want to load the new rows into the table?

    --Jeff Moden



  • Are you saying that the source file always contains all the rows (old and new, with new rows frequently appended), that you have to keep all of the old rows in the table forever, and that's why you just want to load the new rows into the table?

    Yes, in summary that's right. The destination table will have all the records; the CSV file will have the same plus new ones appended.

    Vinay

  • datsun wrote:

    I will try Conditional Split. Is there a way to then compare the incoming file source with the target table for unique values?

    Vinay

    What do you mean by 'unique values'?

    The CS allows you to match rows in your source file against rows in the target table. I am assuming that you have a unique key in both places?


  • Phil,

    Yes, there is a unique non-numeric key (username). Can the conditional split be used against one field to check for duplication?

     

    V

  • datsun wrote:

    Phil,

    Yes, there is a unique non-numeric key (username). Can the conditional split be used against one field to check for duplication?

    V

    If the key is unique, how can there possibly be any duplication? Please provide an (anonymised) example.


  • Phil,

    Yes, there won't be any duplication within the file itself because of the unique values (username).

    However, regularly importing the same CSV file, with older data and new data, will create duplicates in the target table.

    That is what I want to avoid (using a runtime check).

    Vinay

  • Thank you all for your inputs.

    I have resolved the issue using a Merge Join (left outer join) with a Conditional Split (on column = NULL), which produces exactly the new records I need to insert into the target table.
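
    For anyone following along, the logic of that data flow is roughly equivalent to the following T-SQL; the table and column names are placeholders, and FileRows stands in for the flat file source:

    -- Sketch of the Merge Join (left outer) + Conditional Split (column IS NULL) logic:
    -- keep only the file rows that have no match in the target table.
    SELECT f.Username, f.SomeColumn
    FROM FileRows AS f
    LEFT OUTER JOIN dbo.TargetTable AS t
        ON t.Username = f.Username
    WHERE t.Username IS NULL;   -- unmatched rows are the new records to insert

    Bear in mind that the SSIS Merge Join requires both of its inputs to be sorted on the join key.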

    V

  • datsun wrote:

    Phil,

    Yes, there won't be any duplication within the file itself because of the unique values (username).

    However, regularly importing the same CSV file, with older data and new data, will create duplicates in the target table.

    That is what I want to avoid (using a runtime check).

    Vinay

    'Checking for duplication' and 'avoiding duplication' are different things. The first suggests that duplicates may already exist, unlike the second.

    I did, however, make a mistake when recommending the CS for this task. I meant the Lookup, my apologies. This will probably perform significantly faster than the JOIN. Configure the Lookup to use

    SELECT <unique key> from TargetTable

    after selecting 'Use results of an SQL query' in the Connection node, and use Full caching (cache mode) if you have sufficient memory.

    Then match the unique key of the incoming rows against it, sending the unmatched rows to the target table.
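
    With the username key mentioned earlier in the thread, and a placeholder table name, the Lookup query would look something like this:

    SELECT Username
    FROM dbo.TargetTable;

    In the Lookup editor, choose 'Redirect rows to no match output' for rows with no matching entries, join the file's username column to the table's username column, and wire only the no match output to the destination.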

