SSIS Import only new values from a CSV without using a staging table

Question

SSIS Import only new values from a CSV without using a staging table

datsun

SSCrazy

Points: 2397
More actions
May 17, 2021 at 12:14 pm
Go to Answer
#3884349

Hello,
How can I import a CSV file in SSIS (2016) at the same comparing a value in the destination table if the record is new?
The CSV file is a continuous record-appending file.
I want to avoid a staging table if possible.
Thank you,
Vinay
datsun

SSCrazy

Points: 2397
More actions
May 17, 2021 at 6:19 pm
Answer
#3884518

Thank you all for your inputs.
I have resolved the issue using Merge Join (Left Outer join) with Conditional Split (for column = NULL)
which produces exactly the new records I need to insert back in the target table.
V

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 1

Yep... you can avoid a staging table if you'd like but... if something goes wrong, there's going to be a long rollback and your target table might be close to useless during such a roll back. Are you sure you want to take such a chance with unknown non-prevalidated data?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

datsun SSCrazy Points: 2397 More actions · Answer 2

Thank you Jeff,

The target table is to be used for only one processing and won't be used by any other users at all. it's like a collection bucket.

That's why I don't want to insert into there records already processed (duplication).

I think having two tables for such a small task is over the kill, hence I need to validate the records during SSIS stage prior to inserting new records only.

There is nothing in the Data transform stage to compare with a table.

Vinay

Phil Parkin SSC Guru Points: 247225 More actions · Answer 3

What do you mean by 'continuous record-appending file'? Is this a file which just keeps on growing forever? I'd be looking quite vigorously for a way to avoid that, because you don't want to be needlessly processing the same data over and over again.

If you are looking for a way in SSIS to do INSERTs of new rows (but no updates or deletes), and assuming you have a suitable unique key in place, in both the file and the target table, this is easy to do. Use a Conditional Split within your data flow, matching on the unique key. Send the 'unmatched' output to your target table and ignore the 'matched' output.

datsun SSCrazy Points: 2397 More actions · Answer 4

Thank you Phil,

The CSV file is out my control. It will be a growing file, yes but limited to few records once a month. It won't grow too big (not this year or next).

I will try Conditional Split. Is there a way to then compare the incoming file source with the target table for unique values?

Vinay

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 5

datsun wrote:

Thank you Jeff,
The target table is to be used for only one processing and won't be used by any other users at all. it's like a collection bucket.
That's why I don't want to insert into there records already processed (duplication).
I think having two tables for such a small task is over the kill, hence I need to validate the records during SSIS stage prior to inserting new records only.
There is nothing in the Data transform stage to compare with a table.
Vinay

Are you saying that the source file contains all the rows (new rows and old rows and is frequently updated with new rows) all the time and that you have to keep all of the old rows in the table forever and that's why you just want to load new rows into the table?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

datsun SSCrazy Points: 2397 More actions · Answer 6

Are you saying that the source file contains all the rows (new rows and old rows and is frequently updated with new rows) all the time and that you have to keep all of the old rows in the table forever and that's why you just want to load new rows into the table?

Yes, in summary that's right. The destination file will have all the records, the CSV file will have the same + new ones appended.

Vinay

Phil Parkin SSC Guru Points: 247225 More actions · Answer 7

datsun wrote:

I will try Conditional Split. Is there a way to then compare the incoming file source with the target table for unique values?
Vinay

What do you mean by 'unique values'?

The CS allows you to match rows in your source file against rows in the target table. I am assuming that you have a unique key in both places?

datsun SSCrazy Points: 2397 More actions · Answer 8

Phil,

Yes, there is a unique non-numeric key (username). Can the conditional split be used against one field to check for duplication?

V

Phil Parkin SSC Guru Points: 247225 More actions · Answer 9

datsun wrote:

Phil,
Yes, there is a unique non-numeric key (username). Can the conditional split be used against one field to check for duplication?
V

If the key is unique, how can there possibly be any duplication? Please provide an (anonymised) example.

datsun SSCrazy Points: 2397 More actions · Answer 10

Phil,

Yes, there won't be any duplication in the file within itself because of the unique values(username).

However, regular importing the same CSV file with older data and new data, will make duplication in the target table.

Which is what I want to avoid (using runtime check).

Vinay

Phil Parkin SSC Guru Points: 247225 More actions · Answer 11

datsun wrote:

Phil,
Yes, there won't be any duplication in the file within itself because of the unique values(username).
However, regular importing the same CSV file with older data and new data, will make duplication in the target table.
Which is what I want to avoid (using runtime check).
Vinay

'Checking for duplication' and 'avoiding duplication' are different things. The first suggests that duplicates may already exist, unlike the second.

I did, however, make a mistake when recommending the CS for this task. I meant the Lookup, my apologies. This will probably perform significantly faster than the JOIN. Configure the Lookup to use

SELECT <unique key> from TargetTable

after selecting 'Use results of an SQL query' in the Connection node.

Use FULL caching (cache mode), if you have sufficient memory.

and then match the unique key of the incoming rows against it, sending the unmatched rows to the target table.