Passing a table between Data Flows

  • I have a ForEach Loop that loops through the files in a folder, importing each one and appending it to a table on the server. After the loop I have a Data Flow Task that does three Fuzzy Lookup matches, but to export the results I seem to have to create another table.

    How could I pass the table from the end of the loop into the Data Flow Task as a source object? I see there is a Recordset Destination; why is there no Recordset Source? Then I could perform the Fuzzy Lookups on the source recordset and pass the final result out to the database.

    Right now I'm exporting the Fuzzy Lookup output to a table in the database, then running a SQL Task with an update query that merges the new lookup info into the import table. It just seems like two unnecessary steps.
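
    Just to illustrate that final step, the update is something along the lines of the sketch below (the table and column names here are only placeholders, not the real ones):

        -- Hypothetical names, for illustration only.
        -- dbo.ImportTable  : the table the ForEach Loop appends every file to
        -- dbo.FuzzyStaging : the table the Fuzzy Lookup data flow writes to
        UPDATE imp
        SET    imp.MatchedName    = stg.MatchedName,
               imp.MatchedAddress = stg.MatchedAddress,
               imp.MatchedCity    = stg.MatchedCity
        FROM   dbo.ImportTable AS imp
               INNER JOIN dbo.FuzzyStaging AS stg
                       ON stg.ImportKey = imp.ImportKey;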

  • I don't really understand. You load some files into a table, and then you want to read from that table in a data flow task. Can't you just use an OLE DB Source? Or am I missing something here?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • My question is more like: why do I have to create a temp table?

    I have one dataflow that imports the data, then saves to a temp table.

    Then another data flow that brings in the previous table and loads it into a temp table, because I can't do an update inside the data flow.

    Then I run a SQL Task to run an update query between the first and second tables.

    It seems like I should be able to pass the table from the import right into the second task. Why do I have to write it out to the database, only to read it straight back in?

  • Where is that SQL Task with the update located? Between the two dataflows?

    If yes, then your situation is normal 🙂

    You can keep everything in one dataflow, but then you must somehow manage to do the update logic with the dataflow components. If this is not possible, then an update with the Execute SQL Task is the best option. Since it is set-based, it will run very fast.

    Can't you move the update to the end of the package?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • da-zero (12/10/2010)


    Where is that SQL Task with the update located? Between the two dataflows?

    If yes, then your situation is normal 🙂

    You can keep everything in one dataflow, but then you must somehow manage to do the update logic with the dataflow components. If this is not possible, then an update with the Execute SQL Task is the best option. Since it is set-based, it will run very fast.

    Can't you move the update to the end of the package?

    The SQL Task is at the end. How do I pass recordsets between the data flows? It's the 'somehow manage' part I'm trying to get more info on.

  • I've read through this thread three or four times and I still do not understand what you are trying to do.

    A data flow takes data from one place, processes it and then puts it somewhere else. If you want to process data and put it in a file, you don't need a temp table.

    If you want to process the data using an SQL UPDATE query, it's a different story. The data needs to be in a table which, of course, can easily be exported by another data flow once the query has run.

    But your overly simplified description of what you are trying to do does not provide enough detail for us to be able to provide clear help.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I don't know how to restate the first post to be more clear.

    Here is a screenshot of the package. I hope this loads.

    (Why is there no 'select file' option in this tool?)

    As I stated in the original post, I have a ForEach Loop container that loads all of the files in a folder into a table in the database. Can I instead load each file into one recordset?

    The next control is a SQL Task that clears a temp table, which wouldn't be necessary if I could use a recordset.

    Next is a data flow that uses the table from step 1 as an OLE DB Source (here is where I would pass in the recordset from the ForEach Loop instead). It performs three Fuzzy Lookups, then exports the key field and the three lookup fields to the temp table. With a recordset, the result would be the complete recordset from the ForEach Loop plus the new fields from the Fuzzy Lookups; the export would write the whole thing to a final table in the database, and the next step would no longer be needed.

    The current last step is a SQL Task that updates table 1 with the temp table.

    I'm not sure this description is any better than the first.

    BTW, this is SQL 2005 SP3.

    Edit Note: The screenshot file didn't load

    As I stated in the original post, I have a ForEach Loop container that loads all of the files in a folder into a table in the database. Can I instead load each file into one recordset?

    Wait, to rephrase, you're looking to do a repeated data-append to an ADO Recordset held in a variable, from a dataflow called via the primary package's loop?

    You then want to take said recordset and feed it as a source into the fuzzy lookup structure.

    I'm not sure this description is any better than the first.

    Actually, it's much more effective. BTW, you need an image-hosting site to upload that JPG to; ImageShack or the like.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Yes, Craig, your description does summarize it very well.

  • I agree with Craig: that post was much clearer.

    I think perhaps you are thinking of SSIS in the wrong terms: SSIS gives you access to pipelines of data, rather than recordsets. The start of the pipeline can be a recordset and the end can be a file or database table. I imagine data streaming through the pipelines - you can't really do set-based operations on it (don't ask about the sort or merge join!).

    Is there any reason why you cannot engineer the entire process, from picking up a row of data from your source, to inserting it into your target table, to be part of a dataflow (and therefore processed wholly in the pipeline)? Your FEL would run outside of this, so you would execute the pipeline for each file you want to process.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Dan Guzman - Not the MVP (12/10/2010)


    Yes, Craig, your description does summarize it very well.

    Well, there's the Recordset Destination available in the data flow, as you already mentioned. There's no easy way to turn this back into a source, however.

    AFAIK, you then need to use the Script Component in the data flow, configured as a source, to read that recordset and spew the rows out into the new data flow to do your en-masse fuzzy lookup.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Phil Parkin (12/10/2010)


    Is there any reason why you cannot engineer the entire process, from picking up a row of data from your source, to inserting it into your target table, to be part of a dataflow (and therefore processed wholly in the pipeline)? Your FEL would run outside of this, so you would execute the pipeline for each file you want to process.

    You mean perhaps moving the fuzzy lookups to inside each import? Hm, well, the Fuzzy Lookup creates its own index each time it runs, so it would slow things down considerably, and since we are talking about 50-100 loops, outside is better.

    BTW, I do think of it as a stream. I just didn't expect dams, locks and fish ladders 😉

  • Dan Guzman - Not the MVP (12/10/2010)


    Phil Parkin (12/10/2010)


    Is there any reason why you cannot engineer the entire process, from picking up a row of data from your source, to inserting it into your target table, to be part of a dataflow (and therefore processed wholly in the pipeline)? Your FEL would run outside of this, so you would execute the pipeline for each file you want to process.

    You mean perhaps moving the fuzzy lookups to inside each import? Hm, well, the Fuzzy Lookup creates its own index each time it runs, so it would slow things down considerably, and since we are talking about 50-100 loops, outside is better.

    BTW, I do think of it as a stream. I just didn't expect dams, locks and fish ladders 😉

    🙂 Yes I did. I've never used the fuzzy lookup, so I'll take your word for that.

    Bearing all that in mind, it seems to me like you may have already engineered the optimum solution, given the capabilities of SSIS.

    One thing I should check: you refer to a 'temp' table. I assume you mean a staging table? A temp table is not the ideal mechanism, given the size of the job you are doing.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (12/10/2010)


    One thing I should check: you refer to a 'temp' table. I assume you mean a staging table? A temp table is not the ideal mechanism, given the size of the job you are doing.

    Yes, I did mean a staging table. And considering today's QOTD, I doubt the context would allow #tmp table creation in one flow, then CRUD in another.

    Thanks for the logic review anyway.

    Dan

  • Dan Guzman - Not the MVP (12/10/2010)


    Yes, I did mean a staging table. And considering today's QOTD, I doubt the context would allow #tmp table creation in one flow, then CRUD in another.

    You can create temp tables in SSIS and persist them by setting the RetainSameConnection property on the connection manager to true. However, a real table in a staging database will probably be better.
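
    A rough sketch of that pattern, with made-up object names (you would also typically need to set DelayValidation on the tasks that reference the temp table, since it does not exist at design time):

        -- Execute SQL Task, early in the control flow.
        -- With RetainSameConnection = True on the OLE DB connection manager,
        -- SSIS reuses the same session, so the temp table survives into later tasks.
        CREATE TABLE #FuzzyStaging
        (
            ImportKey      int           NOT NULL,
            MatchedName    nvarchar(100) NULL,
            MatchedAddress nvarchar(200) NULL,
            MatchedCity    nvarchar(100) NULL
        );

        -- A later Execute SQL Task on the same connection can then use it,
        -- e.g. the final set-based update:
        UPDATE imp
        SET    imp.MatchedName = stg.MatchedName
        FROM   dbo.ImportTable AS imp
               INNER JOIN #FuzzyStaging AS stg
                       ON stg.ImportKey = imp.ImportKey;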

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
