SSIS ForEach Loop Container recursion question

Question

SSIS ForEach Loop Container recursion question

Brandie Tarvin

SSC Guru

Points: 173108
More actions
May 12, 2017 at 5:00 am

#330229

I have a weird one today. Because my destination system can only accept files of a certain size, I need to dynamically create multiple files for a single day's feed. This means I need to export X number of rows per file, but the issue is each time I run this I don't know the total number of rows that will be in the table I'm creating my files from. So the number of files created will be dynamic based on the row count.
In this case, let's say the maximum row count per file is 750,000 (.csv file, not .xlsx). One day the table has 2 million rows, but another day it has 5 million. Or perhaps 1.5 million. The idea is that the ForEach Loop container will run 3 times on day 1 (creating 3 files) while running 7 times for the 5 million rows or 2 times for the 1.5 million rows. I've got the EXECUTE T-SQL step right before the container to pull the number of files needed to a variable:
SELECT (COUNT(*)/750000) AS FileCount FROM dbo.MyTable;
The above pushes to an @FileCount variable in the package. I was going to use this as the Foreach variable enumerator, but it wants a string pointing to an ADO object. As does the ADO enumerator. None of the other enumerators seem to be what I'm looking for.
Maybe I'm doing this wrong, but I built the table with a bit column to tell me when certain rows have been exported (meaning I update the column from 0 to 1 when I push them to one of the files and then go to grab the next set of 750,000 set at 0). Unfortunately, I can't figure out how to translate that schema over to the Foreach Loop Container to force it to run dynamically based on number of result sets.
Any thoughts on how I can accomplish this?
Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 1

Quick thought, use a row number or an id number, divide by the max row size and wrap this in either a view or a inline table valued function. Then you can pass each iteration number as a parameter.
😎

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 2

One possible solution (perhaps not the best) would be to have a query return one row for each iteration needed (using a numbers table, of course), then use the ADO Enumerator for the ForEach container

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 3

Thanks, Guys. I'm going to watch this thread for a little bit more (I want to hear what Thom has to say) before digging back into this. I've been banging my head against the proverbial wall, so need to step away and let these suggestions percolate (if you know what I mean).

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 4

Eirikur Eiriksson

SSC Guru

Points: 182951

May 12, 2017 at 5:28 am

#1942302

Are you exporting the data from a table?
😎

Phil Parkin SSC Guru Points: 247182 More actions · Answer 5

If you know the absolute maximum number of rows in the source, you could perhaps add an NTILE to your source query & then have a conditional split which sends the various streams to different file destinations.
Or here is a pure C# solution which could easily be adapted to work in SSIS.

Thom A SSC Guru Points: 99143 More actions · Answer 6

I was actually planning to use a Script Component and an ADO variable as the datasource, however, when I get to the second iteration I get no results from the variable. it's as if SSIS drops the contents after you read it the first time... Little confused. I'm trying to trouble shoot it some more.

EDIT: So, apparently, using fill empties the SSIS Variable as the variable in C# is filled . Which means, when I do my second iteration, it's then empty. D'oh!. >_<

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

FridayNightGiant SSCertifiable Points: 7151 More actions · Answer 7

I've done something similar in the past, here is some sample code I mocked up

public void Main() { try { var strQuery = "Select col1, col2 from dbo.TableName";

var fileName = ""; var recordCount = 0; var fileCount = 0; var recordNumberPerFile = 10000;

//create connection using a connection manager in the package SqlConnection conn = (SqlConnection)Dts.Connections["Database"].AcquireConnection(null); SqlCommand cmd = new SqlCommand(strQuery, conn); SqlDataReader reader = cmd.ExecuteReader(); StreamWriter writer = null;

while (reader.Read()) { //if no write or we have reached the record number if (writer == null || recordCount == recordNumberPerFile) { recordCount = 0;

// Close the previous file if it is open... if (writer != null) { writer.Close(); writer.Dispose(); }

fileName = "c:\\edi\\FileName_" + (++fileCount).ToString() + ".txt"; // Open the new file... writer = new StreamWriter(fileName); } //populate each row into the writer string recordSet = reader.GetValue(0).ToString() + reader.GetValue(1).ToString(); writer.WriteLine(recordSet); // recordInfo is sudo code as well

recordCount++; }

conn.Close(); Dts.TaskResult = (int)ScriptResults.Success; } catch (Exception e) { Dts.Events.FireError(0, "Script Task Example", e.Message, String.Empty, 0); } }

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 8

Eirikur Eiriksson - Friday, May 12, 2017 5:28 AM
Are you exporting the data from a table?
😎

Yes. I've got a table with 25+ columns in it and the final column is ExportedToFile BIT NOT NULL (set as 0 meaning "not yet exported"). The idea is to skim the top 750000 records off that, update them in the data flow as I'm pushing them to the file (Multicast is a wonderful thing) and then grab the next 750,000 for the next file to do the same thing.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 9

Phil Parkin - Friday, May 12, 2017 5:47 AM
If you know the absolute maximum number of rows in the source, you could perhaps add an NTILE to your source query & then have a conditional split which sends the various streams to different file destinations.
Or here is a pure C# solution which could easily be adapted to work in SSIS.

That doesn't help. I never know until the code is run how many rows will be in the table and that number will not be consistent. Which means I can't have different file destinations because that number will hardly ever be consistent.

EDIT: grammar fix
EDIT 2: Missed the C# link. Opening that up now. Thank you for the reference.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 10

FridayNightGiant - Friday, May 12, 2017 11:33 AM
I've done something similar in the past, here is some sample code I mocked up
       public void Main()         {     try     {       var strQuery = "Select col1, col2 from dbo.TableName";
      var fileName = "";       var recordCount = 0;       var fileCount = 0;       var recordNumberPerFile = 10000;
      //create connection using a connection manager in the package       SqlConnection conn = (SqlConnection)Dts.Connections["Database"].AcquireConnection(null);       SqlCommand cmd = new SqlCommand(strQuery, conn);       SqlDataReader reader = cmd.ExecuteReader();             StreamWriter writer = null;
      while (reader.Read())       {        //if no write or we have reached the record number        if (writer == null || recordCount == recordNumberPerFile)        {         recordCount = 0;
        // Close the previous file if it is open...         if (writer != null)         {           writer.Close();           writer.Dispose();         }
        fileName = "c:\\edi\\FileName_" + (++fileCount).ToString() + ".txt";                  // Open the new file...         writer = new StreamWriter(fileName);        }        //populate each row into the writer        string recordSet = reader.GetValue(0).ToString() + reader.GetValue(1).ToString();        writer.WriteLine(recordSet); // recordInfo is sudo code as well
       recordCount++;       }
      conn.Close();       Dts.TaskResult = (int)ScriptResults.Success;     }     catch (Exception e)     {       Dts.Events.FireError(0, "Script Task Example", e.Message, String.Empty, 0);     }         }

Hrm. I did not think about using a script for this process. Thank you. I'll look at this code and let you know how it works out for me.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 11

Thom A - Friday, May 12, 2017 7:21 AM
I was actually planning to use a Script Component and an ADO variable as the datasource, however, when I get to the second iteration I get no results from the variable. it's as if SSIS drops the contents after you read it the first time... Little confused. I'm trying to trouble shoot it some more.
EDIT: So, apparently, using fill empties the SSIS Variable as the variable in C# is filled . Which means, when I do my second iteration, it's then empty. D'oh!. >_<

Thanks for looking, Thom. I was actually having a similar problem, trying to figure out how to loop back to an EXECUTE SQL Task that was outside the loop and providing a number of rows. I thought about nesting containers, but realized I still didn't have a good way of kicking off the outermost one, etc. and that's when I came here looking for additional guidance.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Phil Parkin SSC Guru Points: 247182 More actions · Answer 12

Brandie Tarvin - Friday, May 12, 2017 12:00 PM
Phil Parkin - Friday, May 12, 2017 5:47 AM
If you know the absolute maximum number of rows in the source, you could perhaps add an NTILE to your source query & then have a conditional split which sends the various streams to different file destinations.
Or here is a pure C# solution which could easily be adapted to work in SSIS.
That doesn't help. I never know until the code is run how many rows will be in the table and that number will not be consistent. Which means I can't have different file destinations because that number will hardly ever be consistent.
EDIT: grammar fix
EDIT 2: Missed the C# link. Opening that up now. Thank you for the reference.

Just to clarify, consistency is not a prerequisite for my suggestion. Only that you know the absolute maximum number of rows that will ever be exported.
For example, if you know that you'll never have more than 7.5 million rows, and your batch size is 750,000, you'd need to create 10 flat file destinations.

Eg, if 1.5 million rows, you'd run an NTILE(2), breaking the data into two 750k groups.
Your conditional split would fire these rows to outputs 1 and 2, leaving the other 8 destinations unused.

The C# idea is clearly more flexible, but this solution may be able to use parallelism & therefore execute more quickly.

Brandie Tarvin SSC Guru Points: 173108 More actions · Answer 13

Phil Parkin - Friday, May 12, 2017 12:13 PM
Just to clarify, consistency is not a prerequisite for my suggestion. Only that you know the absolute maximum number of rows that will ever be exported.
For example, if you know that you'll never have more than 7.5 million rows, and your batch size is 750,000, you'd need to create 10 flat file destinations.

And as I'm trying to explain, there never will be an absolute maximum guarantee. Our business isn't consistent enough for me or anyone else to be able to predict it. Things slow down when there are recessions or depressions. Things increase as people get more disposable income or more small businesses open up and start doing business with us. My company would like to become an industry leader, which means doing business with more and more people. So I can only guess the data we are tracking will continue to grow, with the assumption that sometimes it won't.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Phil Parkin SSC Guru Points: 247182 More actions · Answer 14

Brandie Tarvin - Monday, May 15, 2017 8:10 AM
Phil Parkin - Friday, May 12, 2017 12:13 PM
Just to clarify, consistency is not a prerequisite for my suggestion. Only that you know the absolute maximum number of rows that will ever be exported.
For example, if you know that you'll never have more than 7.5 million rows, and your batch size is 750,000, you'd need to create 10 flat file destinations.
And as I'm trying to explain, there never will be an absolute maximum guarantee. Our business isn't consistent enough for me or anyone else to be able to predict it. Things slow down when there are recessions or depressions. Things increase as people get more disposable income or more small businesses open up and start doing business with us. My company would like to become an industry leader, which means doing business with more and more people. So I can only guess the data we are tracking will continue to grow, with the assumption that sometimes it won't.

OK, got it. Then you need to find a dynamic way. The C# scripting method would work and is probably the simplest.
To achieve parallelism, it should be possible to create a Parent/Child package structure with as many occurrences of the child package as threads you wish to run in parallel.
The master package would have to pass things like target filename and source data range start/end to the child package.