Bulk loading data using For Loop container with variable number

  • Hi there

    I am developing an SSIS package to bulk load data. The volume can be high (300,000 to 2 million records).

    What I want to design is a package that splits high volumes of data into chunks of 100,000 records.

    I would use a For Loop container to process the data.

    So, for example, the container would execute three times for 300,000 records.

    I have a question, though: how would I programmatically determine the number of times that I must run this For Loop container?

    For example, I'm planning to load 380,000 records.

    My thought was that I could capture the total number of records I'm loading into a staging table at the beginning of the task, using the RowNumber transformation, and write that count to a variable.

    Then how would I calculate the number of loops?

    I thought it would be something like [TotalNumberOfRecords] MOD 100000?

    Also, when I write this value to an SSIS variable, how do I then set it in the For Loop container's property for the number of times it must execute?
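    [Editor's note: the relationship between MOD and the loop count can be sketched like this. This is plain Python for illustration, not SSIS expression syntax, and `loop_count` is a made-up helper name, not anything from the thread.]

```python
# Hypothetical helper (not from the thread): how many times the For Loop
# container must run. MOD alone only yields the leftover rows in the final,
# partial batch; a ceiling division yields the loop count.
def loop_count(total_records: int, batch_size: int = 100_000) -> int:
    return (total_records + batch_size - 1) // batch_size  # integer ceiling

# 380,000 records in 100,000-record chunks needs 4 loops (3 full + 1 partial),
# whereas [TotalNumberOfRecords] MOD 100000 gives 80,000 (the remainder).
```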


  • Just thought I’d throw this out there….

    If your input file is a .csv, an approach that might be simpler to debug would be to do the chunking and importing in separate steps.

    Chunk the input file into smaller files in a pre-pass using a Script task, possibly writing them into a different folder. I have found scripted read-a-record/write-a-record processes to be fairly fast, though I have not worked with input files as large as yours.

    Start with a sequence number (e.g. 1), append it to the original file name, and open that file as a StreamWriter. Increment a record counter with every read/write. When the record counter MOD 100,000 is zero, close the current file, increment the sequence number, open a new file using the new sequence number as the file-name suffix, and resume reading/writing.

    After the smaller files have been created, they could be processed using a For Each File loop.
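    [Editor's note: the pre-pass described above can be sketched as follows. This is illustrative Python rather than the C# a Script task would actually use, and the `chunk_csv` helper, file-naming scheme, and `CHUNK_ROWS` value are assumptions of the sketch.]

```python
# Sketch of the pre-pass: split a large .csv into numbered chunk files of at
# most CHUNK_ROWS data rows each, repeating the header in every chunk so each
# file can be imported independently.
import csv
import os

CHUNK_ROWS = 100_000

def chunk_csv(src_path: str, out_dir: str, chunk_rows: int = CHUNK_ROWS) -> list:
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    out_paths = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)                 # keep the header for every chunk
        writer, out_file, rows_in_chunk, seq = None, None, 0, 0
        for row in reader:
            if writer is None or rows_in_chunk == chunk_rows:
                if out_file:
                    out_file.close()          # close the full chunk
                seq += 1                      # sequence number becomes the suffix
                path = os.path.join(out_dir, f"{base}_{seq}.csv")
                out_file = open(path, "w", newline="")
                writer = csv.writer(out_file)
                writer.writerow(header)
                out_paths.append(path)
                rows_in_chunk = 0
            writer.writerow(row)
            rows_in_chunk += 1
        if out_file:
            out_file.close()
    return out_paths
```

    The resulting file list maps directly onto the For Each File loop mentioned above.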

  • JimReeves71 wrote:

    I am developing an SSIS package to bulk load data. The volume can be high (300,000 to 2 million records). What I want to design is a package that splits the data into chunks of 100,000 records, processed by a For Loop container … how would I programmatically determine the number of times that I must run this For Loop container? … I thought it would be something like [TotalNumberOfRecords] MOD 100000?

    To be honest, I don't know why people "chunk" 2 million rows.  Done correctly, you can bulk insert into an empty table with the Clustered Index already in place and still take advantage of Minimal Logging, which makes the imports to a staging table at least twice as fast and sometimes a whole lot more.  Unless your machine is extremely cramped, you should be able to import 2 million rows in just a minute or two.  Seriously.

    Also, chunking destroys minimal logging.  Only the first insert into an empty staging table will be truly minimally logged.  All following inserts, even with Trace Flag 610 (or a later version of SQL Server where it's "built in"), will not actually be minimally logged, despite what the white paper on the subject says.  It just doesn't happen.

    And, yes, I'd always import to a staging table rather than the final table directly.  It's a whole lot safer overall and will allow you to "pre-validate" and "pre-condition" any data you need to.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JimReeves71 wrote:

    Then how would i calculate the number of loops?

    I thought it would be something like [TotalNumberOfRecords]  MOD 100000?

    Also when i write this value to a SSIS variable, how do I then set this in the field for number of times that the For Loop container must execute? 

    Couldn't you just do it until the number of rows inserted (@@ROWCOUNT) is less than the batch size?

    But I've got an SSIS data warehouse import that inserts over 160 million rows into a staging table with one insert, so I'm not sure why you want to do it in batches for a table with 2 million rows?
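    [Editor's note: the "loop until the batch comes back short" idea can be sketched like this. Python is used only for illustration; in T-SQL the post-insert check would be `@@ROWCOUNT < @BatchSize`, and `move_batch` is a hypothetical stand-in for the real batched insert.]

```python
# Sketch: instead of pre-computing a loop count, repeat the batched insert
# until it moves fewer rows than the batch size, which means the source is
# exhausted. No total-row count or loop-count variable is needed.
BATCH_SIZE = 100_000

def load_in_batches(move_batch, batch_size: int = BATCH_SIZE) -> int:
    """Call move_batch(batch_size) until it returns fewer rows than asked."""
    total = 0
    while True:
        moved = move_batch(batch_size)  # rows actually inserted this pass
        total += moved
        if moved < batch_size:          # short batch => nothing left to load
            return total
```

    Note that an exact multiple of the batch size costs one extra (empty) pass, which the `@@ROWCOUNT` check tolerates harmlessly.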
