Importing Files in Parallel with SSIS

  • Comments posted to this topic are about the item Importing Files in Parallel with SSIS

  • Great article!
    One question: How do you decide how many instances are required?

  • Hi Stan
    It would depend on how many threads can run on the processors your server has, but you'd need to test it out on your individual environment.
    The purpose is to make higher use of the CPU.
    There is a point at which too many threads will start to slow the process down so you just need to try different amounts until you find the right number.
    Hope this helps

  • IS there any way you can upload the solution or any link where we can download this? Have some confusion how to call Package solution in for each loop

  • Hi Mohammad
    Here's a zip of the solution.
    The package that you call is in the same solution as the package containing the script task with all the code as per the article.

  • I posted a very similar solution over a year ago on http://www.mssqltips.com here. Not accusing you of anything but thought my solution would add some value.  I have some performance data that really shows how powerful this method can be.  Enjoy!  Keith

  • kgresham - Tuesday, November 7, 2017 12:08 PM

    I posted a very similar solution over a year ago on http://www.mssqltips.com here. Not accusing you of anything but thought my solution would add some value.  I have some performance data that really shows how powerful this method can be.  Enjoy!  Keith

    Hi Keith, thanks for the feedback.
    Had a look at your article and your performance data is a great example of the gains that can be made.
    This was something that I started playing around with a while back and decided to revisit again recently, but in case I maybe came across your article in the past and it seeded the idea in my head, then I must apologize for not making any reference to it.
    Paulo

  • Great article Paulo.
    I follow the steps but I keep getting this error  'The object in the variable "User::FileList" does not contain an enumerator'
    Any reason for this error?

  • Thanks Jowee
    I think you might be trying to execute "TestPackage" by itself.
    Make sure you set "ParallelFileProcessor" as the startup object for the solution.
    It will pass the FileList object to the "TestPackage" package.

  • Awesome!!! Could have used this several months ago 🙂

  • @amartinez
    It's never too late to go back and rework existing processes 🙂

  • Great article.  Thank you for sharing your approach.
    Can you explain the acrobatics done here.  I don't understand the reason for using tempList variable

    foreach (string file in Directory.GetFiles(sourceDirectory, fileMask))     
    {      
    fileCount += 1;      
    fileGroup = fileCount % noOfPackageInstances;      
    List<string> tempList = new List<string>();      
    if (allFiles[fileGroup] != null)       {        tempList = allFiles[fileGroup];       }      
    tempList.Add(file);      
    allFiles[fileGroup] = tempList;     
    }

  • Thanks Misha
    Like I mentioned in the article, my code is probably not the most elegant 🙂
    The reason I'm using it is because I can't add to a null array element directly.
    There must be a better way of doing this, so if you have any suggestions then please post them here.

  • We do something similar to import 160 text files (and we get these every day).  They are pipe delimited but have a .CSV extension.  Each file has a different layout but is consistent every day (so file 1 is different to file 2 but file 1 will have the same layout every day).

    We create an XML file with the Bulk Import details needed in there (the XML file is created using a script take.

     

    Our process looks like:-

    Load a Manifest file (this contains a list of all the files that have been downloaded)

    There are 3 extra columns in here:-

    IsDownloaded (we set that to 1 if it's a file ready for import)

    IsLoading (a file in the midle of being processed)

    IsLoaded (a file that has been processed).  This also leaves the IsLoading set to 1

     

    We then load the list of files into an object in SSIS (just doing a  "SELECT FileName FROM Manifest WHERE IsDownloaded = 1)

    We then loop through the object and collect the next file where IsDownloaded = 1 and IsLoading = 0

    We then call the load process once for each file (it's the same SSIS package called up to 32 times with a paramater that specifies a filename).

     

    We can load all 160 files (around 500MB) in about 5 minutes.

     

    The loop just carries on calling the actual load package untill all the files are loaded.

  • If you really have massive numbers of files, I'd recommend going with CozyRoc's Paralell Execution Task step instead.

    https://www.cozyroc.com/ssis/parallel-loop-task

    Instead of having to spawn multiple independent tasks that all run separately in SSIS (which I've done in the past) and can individually fail and just cause a mess with management.  You define a disabled portion of your SSIS package as the part you want to run in parallel and tell it how many times you want it to run in parallel, and it does all the work.  My craziest package processes close to 30,000 files in a given day and does it where I don't have to manage up to 30,000 spawned SSIS package executions.  Well worth the couple hundred bucks it takes to license the components.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply