Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Tim Mitchell

Tim Mitchell is a business intelligence consultant, author, trainer, and SQL Server MVP with over a decade of experience. Tim is the principal of Tyleris Data Solutions and is a Linchpin People teammate. Tim has spoken at international, regional, and local venues including the SQL PASS Summit, SQLBits, SQL Connections, SQL Saturday events, and various user groups and webcasts. He is a board member at the North Texas SQL Server User Group in the Dallas area. Tim is coauthor of the book SSIS Design Patterns, and is a contributing author on MVP Deep Dives 2. You can visit his website and blog at TimMitchell.net or follow him on Twitter at twitter.com/Tim_Mitchell.

SSIS: Conditional File Processing in a ForEach Loop

I’ve fielded a number of requests recently asking how to interrogate a file within SSIS and change the processing rules based on the metadata of said file.  A recent forum poster specifically asked about using the foreach loop to iterate through the files in a directory and, on a per-file basis, either process the file or skip the file if it was updated after a specific date.  I’ll use that most recent request to illustrate one method to solve this problem.

Ingredients

For this demonstration, our SSIS package will require the following:

  • A foreach loop to process each file in a given directory
  • A script task to interrogate each file and, based on the timestamp, mark it to be either processed or skipped.
  • Four SSIS variables:
    • @SourceDirectory (String) – stores the directory to loop through
    • @MinDateStamp (DateTime) – indicates the earliest date to process
    • @Filename (String) – stores the current filename for each cycle of the foreach loop
    • @ProcessFile (Boolean) – a flag to indicate whether the current file should be processed
  • A precedence constraint which will be configured to validate an expression and task outcome.
  • A data flow task to process the validated files.


Set Up the Loop

Nothing groundbreaking here: after adding the foreach loop to the control flow pane, set it to work as a Foreach File Enumerator, and use an expression to set the source directory to be derived from the value of the @SourceDirectory variable:

img2

 

Script Task

Since there is no native SSIS task designed to interrogate file metadata, we’re going to need to use a script task to do this.  After dropping a script task from the toolbox into the foreach loop container, we’ll edit the script to create a FileInfo object as a logical hook to the file.  After confirming that the file exists, we’ll compare its LastWriteTime property to the earliest acceptable cut-off date (defined by the @MinDateStamp variable value) to determine if the timestamp meets the criteria for processing.  Based on the results of that comparison, we will set the @ProcessFile value to either True or False.  You can see the resulting code logic in the snippet below:

 

public void Main()
        {
            // Create a logical file object 
            System.IO.FileInfo theFile = new System.IO.FileInfo(Dts.Variables["Filename"].Value.ToString());
 
            // If the update date on the file is greater than the date specified in the MinDateStamp
            //  variable, set the variable flag to process the file.    
            if (theFile.Exists 
                && theFile.LastWriteTime > DateTime.Parse(Dts.Variables["MinDateStamp"].Value.ToString()))
            {
                // MessageBox.Show("Processing file " + Dts.Variables["Filename"].Value.ToString());
                Dts.Variables["ProcessFile"].Value = true;
            }
            else
            {
                // MessageBox.Show("Skipping file " + Dts.Variables["Filename"].Value.ToString());
                Dts.Variables["ProcessFile"].Value = false;
            }
 
            Dts.TaskResult = (int)ScriptResults.Success;
        }

 

So with each iteration of the foreach loop, the @ProcessFile value will indicate whether the current file should be processed or skipped.  After adding a data flow task containing the necessary components to process the flat file, our next step would be to add a precedence constraint connecting the script task to the new data flow task.  This precedence constraint will be configured to use an expression and a constraint, and will confirm that the current file is to be processed by interrogating the value of the @ProcessFile variable.  If that value is true, then program flow continues to the data flow task; otherwise the loop starts again with the next file in turn.  The precedence constraint would be configured as such:

img3

 

After configuring all of the necessary tasks for this operation, the data flow pane should look similar to the following:

img1

Now, when the SSIS package is executed, the timestamp of each file in the specified directory will be checked, and only those that meet the date criteria will be processed in the data flow task.  Note that you could replace the timestamp in our example to some other file criteria; for example, you could check the file size, type, attributes, or other settings to determine if the file should be processed.

 

Conclusion

Although SSIS does not include a native component to conditionally process files, you can see from this example that a simple script can easily solve this ETL challenge.  You can download the sample package used in this example here.

Comments

Posted by Todd McDermid on 23 August 2010

It really does disappoint that there isn't an included task that will extract information about files in SSIS.

Good news is there are custom extensions that do.  Here's a list of (mostly open source) extensions for SSIS: http://ssisctc.codeplex.com - including one that retrieves a bunch of file properties, including the last write time: filepropertiestask.codeplex.com

Posted by Tim Mitchell on 23 August 2010

Todd, that's quite a comprehensive list of add-ins.  Thanks for pointing this out!

Posted by vinothlilly on 26 August 2010

hi

i want to create a ssis package for import xml data into database table, in that , i want to use for-loop for filter correct data to store correct table and wrong data to store wrong table. pls give a example with detail

Posted by Paulo Morgado on 26 August 2010

I'm getting the files I want to iterate before the Foreach loop and feeding them to the Foreach loop trough a Foreach Variable Enumerator.

Posted by dbowlin on 26 August 2010

Tim, nice article.  Script tasks are a weak area of mine in SSIS.  I am always glad to see some sample code.  This particular code looks very useful.

Posted by Christian Bahnsen on 26 August 2010

Tim, thanks.  I think this is almost exactly what I was asking you about at the SQL Saturday in Baton Rouge.  I hadn't had a chance yet to put it into effect.  This article will be very helpful when I can get back to that project.  Thanks again for your help.

Posted by Bob McC on 30 August 2010

I use a script very similar to this to process .zip files.  Sometimes what Microsoft calls a feature, I call a bug.  For reasons I won't go into here; you can't use a for-each-file loop in ssis to only pick up .zip files as it also picks up .zip* (any data following 'zip').  I had to add a script task to make sure I was only processing files that TRUELY ended with .zip.

Posted by kishorebabu.kotha on 29 January 2014

I am unable to donwload the source code of this solution. Please help. Urgent.

Posted by kishorebabu.kotha on 29 January 2014

Please send the source code to my e-mail: kishorebabu.kotha@gmail.com

Posted by Tim Mitchell on 2 February 2014

Hi there: You can download the files at this link: a.timmitchell.net/.../CheckFileTimestamp.zip

Leave a Comment

Please register or log in to leave a comment.