SSIS: Conditional File Processing in a ForEach Loop

2010-08-23

I’ve fielded a number of requests recently asking how to interrogate a file within SSIS and change the processing rules based on that file’s metadata. A recent forum poster specifically asked about using the foreach loop to iterate through the files in a directory and, on a per-file basis, process each file only if it was updated after a specific date, skipping it otherwise. I’ll use that request to illustrate one method to solve this problem.

Ingredients

For this demonstration, our SSIS package will require the following:

  • A foreach loop to process each file in a given directory
  • A script task to interrogate each file and, based on the timestamp, mark it to be either processed or skipped.
  • Four SSIS variables:
    • @SourceDirectory (String) – stores the directory to loop through
    • @MinDateStamp (DateTime) – indicates the earliest date to process
    • @Filename (String) – stores the current filename for each cycle of the foreach loop
    • @ProcessFile (Boolean) – a flag to indicate whether the current file should be processed
  • A precedence constraint configured to evaluate both an expression and the task outcome.
  • A data flow task to process the validated files.


Set Up the Loop

Nothing groundbreaking here: after adding the foreach loop to the control flow pane, set it to work as a Foreach File Enumerator, and use an expression to set the source directory to be derived from the value of the @SourceDirectory variable:
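The Directory value itself is driven from the variable via a property expression. On the loop's Expressions page, the binding would look like the following (assuming the variable lives in the default User namespace):

```
Directory : @[User::SourceDirectory]
```

On the Variable Mappings tab, the enumerator's current file name is mapped at index 0 to the @Filename variable so the script task can pick it up on each iteration.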

[Figure: Foreach Loop Editor, with the enumerator set to Foreach File Enumerator and the Directory property derived from the @SourceDirectory variable]

Script Task

Since there is no native SSIS task designed to interrogate file metadata, we’re going to need to use a script task to do this.  After dropping a script task from the toolbox into the foreach loop container, we’ll edit the script to create a FileInfo object as a logical hook to the file.  After confirming that the file exists, we’ll compare its LastWriteTime property to the earliest acceptable cut-off date (defined by the @MinDateStamp variable value) to determine if the timestamp meets the criteria for processing.  Based on the results of that comparison, we will set the @ProcessFile value to either True or False.  You can see the resulting code logic in the snippet below:

 

public void Main()
{
    // Create a logical file object for the current file in the loop
    System.IO.FileInfo theFile = new System.IO.FileInfo(Dts.Variables["Filename"].Value.ToString());

    // If the file's last write time is later than the date specified in the
    // MinDateStamp variable, set the flag variable to process the file.
    if (theFile.Exists
        && theFile.LastWriteTime > DateTime.Parse(Dts.Variables["MinDateStamp"].Value.ToString()))
    {
        // MessageBox.Show("Processing file " + Dts.Variables["Filename"].Value.ToString());
        Dts.Variables["ProcessFile"].Value = true;
    }
    else
    {
        // MessageBox.Show("Skipping file " + Dts.Variables["Filename"].Value.ToString());
        Dts.Variables["ProcessFile"].Value = false;
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

 

So with each iteration of the foreach loop, the @ProcessFile value will indicate whether the current file should be processed or skipped.  After adding a data flow task containing the necessary components to process the flat file, the next step is to add a precedence constraint connecting the script task to the new data flow task.  This precedence constraint will be configured to use both an expression and a constraint, confirming that the current file is to be processed by interrogating the value of the @ProcessFile variable.  If that value is true, program flow continues to the data flow task; otherwise, the loop moves on to the next file.  The precedence constraint would be configured as follows:
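With the evaluation operation set to Expression and Constraint and the constraint value left at Success, the expression on that precedence constraint reduces to a single Boolean test (again assuming the default User namespace):

```
@[User::ProcessFile] == True
```

Because @ProcessFile is already a Boolean, the expression could also be written as just @[User::ProcessFile]; the explicit comparison simply makes the intent obvious at a glance.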

[Figure: Precedence Constraint Editor, with the evaluation operation set to Expression and Constraint and the expression checking the @ProcessFile variable]

After configuring all of the necessary tasks for this operation, the control flow pane should look similar to the following:

[Figure: Completed control flow, with the script task and data flow task connected inside the foreach loop container]

Now, when the SSIS package is executed, the timestamp of each file in the specified directory will be checked, and only those files that meet the date criteria will be processed in the data flow task.  Note that you could replace the timestamp check in our example with some other file criterion; for example, you could check the file size, type, attributes, or other properties to determine whether the file should be processed.
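As one sketch of that idea, the body of the script task could be swapped for a size check instead of a timestamp check. The @MaxFileSize variable shown here (Int64, in bytes) is an assumption for illustration and is not part of the sample package:

```csharp
// Hypothetical variation of the script task body: flag files for
// processing based on size rather than timestamp. Runs inside the
// SSIS script task host, which supplies the Dts object.
System.IO.FileInfo theFile = new System.IO.FileInfo(Dts.Variables["Filename"].Value.ToString());

// MaxFileSize is an assumed SSIS variable holding the size limit in bytes
long maxSize = Convert.ToInt64(Dts.Variables["MaxFileSize"].Value);

// Process only files that exist and are no larger than the limit;
// && short-circuits, so Length is never read on a missing file
Dts.Variables["ProcessFile"].Value = theFile.Exists && theFile.Length <= maxSize;
```

The rest of the package (the precedence constraint and data flow task) would work unchanged, since it only ever looks at the @ProcessFile flag.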

 

Conclusion

Although SSIS does not include a native component to conditionally process files, you can see from this example that a simple script can easily solve this ETL challenge.  You can download the sample package used in this example here.
