A common requirement in ETL situations is to extract data from a single source and conditionally branch it out to multiple outputs. SQL Server Integration Services offers the Conditional Split transformation for these scenarios: You break apart your data into different streams based on one or more custom criteria, and then send each stream of data to its own destination, whether it's a text file, database table, or other means of storage. The Conditional Split / multiple output methodology works well in most scenarios, but what if you have dozens or even hundreds of outputs? The overhead of managing all of those destinations in SSIS can be overwhelming.
For those situations, there may be an easier way. In this article, I'll demonstrate how the Foreach Loop Container can be used to assist with conditional splitting of a data stream bound for a large number of destinations. We'll review the fundamentals of this container, and I'll describe how yData Flow object within that loop to avoid having to explicitly create multiple outputs with identical metadata. Since we know that each of the output files will be structured identically, we can use one of the tenets of basic programming: reusability. With the help from a dynamic expression, I'll show you how to reuse the destination object to avoid the unnecessary creation and maintenance of multiple SSIS objects.
For this example, I will create output files containing a list of invoices, with each file containing a single vendor's invoices. The methodology will be as such: first, retrieve a list of vendors to be extracted as part of our ETL package, and store that list in an SSIS object variable. Next, create a Foreach Loop container, and set the enumerator to use that vendor list object variable as its control collection. Within the loop container, I will place a Data Flow Task, which will contain an OLE DB Source object and a Flat File Destination object.For each iteration of the loop, the OLE DB source will query the database to retrieve invoices for the current vendor in the list, and will use an expression to dynamically set the output filename based on the vendor number.
First, create and populate our source tables:
|CREATE TABLE Vendors (
VendorID INT PRIMARY KEY,
VendorName VARCHAR(50) NOT NULL,
Status CHAR(1) NOT NULL)
CREATE TABLE Invoices (
CREATE TABLE InvoiceStatus (
|Figure 1: DDL Statements|
Now, create a new SSIS package and add an Execute SQL Task to the control flow pane. After creating a connection to the source database, configure this task to use the SQL statement in Figure 2. This query will retrieve a list of vendors with invoices for this period, which will be used to control the flow within the loop.
|SELECT DISTINCT CAST(v.VendorID AS VARCHAR(20))
FROM Vendors v
INNER JOIN Invoices i ON v.VendorID = i.VendorID
|Figure 2: The SQL statement to retrieve the vendors|
After defining the SQL statement, there are a couple of other configurations to change on the SQL task. First, under the General tab, you'll see the Result Set selection is set to None by default, since you normally don't expect to run traditional data retrieval queries in the control flow pane. Since this task will be used to populate an SSIS object variable with the list of vendors, this should be changed to Full result set(Figure 3).Next, under the Result Set tab, create a variable in which the list of vendors will be stored. Click the Add button at the bottom of the window, and then configure your variable as type Object as shown in Figure 4 . Note: Be sure to set the Result Name for your new variable to the value of 0.
|Figure 3: Configure the result set|
|Figure 4: Create the object variable|
Up next, drag a for each loop container from the toolbox onto the control flow pane. Into that loop container, drop a single data flow object, and connect the execute SQL task from the previous step to the new loop container using a precedence constraint (Figure 5 ).
|Figure 5: The control flow pane, with precedence constraint|
The next task is to set up the foreach loop container to enumerate the list of vendors. In a nutshell, I'm sending a list of items to the foreach loop, and it will execute the loop once for each value in the list - in this case, one iteration for each vendor listed in the SSIS object variable. Open the foreach loop and browse to the Collection tab, and set the enumerator to the value Foreach ADO Enumerator. For the source variable, use the object variable created in the previous steps, and set the enumeration mode to Rows in the first table, since we're only addressing one list of data. See Figure 6 for the proper configuration of the Collection tab.
Now click on the Variable Mappings tab in the foreach loop. This window allows us to specify a particular value within a list that will be used as a query parameter, and in this case, the vendor ID will be used to execute a query against the source database to retrieve one vendor's results at a time. Use the drop down list in the Variable column to define a new variable as shown in Figure 4, but this time select the String data type. Set the Index value to 0, as there is only a single value for each row in our list (Figure 7 ).
|Figure 6: Configuring the collection in the foreach loop|
|Figure 7: Variable mapping in the foreach loop|
That's it for the configuration of the control flow pane; it's time to dive into the data flow and configure the source and destination objects. To start off, change to the Data Flow tab, and drag a single OLE DB source onto the data flow pane and open it for configuration. Use the connection object created with the earlier database query. For this source, the data access mode should be set to SQL command, and the following command will be used:
FROM Invoices i
INNER JOIN InvoiceStatus s
ON i.InvoiceStatus = s.StatusID
WHERE VendorID = ?
|Figure 8: DML statement for invoice retrieval|
You'll see that I've included a question mark in my WHERE clause; this is a placeholder that indicates I'll be using a parameter at runtime. To configure the parameter name, click on the Parameters button to the right of the query window. Next to Parameter 0 (the default name for the first parameter), select the name of the second variable you created earlier (the one typed as a String) as shown in Figure 9 .This will identify the parameter to pass into the SQL statement, which in this example is the vendor ID. Note: Multiple round trips to the server can be expensive, especially if the data set is large or the query is complex. As always, you should test any solution for performance before deploying to production!
|Figure 9: Variable mapping in the OLE DB source|
Next, create a new variable to store the output folder location (Figure 10), which will be used as part of the dynamic filename for each output file. It is important to note that you could dynamically specify a different folder for each output file, but for the sake of simplicity in this demonstration, I'll write all of the output files to the same directory.
|Figure 10: Create a new variable for the output folder location|
The next step is to create and configure the flat file destination and its corresponding file connection. Drag a flat file destination object from the toolbox onto the data flow pane, and connect the output of the OLE DB source to this destination. Open the flat file destination and you'll be required to create a new file connection. Set up the connection in your preferred output format (delimited, fixed width, etc.); you'll have to specify a temporary filename for now, but the package will set the output filename at runtime.
With all of the components created, we're left with creating the expression that will build the output filenames. For this example, I want to name the file with the vendor ID and the date ("VENDOR[#]" followed by the date in the format YYYYMMDD), such that my output file for vendor 4 for New Year's Day would be VENDOR4_20100101.txt. A little help from the Expression Language makes this relatively simple to do; in the properties window of the new file connection, open the Expressions window and use the ellipsis button to drill into the ConnectionStringvalue (Figure 11). The expression syntax may be a bit unfamiliar, but you can see where I've made use of the RIGHT and DATEPART functions as well as the concatenation operator (+), which are almost identical to the T-SQL components of the same name. For this example, the expression syntax would be as shown in Figure 12.
|Figure 11: Expression Language editor|
+ "VENDOR" + @[User::TheVendor] + "_"
+ RIGHT("0000" + (DT_STR, 4, 1252)DATEPART("YEAR", GETDATE()), 4)
+ RIGHT("00" + ((DT_STR, 2, 1252)DATEPART("MONTH", GETDATE())), 2)
+ RIGHT("00" + ((DT_STR, 2, 1252)DATEPART("DAY", GETDATE())), 2)
|Figure 12: Expression language for creating dynamic filename|
With the package configuration completed, you can execute the package and watch the data flow to multiple outputs. You'll see the data flow object flash from yellow to green and back several times as it enumerates each of the vendor's invoices and writes the data to the various output files.
In this article, I've demonstrated how to use SQL Server Integration Services to create a fully dynamic output to multiple destinations. Using the foreach loop container and a little bit of expression language, I've shown how to avoid creating identical output objects by reusing destination components, which can help to eliminate unnecessary effort and help to simplify otherwise complex SSIS packages.