Blog Post

Using the SSIS Object Variable as a Data Flow Source

,

Object variables in SSIS are incredibly versatile, allowing the storage of almost any type of data (even .NET objects). In my last post on this topic, I demonstrated how an SSIS object variable containing a .NET DataSet object could be used by the for each loop container as an iterator. In this post, I’ll continue the discussion by showing how SSIS object-typed variables can also be used in the data flow by the script component to generate source data.

The last post described how to use the execute SQL task to execute a query that would generate a result set. As a refresher, the ResultSet setting specifies that the results of the query should be written to a variable.

image

The object variable intended to store the result set of this query is specified in the Result Set tab. Note that this must be an object-typed variable, or the operation will fail.

image

When the package executes, the SSIS variable will be set to an ADO recordset object which contains the results of the query. From here, the variable can be consumed via script – by either the script component or the script task – in SSIS. In this example, I’ll show how to use it on the SSIS data flow through a script component configured as a source.

To read the data from this object variable, I’ve created a data flow, added a script component source, and set the script component to use the variable (User::vResults) as a read-only variable. I have also configured the default output with the expected column metadata from this query, with the expected column names and data types set up in the Inputs and Outputs tab.

image

In the script itself, the SSIS variable containing the ADO recordset can be read by the System.Data.OleDb.OleDbDataAdapter, and then written to a System.Data.DataTable object. It’s relatively easy to extract the results from there, since a DataTable generally has a rows-and-columns structure that is easily navigable. As shown below, the rows from the extracted DataTable object are then, one by one, written to the output buffer of the script component, which will send them down the data flow pipeline and eventually into a relational database.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion
// Add in the appropriate namespaces
using System.Data;
using System.Data.OleDb;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Set up the DataAdapter to extract the data, and the DataTable object to capture those results
        OleDbDataAdapter da = new OleDbDataAdapter();
        DataTable dt = new DataTable();
        
        // Extract the data from the object variable into the table
        da.Fill(dt, Variables.vResults);
        // Since we know the column metadata at design time, we simply need to iterate over each row in
        //  the DataTable, creating a new row in our Data Flow buffer for each
        foreach (DataRow dr in dt.Rows)
        {
            // Create a new, empty row in the output buffer
            SalesOutputBuffer.AddRow();
            // Now populate the columns
            SalesOutputBuffer.SalesOrderID = int.Parse(dr["SalesOrderID"].ToString());
            SalesOutputBuffer.RevisionNumber = int.Parse(dr["RevisionNumber"].ToString());
            SalesOutputBuffer.OrderDate = DateTime.Parse(dr["OrderDate"].ToString());
            SalesOutputBuffer.ShipDate = DateTime.Parse(dr["ShipDate"].ToString());
            SalesOutputBuffer.Status = int.Parse(dr["Status"].ToString());
            SalesOutputBuffer.TotalDue = decimal.Parse(dr["TotalDue"].ToString());
        } 
    }
}

To be clear, this script would have to be configured with all of the column-level metadata defined at design time. SSIS is very picky about metadata in the data flow, which must be defined at design time and not inferred at runtime.

After adding a destination component to write this data to a table, I can test the package and confirm that the results sent to that table are as expected. The data flow visualization shows the 20 rows originally written to the object variable, and the data viewer confirms that the values have been properly extracted from the object variable and formatted for the data flow.

image

On its own, this example would be impractical. After all, the only thing we’ve done here is taken a result set from a relational database and run it through an object-typed variable, just to extract it and write it out to another database. However, there are cases where using SSIS object variables in a design pattern similar to this would be very helpful:

  • Processing data that requires extensive reshaping or transformation, where the data cannot be transformed in the source (such as data delivered from static stored procedures that cannot be modified)
  • Complex processing not ideally suited for the native SSIS components, such as rolling aggregates or heavy string manipulation
  • Processing the output of stored procedures that return more than one result set

I’m going to spend time in my next post going over the last bullet in that list, to demonstrate how this method behaves with multiple result sets from a single query.

In this post, I’ve demonstrated how to use an object-typed variable containing an ADO recordset to provide source data for the SSIS data flow. Although this is not something the average SSIS developer will use on a daily basis, it’s a very handy design pattern to have in one’s tool belt.

The post Using the SSIS Object Variable as a Data Flow Source appeared first on Tim Mitchell.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating