The SSIS Object Variable and Multiple Result Sets

In my most recent post in this series, I talked about how to use the SSIS object variable as an ADO recordset as a source in a data flow. By loading the result set of a query into this variable, the contents of the variable can be read by an SSIS script component and sent out through the SSIS pipeline.

In working up the example in the previous blog post, I wondered to myself: What happens if the source query returns more than one result set?

Multiple Result Sets with the Execute SQL Task

In reality, data retrieval queries are typically configured to return exactly one result set. If two different result sets are needed, two separate queries are built, each with its own source-to-target flow of data. However, I’ve seen more than a few queries (typically in the form of vendor stored procedures, which, of course, cannot be edited) which return more than one set of results in a single execution. Even some of the built-in SQL Server stored procedures do this (sp_help comes to mind). From wherever they come, any ETL process that consumes data generated by these types of queries or stored procedures must be configured to handle multiple result sets.

Thinking through the previous example, I wanted to experiment with what would happen if the query in an execute SQL task returns more than one result set. To test this, I created a stored procedure that purposefully returns two different result sets, as shown below:

Predictably, the stored procedure above returns two sets of results.

To continue the test, I’ve added an execute SQL task to the control flow, using the stored procedure above as the query. I set the ResultSet to Full Result Set.

Specifying a ResultSet value of Full Result Set requires the use of an object-typed variable to store the results.

Next, I’ll use a script component, configured as a source, to programmatically extract the information from the ADO recordset object. In my last post, I used an OleDbDataAdapter to fill a DataTable with the data from the object variable. In this case, I’ve changed that just a bit, instead populating a DataSet object which should allow us to capture more than one result set (since the DataSet can contain one or more DataTables). I’ve also added a second output, setting up the appropriate columns for the added result set. The snippet below shows the example of looping through each result set to build the data for each output. (Teaser: Before you try to execute this code, don’t execute this code. It won’t work. I’ll explain why below.)

// Set up the DataAdapter to extract the data, and the DataSet object to capture both output tables
OleDbDataAdapter da = new OleDbDataAdapter();
DataSet ds = new DataSet();

// Extract the data from the object variable into the DataSet
da.Fill(ds, Variables.vResults, "myTable");

// Process the first result set
foreach (DataRow dr in ds.Tables[0].Rows)
{
    // Create a new, empty row in the output buffer
    SalesOutputBuffer.AddRow();

    // Now populate the columns
    SalesOutputBuffer.SalesOrderID = int.Parse(dr["SalesOrderID"].ToString());
    SalesOutputBuffer.RevisionNumber = int.Parse(dr["RevisionNumber"].ToString());
    SalesOutputBuffer.OrderDate = DateTime.Parse(dr["OrderDate"].ToString());
    SalesOutputBuffer.ShipDate = DateTime.Parse(dr["ShipDate"].ToString());
    SalesOutputBuffer.Status = int.Parse(dr["Status"].ToString());
    SalesOutputBuffer.TotalDue = decimal.Parse(dr["TotalDue"].ToString());
}        

// Process the second result set
foreach (DataRow dr in ds.Tables[1].Rows)
{
    // Create a new, empty row in the output buffer
    SalesDetailOutputBuffer.AddRow();

    // Now populate the columns
    SalesDetailOutputBuffer.SalesOrderID = int.Parse(dr["SalesOrderID"].ToString());
    SalesDetailOutputBuffer.SalesOrderDetailID = int.Parse(dr["SalesOrderDetailID"].ToString());
    SalesDetailOutputBuffer.OrderQty = short.Parse(dr["OrderQty"].ToString());
    SalesDetailOutputBuffer.LineTotal = decimal.Parse(dr["LineTotal"].ToString());

}

When I update the script component source with this code and execute the package, it fails with the following error:

This scripting error is a bit cryptic, but it tells me all I need to know: “Cannot find table 1”. The second code block, which attempts to iterate over ds.Tables[1].Rows, fails because there is no table at index 1 (the second table, or what should be the second result set). I can infer from this that the execute SQL task has loaded only the first result set into the SSIS object variable, even though the stored procedure returned two different result sets. If I comment out the section that attempts to process the second table, the package runs fine and loads just the data from the first result set.

So if the execute SQL task does work for this scenario, how exactly does one retrieve data from a stored procedure or query which produces multiple result sets? The easiest method, as it turns out, is to use the script component to connect directly to the database and execute the query.

Handling Multiple Result Sets in the Script Component

All other things being equal, the simplest approach is usually the best. In this case, I can eliminate the middleman entirely (the execute SQL task) and connect directly to the source database from within the script component, retrieving the necessary data from all of the result sets. In testing this solution, I came across an excellent article written some years back by Bob Pearson, in which he describes and demonstrates (in VB.NET) the solution to the multiple result set quandary. With Bob’s permission, I have adapted the design pattern he described to C#.

In this pattern, the entire source operation – querying the source database and splitting out the result sets into their appropriate outputs – is handled in the script component. I should point out that this example uses the ADO.NET connection manager rather than the OleDB connection manager. Although it is technically possible to access the OleDB connection manager in script, the syntax to do so is a bit quirky, so I’ll keep it simple and use ADO.NET.

#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion

// Add in the appropriate namespaces
using System.Data;
using System.Data.SqlClient;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
    // Use the existing package connection manager
        IDTSConnectionManager100 connMgr = Connections.AdventureWorksSource;
        SqlConnection conn = (SqlConnection)connMgr.AcquireConnection(null);

    // Execute the stored procedure that will generate the two result sets
        SqlCommand cmd = new SqlCommand("EXEC [dbo].[usp_GetSalesData]", conn);
        SqlDataReader reader = cmd.ExecuteReader();

        while (reader.Read()) // Read() will return false once we've processed all rows in the first result set.
        {
            // Create a new, empty row in the output buffer
            SalesOutputBuffer.AddRow();

            // Now populate the columns
            SalesOutputBuffer.SalesOrderID = int.Parse(reader["SalesOrderID"].ToString());
            SalesOutputBuffer.RevisionNumber = int.Parse(reader["RevisionNumber"].ToString());
            SalesOutputBuffer.OrderDate = DateTime.Parse(reader["OrderDate"].ToString());
            SalesOutputBuffer.ShipDate = DateTime.Parse(reader["ShipDate"].ToString());
            SalesOutputBuffer.Status = int.Parse(reader["Status"].ToString());
            SalesOutputBuffer.TotalDue = decimal.Parse(reader["TotalDue"].ToString());
        }

        // The NextResult() function moves us to the next result set
        reader.NextResult();

        // Process the second result set
        while (reader.Read())
        {
            // Create a new, empty row in the output buffer
            SalesDetailOutputBuffer.AddRow();

            // Now populate the columns
            SalesDetailOutputBuffer.SalesOrderID = int.Parse(reader["SalesOrderID"].ToString());
            SalesDetailOutputBuffer.SalesOrderDetailID = int.Parse(reader["SalesOrderDetailID"].ToString());
            SalesDetailOutputBuffer.OrderQty = short.Parse(reader["OrderQty"].ToString());
            SalesDetailOutputBuffer.LineTotal = decimal.Parse(reader["LineTotal"].ToString());
        }
        
    }
}

After making this change to bypass the SSIS object variable method and simply query the database directly from the script component source, the script component source runs successfully. I can then send the two outputs, each containing a distinct result set from the source stored procedure, to their respective output tables.

Tl;dr: In the rare cases when you have to process a query with multiple result sets, skip the SSIS variable and use a direct connection through the script component source.

Thanks again to Bob Pearson for allowing me to adapt his VB.NET code into C# for this example.

The post The SSIS Object Variable and Multiple Result Sets appeared first on Tim Mitchell.

The SSIS Object Variable and Multiple Result Sets

Multiple Result Sets with the Execute SQL Task

Handling Multiple Result Sets in the Script Component

Rate

Share

Share

Rate

The SSIS Object Variable and Multiple Result Sets

Multiple Result Sets with the Execute SQL Task

Handling Multiple Result Sets in the Script Component

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts