SQLServerCentral Article

Using XML from data variable option in SSIS

,

What if you wanted to consume XML data in SSIS, but the data wasn't stored in a file? Enter XML from data variable. Using this property, XML in memory can be used just as easily as if it was written to a file. This can be useful if you are loading data from an API on the web from an HTTP GET request.

Limitation

I'm going to load my XML into a string variable. This is a limitation of the XML Source item. It only accepts string variables when using the XML from data variable option. String variables max out at 2GB, so if an XML doc is bigger than that, this approach won't work.

Before You Start

This example loads data into a table. Create the table by running the SQL below.

IF (NOT EXISTS (SELECT * 
                 FROM INFORMATION_SCHEMA.TABLES
                 WHERE TABLE_SCHEMA = 'dbo' 
                 AND TABLE_NAME = 'countries')
                )
CREATE TABLE dbo.countries
(
 id nvarchar(255) NULL,
 name nvarchar(255) NULL,
 capitalCity nvarchar(255) NULL
);

Let's Do It 

This example will walk you though making the HTTP request, putting the XML document into a variable then using that data to populate a SQL Server table. We are going to use an open datasouce published by the World Bank.

First open Visual Studio Shell 2010 (or SQL Server Data Tools) as an administrator.  Click New Project. In the Installed Templates pane, locate Business Intelligence and select Integration Services. Double-click Integration Services Project. (see the image below).

Now create 2 variables. One will hold the address of the API being called. The other holds the XML document.

  • variable 1

    • name: xmlDoc
    • type: string
    • value: ""
  • variable 2

    • name: URI
    • type string
    • value: http://api.worldbank.org/countries

Notice that xmlDoc is given an initial value of "". This variable must be initialized with a value or the package won't build. Your Variables Window should look like the image below.

Now let's set up the script that will get the XML document and put it in a variable. From the SSIS Toolbox, drag-and-drop a Script Task onto the Control Flow tab. Double-click it to open the Script Task Editor.

Set ReadOnlyVariables to the URI variable and the ReadWriteVariables to xmlDoc as shown in the image below.

Click the Edit Button. Copy and Past the C# code below over what is presented to you. This is the code that makes the HTTP request and stores the XML in a variable. See the comments in the code.

#region Namespaces
using System;
using System.Net; //Added
using Microsoft.SqlServer.Dts.Runtime; //Added
#endregion
 
namespace ST_3831315ea9bb45e9b1c2f2e9addad540
{
 
 [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
 public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
 {
        public void Main()
        {
            // Declare variable that will hold the xml document received by the API
            string xmlDoc = String.Empty;
 
            try
            {
                // Get the URI from the variable
                string url = Dts.Variables["User::URI"].Value.ToString();
 
                //Create a Web Client
                using (WebClient client = new WebClient())
                {
                    //Download the xml document as a string
                    xmlDoc = client.DownloadString(url);
                }
 
                // Set the value of the xmlDocument to the string that was just downloaded
                Dts.Variables["User::xmlDoc"].Value = xmlDoc;
 
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch
            {
                 Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
 
        
        #region ScriptResults declaration
        /// 
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// 
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
 
 }
}

Look at the image below. This is what a new script task will look like right after copying in the code above. Notice that the name of the assembly in the code doesn't match the name of the assembly in your project. Change what is beside "namespace"  in the code to match what is in your Solution Explorer window. In my case, the code needs to be changed to:  namespace ST_62986592e036411c93aa56fbb337c296

After you've change the assembly name, click Build on the menu. You will get a drop-down with the project name for your script task. Click it to trigger the build. After the build finishes, go to the menu bar, click File then click Exit. Click OK on the Script Task Editor to bring you back to the Data Flow Tab.

Drag and drop a Data Flow Task on to the Control Flow tab. Select the Script Task. Drag the green arrow from the Script Task to the Data Flow Task. Your Control Flow table should end up looking like the screenshot below.

Before you continue, you are going to need an XSD file to define the XML coming in from the API. Before you start the next set of instructions, download the XSD file (if that link doesn't work, try this one). 

Double click the Data Flow Task. This will take you to the Data Flow tab. Drag-and-drop an XML Source Task from the SSIS Toolbox on to the Data Flow tab. Your Data Flow tab should look like the image below. 

 Double click the XML Source Task. This will bring up the XML Source Editor, as shown below.

Set the following values:

  • Data access mode: XML data from variable
  • Variable name: User::xmlDoc

Here's what it looks like:

Click the Browse button beside XSD Location text box. Locate the XSD document you downloaded above and click Open. The XSD Location field should now be populated as shown in the image below.

Click OK at the bottom of the XML Source Editor. This will bring up an ugly list of warnings as shown below. Ignore them. Click OK to dimiss the warnings.

Now let's load the XML data into SQL Server. Drag-and-drop a SQL Server Destination on to the Data Flow tab. Select the XML Source Task. Take the blue arrow and drag it over the SQL Server Destination. This will display the Input Output Selection window because the XML Source contains several outputs. Select country from the Output drop-down as shown in the screenshot below, then and click OK.

Double-click the SQL Server Destination. This opens the SQL Destination Editor. Under Connection Manager click New and select the server you want to connect to. The Use a table or view drop-down populates. Select the countries table you created at the beginning of this article.

Click Mappings. You can see that the values in the XML document have been mapped to the columns in the table. Click OK.

Now run the package. Go to the countries table and verify that the data from the WorldBank API was loaded into your table.

Summary

You've done it! You have used the XML from data variable option of the XML Source item. With this approach, you were able to load the XML document into memory, and write it directly to a SQL Server table, without ever having to save it to a file.

Happy coding!

Resources

Rate

4.31 (13)

You rated this post out of 5. Change rating

Share

Share

Rate

4.31 (13)

You rated this post out of 5. Change rating