SQLServerCentral Article

Basics of XML and SQL Server, Part 2: Shredding XML

,

The first step in shredding an XML document and inserting its data into a SQL Server table is getting the document into the most easily-processed form. The most easily-processed form of XML document is one in which all of its data is encapsulated inside elements and none of its data is in the form of attributes.

With that in mind, we are going to create an XML document in which all of the data is in attributes and convert it to one in which all of the data is in elements.

Create an attributes-only XML document

Attached is an archive file named "xml_demo_files2.zip" containing three text files.

  1. The file named "StockMarketDataAsAttributes.xml" is an XML file containing all of its data as attributes.
  2. The file named "StockMarketDataAsElements.xml" is an XML file containing all of its data as elements.

  3. The file named "sp500hst.txt" is the source data originally used to generate the data in the previous two files.

Extract these three files into a folder named "C:\xml\" or some other convenient location.

The following SQL script creates a table named "sp500hst," inserts data from the "sp500hst.txt" file into the table, then executes a "FOR XML AUTO, ROOT()" SQL statement against the table to generate an XML stream with all its data in the form of attributes.

/*
     Change 'C:\xml\' to the folder containing the 'sp500hst.txt' demo file.
     If you get a 'file not found' error, it probably means your are referencing
     a database on a remote server. To run the script you need to copy the file
     to a shared folder on the remote server and reference it with the fully-
     qualified UNC path, ie. '\\ComputerName\SharedFolder\sp500hst.txt.'
*/BEGIN TRY
    DROP TABLE [dbo].[sp500hst]
END TRY
BEGIN CATCH
END CATCH
GO
CREATE TABLE [dbo].[sp500hst] (
    [Date] [VARCHAR](10) NULL,
    [Ticker] [VARCHAR](10) NULL,
    [Open] [VARCHAR](10) NULL,
    [High] [VARCHAR](10) NULL,
    [Low] [VARCHAR](10) NULL,
    [Close] [VARCHAR](10) NULL,
    [Volume] [VARCHAR](10) NULL
) ON [PRIMARY]
GO
BULK INSERT [dbo].[sp500hst] FROM 'C:\xml\sp500hst.txt' WITH (FIELDTERMINATOR = ',')
GO
SELECT DISTINCT * FROM [dbo].[sp500hst] AS StockData WHERE [Date] = '20090821' ORDER BY [Ticker]
FOR XML AUTO,ROOT('StockMarketData')
GO

The FROM [dbo].[sp500hst] AS StockData clause results in each parent element having the name 'StockData.'

The ROOT('StockMarketData') clause results in the root element having the name 'StockMarketData.'

Execute the SQL script from SQL Server Management Studio as shown below.

Click on the <StockMarketData> XML link to display the contents of the XML field. It should look like the image below.

As you can see, all the data contained in the XML tags is in form of attributes of the 'StockData' elements.

The demo file 'StockMarketDataAsAttributes.xml' was created from identical data, but if you wish you can copy-and-paste this XML output into Notepad and save it over the demo file. It would look like the image below in Notepad.

Transform the XML document's attributes into elements

The following XSLT stylesheet, courtesy of O'Reilly's CD bookshelfs, will convert any attributes in an XML document processed against it into elements.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:for-each select="@*">
        <xsl:element name="{name()}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
      <xsl:apply-templates select="*|text()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

We will use shell tools online's XSLT PROCESSOR to transform the attributes-only XML we just generated.

Paste the above XSLT code into the input 'xslt' text box and the attribute-only XML output of the previous SQL script into the input 'xml' text box, then click the 'Submit Query' button, as shown below.

The 'output' text box above contains the transformed XML. All the data is now in elements. It isn't obvious because the XML is not coherently formatted. We will use shell tools online's XML FORMATTER to tidy up the converted XML.

Copy and paste the contents of the XML PROCESSOR's 'output' text box to the input 'xml' text box of the XML FORMATTER and click the 'Submit Query' button.

As you can see from the contents of the 'output' text box of the XML FORMATTER, we have successfully converted all the attributes in the original XML document to elements.

The demo file 'StockMarketDataAsElements.xml' was created from identical data, but if you wish you can copy-and-paste this XML output into Notepad and save it over the demo file.

How to load an XML document into an XML variable

The OpenXML method of shredding an XML document requires that the entire XML document be loaded into system memory.

The first step in doing this is accomplished by loading the document into a SQL variable of XML data type using the OPENROWSET(BULK) method, as demonstrated in the following SQL script.

/*
     Change 'C:\xml\' to the folder containing 'StockMarketDataAsElements.xml.'
     If you get a 'file not found' error, it probably means your are referencing
     a database on a remote server. To run the script, copy the file to a shared 
     folder on the remote server and reference it with the fully-qualified UNC 
     path, ie. '\\ComputerName\SharedFolder\StockMarketDataAsElements.xml.'
*/DECLARE @Doc AS XML
SET @Doc = (SELECT * FROM OPENROWSET(BULK 'C:\xml\StockMarketDataAsElements.xml',SINGLE_BLOB) AS x)
SELECT @Doc AS XmlDoc

Execute this script from SQL Server Management Studio, as shown below.

Now click on the <StockMarketData> XmlDoc link to get the data in a separate tab as in the image below.

This demonstrates that the XML document 'StockMarketDataAsElements.xml' was successfully stored in the XML data type variable.

Now that we know how to get an XML document into an XML variable, we are ready to shred the elements-only XML document with OpenXML and insert its data into a table.

Shredding XML with OpenXML

OpenXML uses XPath, a specialized expression language that parses through XML nodes to selectively retrieve data from XML documents. When using the OpenXML shredding method, the sp_xml_prepareDocument stored procedure is used to create an instance of an XML document in system memory from a copy of the document stored in an XML-type variable and return an integer handle that is used to address the XML object.

EXEC sp_xml_prepareDocument @hdoc OUTPUT, @Doc

In the above sp_xml_prepareDocument SQL statement, @Doc is the XML-type varialble holding the XML document,  while @hdoc is the handle of the instantianted XML-object document.

Following is the entire SQL script using this statement.

/*
     Change 'C:\xml\' to the folder containing 'StockMarketDataAsElements.xml.'
     If you get a 'file not found' error, it probably means your are referencing
     a database on a remote server. To run the script, copy the file to a shared 
     folder on the remote server and reference it with the fully-qualified UNC 
     path, ie. '\\ComputerName\SharedFolder\StockMarketDataAsElements.xml.'
*/DECLARE @Doc XML
DECLARE @hdoc INT
DECLARE @iDoc INT
SET @Doc = (SELECT * FROM OPENROWSET(BULK 'C:\xml\StockMarketDataAsElements.xml',SINGLE_BLOB) AS x)
EXEC sp_xml_prepareDocument @hdoc OUTPUT, @Doc
SET @iDoc = 1
BEGIN TRY
    DROP TABLE Stocks
END TRY
BEGIN CATCH
END CATCH
SELECT * INTO Stocks
FROM Openxml (@iDoc,'//StockMarketData/StockData ')
WITH
(
    [Date] VARCHAR(MAX) 'Date',
    [Ticker] VARCHAR(MAX) 'Ticker',
    [Open] VARCHAR(MAX) 'Open',
    [High] VARCHAR(MAX) 'High',
    [Low] VARCHAR(MAX) 'Low',
    [Close] VARCHAR(MAX) 'Close',
    [Volume] VARCHAR(MAX) 'Volume'
)
EXEC sp_xml_removeDocument @iDoc
SELECT * FROM Stocks

Execute the SQL script to shred the 'StockMarketDataElementsOnly.xml' document with OpenXML and insert its data into the 'Stocks' table. The script is shown below in SSMS.

The 'SELECT * FROM Stocks' SQL statement confirms that the contents of the 'StockMarketDataElementsOnly.xml' document were successfully inserted into the 'Stocks' table.

The previous script is as simple an example of shredding XML with OpenXML that you are ever likely to see.  Study it and the OpenXML syntax reference on MIcrosoft's site. Write a similar script of your own, then play with the examples in Using OPENXML.

Shredding XML with XQuery

Like OpenXML, XQuery uses XPath to navigate through elements and attributes in an XML document.

Shredding XML with XQuery does not require using sp_xml_prepareDocument to instantiate an XML object, meaning that the entire XML document does not have to be loaded into system memory.

/*
     Change 'C:\xml\' to the folder containing 'StockMarketDataAsElements.xml.'
     If you get a 'file not found' error, it probably means your are referencing
     a database on a remote server. To run the script, copy the file to a shared 
     folder on the remote server and reference it with the fully-qualified UNC 
     path, ie. '\\ComputerName\SharedFolder\StockMarketDataAsElements.xml.'
*/DECLARE @Doc XML
SET @Doc = (SELECT * FROM OPENROWSET(BULK 'C:\xml\StockMarketDataAsElements.xml',SINGLE_BLOB) AS x)
BEGIN TRY
    DROP TABLE Stocks
END TRY
BEGIN CATCH
END CATCH
SELECT * INTO Stocks FROM (
SELECT
    X.StockData.query('Date').value('.','VARCHAR(10)') AS 'Date',
    X.StockData.query('Ticker').value('.','VARCHAR(10)') AS 'Ticker',
    X.StockData.query('Open').value('.','VARCHAR(10)') AS 'Open',
    X.StockData.query('High').value('.','VARCHAR(10)') AS 'High',
    X.StockData.query('Low').value('.','VARCHAR(10)') AS 'Low',
    X.StockData.query('Close').value('.','VARCHAR(10)') AS 'Close',
    X.StockData.query('Volume').value('.','VARCHAR(10)') AS 'Volume'
FROM @Doc.nodes('StockMarketData/StockData') AS X(StockData)
) sub
SELECT * FROM Stocks ORDER BY Ticker

Execute the SQL script to shred the 'StockMarketDataElementsOnly.xml' document with XQuery and insert its data into the 'Stocks' table.

The 'SELECT * FROM Stocks' SQL statement confirms that the contents of the 'StockMarketDataElementsOnly.xml' document were successfully inserted into the Stocks table.

The previous script is as simple an example of shredding XML with XQuery that you are ever likely to see.

Study it and the XQuery Language Referene on Microsoft's site.

Write a similar script of your own, then play with the examples in XQuery Labs - A Collection of XQuery Sample Scripts.

OpenXML versus XQuery

XQuery is usually faster than OpenXML when shredding small XML documents, but slower when shredding large documents. The OpenXML method requires that the entire document be loaded into system memory, while the XQuery method does not. SQL Server performance can be severely affected when using OpenXML if the size of the XML document approaches the size of available system memory.

XQuery is much more flexible than OpenXML. XQuery supports the use of the  XML Data Modification Language, and its code is more manageable and scalable.

Conclusion

These first two articles barely scratch the surface of what can be done with XML and SQL Server, but they do get you to the point where you know enough to be able to learn by doing.

Resources

Rate

4.67 (40)

You rated this post out of 5. Change rating

Share

Share

Rate

4.67 (40)

You rated this post out of 5. Change rating