SQLServerCentral Article

Basics of XML and SQL Server, Part 1: Streaming XML

,

XML simply defines a container for data.

An XML document doesn't do anything with its data. There is no way to throw something at the document and observe the results. All you can do is browse its boring elements or embed some boring data inside it or memorize some boring XML syntax rules in the hope of doing something with it.

The XSL stylesheet is the missing link needed to interact with an XML document. An XSL stylesheet uses the Extensible Stylesheet Language to transform an XML document into XHTML for display in a web browser. When you embed a reference to an XSL stylesheet in an XML file and open it in an XHTML-compliant web browser you get to examine its contents in an entirely different way. You can display parent elements as records and their child elements as fields. You can selectively display and sort parent records. You can display some or all of the child fields. You can perform logic tests on element values.

In effect, an XSL stylesheet is a query that turns an inanimate XML document into a database, with the results of the query being displayed in a browser in the form of a table or however else you choose.

XML Elements versus Attributes

Most developers already know some HTML and the rudiments of XML. We know what well-formed HTML looks like and we know that well-formed XML looks the same. Symmetry is key. Elements have opening and closing tags, and you can nest data values or other elements inside of them. Nesting of child and grandchild and great-grandchild elements must be symmetrical. There has to be a root element within which all the other elements are symmetrically nested.

Did I mention that all XML element tags have to be in symmetry?

Then there are attributes.

Attributes are contained inside element tags. There is no symmetry or aesthetic logic to XML attributes. They can be arbitrarily tacked onto any XML element. Attributes are a functional part of HTML, but fatally disrupt the elegant simplicity of XML. XML attributes are redundant and unnecessary. There is nothing that can be done with XML attributes that cannot be done with XML elements, but there are lots of things that can be done with XML elements that cannot be done with XML attributes.

XML attributes are a legacy from other markup languages. Attributes work well in transformation languages such as HTML or XSL because their tags are actually more akin to commands. Nesting command switches in child elements would make no sense. Placing them inside the tag as attributes makes complete sense. XML, however, is just a container language. It's elements either hold data or they hold other elements that hold data somewhere down the tree. There are no command switches in XML. Arbitrarily placing some data values in elements and other data values in attributes is schizophrenic.

To be clear, the following two XML statements are equivalent.

<PARENT>
  <CHILD_01>VALUE_01</CHILD_01>
  <CHILD_02>VALUE_02</CHILD_02>
</PARENT>
<PARENT CHILD_01=VALUE_01>
  <CHILD_02>VALUE_02</CHILD_02>
</PARENT>

The second style of XML statement (where data that could have been added as a symmetric element has been arbitrarily added as an assymetric attribute) should be avoided.

When you must deal with XML files that contain attributes, there is a simple solution; convert the attributes to elements.

The following XSLT stylesheet, courtesy of O'Reilly's CD bookshelfs, does just that for any XML document.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:for-each select="@*">
        <xsl:element name="{name()}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
      <xsl:apply-templates select="*|text()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

For a convenient method of transforming an XML document, shell tools online provides an xslt processor for the conversion and an xml formatter to tidy up the converted XML. There are lots of other online XML processors/formatters you can find with a web search.

Another option for the transformation is to use PowerShell and the System.Xml.Xsl namespace of .Net as described in Command-line XSLT processor with PowerShell.

Microsoft's XML Notepad 2007 is a free XML editor that does a number of things, including tidying up XML.

Demo Files

Attached is a file named "xml_demo_files.zip" containing three text files.

  1. The file named "StockMarketData.xml" is an XML file containing some test data.
  2. The file named "StockMarketData.xsl" is the XSL stylesheet file we will use to transform the data in the "StockMarketData.xml" file into XHTML for display in a web browser.

  3. The file named "sp500hst.txt" is the source data originally used to generate the data in "StockMarketData.xml." We will import this data into a SQL Server table and use the "FOR XML PATH" T-SQL clause to reproduce the XML stream in the "StockMarketData.xml" file.

Extract these three files into a folder named "C:\xml\" or some other convenient location.

Generating an XML Stream from SQL Server

The following SQL script creates a table named "sp500hst," inserts data from the "sp500hst.txt" file into the table, then executes a "FOR XML PATH" SQL statement against the table to generate an XML stream.

/*
     Change 'C:\xml\' to the folder containing the 'sp500hst.txt' demo file.
     If you get a 'file not found' error, it probably means your are referencing
     a database on a remote server. To run the script you need to copy the file
     to a shared folder on the remote server and reference it with the fully-
     qualified UNC path, ie. '\\ComputerName\SharedFolder\sp500hst.txt.'
*/BEGIN TRY
    DROP TABLE [dbo].[sp500hst]
END TRY
BEGIN CATCH
END CATCH
GO
CREATE TABLE [dbo].[sp500hst] (
    [Date] [VARCHAR](10) NULL,
    [Ticker] [VARCHAR](10) NULL,
    [Open] [VARCHAR](10) NULL,
    [High] [VARCHAR](10) NULL,
    [Low] [VARCHAR](10) NULL,
    [Close] [VARCHAR](10) NULL,
    [Volume] [VARCHAR](10) NULL
) ON [PRIMARY]
GO
BULK INSERT [dbo].[sp500hst] FROM 'C:\xml\sp500hst.txt' WITH (FIELDTERMINATOR = ',')
GO
SELECT * FROM [dbo].[sp500hst] WHERE [Date] = '20090821' ORDER BY [Ticker]
FOR XML PATH('StockData'),ROOT('StockMarketData')
GO

Execute the SQL script in SQL Server Management  Studio.

Click on the <StockMarketData> XML link in the "Results" tab to view the XML stream.

This XML data is the same data that is in the StockMarketData.xml file. The only difference is that the following two lines have been added to the beginning of the StockMarketData.xml file.

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="StockMarketData.xsl"?>

The first line references the encoding protocol to be used with the XML file, while the second line references the XSL file to be used to transform the XML into XHTML for display in a web browser.

If you wish to create your own XML file from this XML stream, left-click on the query window, type CTRL-A to select all of the XML text followed by CTRL-C to copy it into the clipboard. Open Notepad, right-click on the document screen and select "Paste" to insert the contents of the clipboard into it, then save it with an "xml" extension.

Don't forget to add the two lines referenced above to the beginning of the file.

Fun with the FOR XML Clause

The FOR XML clause has three modes, AUTO, PATH and RAW.

Try replacing the line...

FOR XML PATH('StockData'),ROOT('StockMarketData')

..in the SQL script with the line...

FOR XML AUTO

...and the line...

FOR XML RAW

You will notice that the XML data generated in these two modes is, by default, in the form of attributes rather than elements.

Referencing the Basic Syntax of the FOR XML Clause, experiment with all three modes to become more familiar with how to use them.

Transform StockMarketData.xml to XHTML with StockMarketData.xsl

Open the "C:\xml\" folder in Windows Explorer.

Double-click the "StockMarketData.xml" file.

Your default browser is automatically invoked to dispay the results of the XSL transformation.

Structure of the "StockMarketData.xml" File

Following is a truncated version of the "StockMarketData.xml" file. It consists of a root element named "StockMarketData," a parent element named "StockData" that segregates data for individual stocks, and child elements holding descriptive and quantitative data for each of the stocks.

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="StockMarketData.xsl"?>
<StockMarketData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>A</Ticker>
    <Open>25.6</Open>
    <High>25.61</High>
    <Low>25.22</Low>
    <Close>25.55</Close>
    <Volume>34758</Volume>
  </StockData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>AA</Ticker>
    <Open>12.64</Open>
    <High>12.73</High>
    <Low>12.49</Low>
    <Close>12.56</Close>
    <Volume>338295</Volume>
  </StockData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>AAPL</Ticker>
    <Open>167.81</Open>
    <High>169.37</High>
    <Low>166.8</Low>
    <Close>169.22</Close>
    <Volume>148597</Volume>
  </StockData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>ABC</Ticker>
    <Open>21.14</Open>
    <High>21.25</High>
    <Low>20.9</Low>
    <Close>21.05</Close>
    <Volume>33179</Volume>
  </StockData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>ABT</Ticker>
    <Open>45.58</Open>
    <High>46</High>
    <Low>45.315</Low>
    <Close>45.39</Close>
    <Volume>84699</Volume>
  </StockData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>ACS</Ticker>
    <Open>45.08</Open>
    <High>45.33</High>
    <Low>44.4</Low>
    <Close>45.32</Close>
    <Volume>17241</Volume>
  </StockData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>ADBE</Ticker>
    <Open>32.49</Open>
    <High>32.93</High>
    <Low>32</Low>
    <Close>32.84</Close>
    <Volume>51086</Volume>
  </StockData>
  <StockData>
    <Date>20090821</Date>
    <Ticker>ADI</Ticker>
    <Open>28.21</Open>
    <High>28.44</High>
    <Low>27.64</Low>
    <Close>28.42</Close>
    <Volume>64696</Volume>
  </StockData>
</StockMarketData>

The first line of the XML file (<?xml version="1.0" encoding="ISO-8859-1"?>) contains the encoding declaration, which identifies the protocol used to represent the characters in the document.

The second line (<?xml-stylesheet type="text/xsl" href="StockMarketData.xsl"?>) points to the "StockMarketData.xsl" stylesheet file. This code instructs the browser to use the designated XSL stylesheet to transform the XML data into XHTML. If the stylesheet file were not in the same directory as the XML file we would have to use a path name, but since it is in the same directory we can use a the file name alone.

Structure of the "StockMarketData.xsl" Stylesheet

Following is the content of the "StockMarketData.xsl" file.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
  <h2>Stock Market Data</h2>
  <table border="1">
    <tr bgcolor="#AAAAAA">
      <th>Date</th>
      <th>Ticker</th>
      <th>Open</th>
      <th>High</th>
      <th>Low</th>
      <th>Close</th>
      <th>Volume</th>
    </tr>
    <xsl:for-each select="StockMarketData/StockData">
    <tr>
      <td><xsl:value-of select="Date"/></td>
      <td><xsl:value-of select="Ticker"/></td>
      <td><xsl:value-of select="Open"/></td>
      <td><xsl:value-of select="High"/></td>
      <td><xsl:value-of select="Low"/></td>
      <td><xsl:value-of select="Close"/></td>
      <td><xsl:value-of select="Volume"/></td>
    </tr>
    </xsl:for-each>
  </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

Like the first line of the XML file, the first line of the XSL file contains the encoding declaration.

The second line of the XSL file contains the XSLT namespace declaration (www.w3.org/1999/XSL/Transform), which gives the processor access to the XSLT (Extensible Stylesheet Language Transformation) instruction set.

If you know any HTML, you recognize the HTML tags for a table within the XSL file. The header fields appear between the <th> and </th> HTML tags, while the record fields appear between the <td> and </td> HTML tags which are themselves nested within a <tr> </tr> tag set.

All the lines beginning with "<xsl:" are XSL transformation elements defined within the http://www.w3.org/1999/XSL/Transform namespace.

Add an XSLT "Sort" Element to the XSL Stylesheet

Add the line <xsl:sort select= "Open"/> containing the XSLT element "xsl:sort" to the StockMarketData.xsl file as shown in the screen shot below and save it.

Refresh your browser if it is still displaying the StockMarketData.xml document, otherwise double-click on it in Windows Explorer again.

As you can see, adding the line <xsl:sort select= "Open"/> caused the data to be sorted by the "Open" field.

For a full listing of all the available XSLT elements and how to use them see the XSLT Elements reference.

Next Step: Shredding XML

In Part 1 of this series we have covered the basics of generating XML from SQL Server with the FOR XML clause and processing the resulting XML document with an XSLT stylesheet.

In Part 2 we will go in the opposite direction, shredding a copy of an XML document that has been stored in memory into records in a SQL Server table with the OpenXML() and Xquery value() methods.

Conclusion

The hardest part of learning any technology is getting to the point where you know enough about it to be able to play with it.

This series of articles is intended to get you to that point with XML, which is not that easy to find your way into.

If you are ready to learn more, here are some good places to start:

  1. XML Tutorial
  2. XSLT Tutorial
  3. O'Reilly's XSLT
  4. O'Reilly's XML in a Nutshell
  5. Basic Syntax of the FOR XML Clause

Resources

Rate

4.31 (26)

You rated this post out of 5. Change rating

Share

Share

Rate

4.31 (26)

You rated this post out of 5. Change rating