Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Simplify the Creation of XML from SQL Server Data

By Hugh Lynch,

Simplify the Creation of XML from SQL Server Data by Hugh Lynch CTO, Restricted Stock Systems, Inc.

 

Introduction

Like many developers, my team and I were enchanted by the possibilities of XML and XSL/XSLT when they emerged a few years ago.  It seemed we could describe anything using XML and morph it to our hearts’ content using XSL.  We wove the new technology deep into our designs, building substantial web applications in which the data is rendered as XML then transformed to HTML and other formats. 

While I am content with the decision to embrace XML, there were significant obstacles.  Chiefly, how to create XML from our relational data in SQL Server?  Existing techniques held early promise, but proved either too simplistic (FOR XML AUTO) or too cumbersome (FOR XML EXPLICIT, XML Views, etc) for widespread adoption. 

We wanted something simple and efficient to leverage our existing tools, experience and the capabilities of SQL Server.  We invented a simple XML-assembly grammar that we embed in the results returned from stored procedures, utilizing SQL Server’s ability to return multiple “chained” result sets from a single command invocation.

Next, we created a general-purpose interpreter (we implemented COM and .NET versions) to parse the embedded XML-assembly instructions and render the returned data accordingly as XML.

Finally, we developed two general-purpose invocation utilities, one for the command line and one web-based (implemented in .NET and ASP.NET, respectively).  These tools provide simple mechanisms to invoke parameterized stored procedures, process the results using the interpreter, and output the resulting XML, either raw or transformed.

The benefits of this approach, which we call “data shapes” or simply “shapes”, include:

a.       Simplicity.  With the grammar defined and the interpreter written, developers simply write conforming stored procedures to create arbitrarily complex XML documents.  Unlike some alternatives, complex documents do not require complex procedures – just select the data to assemble and accompany it with simple text instructions like “RELATE Employee.CompanyID to Company.CompanyID” (more on the grammar later).

b.      Efficiency.  Returning all of the data for a document (sometimes dozens of recordsets) from a stored procedure keeps database trips to an absolute minimum.  The technique also delegates the assembly of the XML from the database server to the interpreter, which can be distributed across middle- or presentation-layer machines.  Writing XML involves conversion and formatting of native data types as text, character escaping, and filtering of child recordsets, all of which consume valuable CPU cycles.

c.       Performance and Control.  Writing shapes as stored procedures gives the developer all the benefits of that language:  pre-compiled execution plans, caching, parameterization, conditional logic, temporary storage, permission controls, nested calls, etc. 

d.      Familiarity.  While the grammar of shapes is no doubt novel, the tools used to create, edit, test and analyze them are not.  Shapes can be developed in Query Analyzer and Visual Studio or any other T-SQL tool.  Their performance can be evaluated using SQL Profiler, and dissected by examining execution plans.  Because a shape encapsulates both data selection and assembly instruction in a single scriptable procedure, it can be easily migrated between development, test and production environments.

This article provides a complete definition of the shape grammar as well as functional explanations of the shape interpreter and shape invoker utilities.  The accompanying download includes the binary and configuration files necessary to experiment with shapes on your own.  Installation instructions are listed at the end of the article. 

You can also see shapes in action at http://tech.rssgroup.com/shapes.

Let’s begin with some examples.

Example 1 – Hello, World!

Tradition dictates that all new programming technologies introduce themselves, so here is a very simple shape listing:

create procedure dbo.shpHello
as
   select 'dataset root.Hello' _shp
   select 'Hello, World!' Msg
go

which can be invoked through ASP.NET, using the URL:

And here are the results it produces:

<?xml version="1.0" encoding="utf-8"?>
<Hello Msg="Hello, World!"/>

While trivial, there are a couple of things to note.  The shape instruction ('dataset root.Hello') tells the interpreter to name the following result set “root” and to represent each record within it as a <Hello> element.  By default, the interpreter maps the single column of the data set (Msg) to an attribute of the same name in the <Hello> element.

Example 2 – A Realistic Report

Using Microsoft’s standard Northwind database, we will create a shape to return all of the orders created by a specified employee during a specified date range.  The data will be organized by customer.

The following URL invokes the stored procedure dbo.shpEmployeeActivity with three parameters: employeeID, startDate and endDate.

Here’s the listing of the stored procedure.  The embedded instruction statements are again highlighted in blue.  The procedure uses a table variable to store orderIDs.

create procedure dbo.shpEmployeeActivity (
  @employeeID  int
, @startDate   smalldatetime
, @endDate     smalldatetime
as

   set nocount on

   -- create a table variable to hold the set of filtered orderIDs
   declare @nOrders int
   declare @tblOrders table (
      OrderID int primary key)

   insert @tblOrders
   select orderID
    from orders 
    where employeeID = @employeeID
    and orderDate between @startDate and @endDate

   set @nOrders = @@rowcount

   set nocount off

   select 'dataset root.Employee' _shp

   select EmployeeID, LastName, FirstName, Title, ...
    from Employees
    where employeeID=@employeeID

   if @nOrders>0
    begin
      select 'dataset Customers.Customer relate to root' _shp
      select * 
       from customers
       where customerID in 
         (select customerID from orders where orderID in 
            (select orderID from @tblOrders))

      select 'dataset Orders.Order relate customerID to Customers.customerID' _shp
      select *
       from orders
       where orderID in (select orderID from @tblOrders)


      select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp

      select *
       from [Order Details]
       where orderid in (select orderID from @tblOrders)

   end
go

And here are the results (wider date ranges produce a lot more data):

<?xml version="1.0" encoding="utf-8"?>
<Employee EmployeeID="6" LastName="Suyama" FirstName="Michael"...>
   <Notes><![CDATA[Michael is a graduate of...]]></Notes>
   <Customer CustomerID="GOURL" CompanyName="Gourmet Lanchonetes"...>
      <Order OrderID="10423" EmployeeID="6" OrderDate="1/23/1997"...>
         <Item ProductID="31" UnitPrice="10.0000" Quantity="14" Discount="0" />
         <Item ProductID="59" UnitPrice="44.0000" Quantity="20" Discount="0" />
      </Order>
   </Customer>
   <Customer CustomerID="LAMAI" CompanyName="La maison d'Asie"...>
      <Order OrderID="10425" EmployeeID="6" OrderDate="1/24/1997"...>
         <Item ProductID="55" UnitPrice="19.2000" Quantity="10" Discount="0.25" />
         <Item ProductID="76" UnitPrice="14.4000" Quantity="20" Discount="0.25" />
      </Order>
   </Customer>
</Employee>

As in the prior example, records were mapped to elements and columns were mapped to attributes of the same name, except for the SQL text-typed column “Notes”, which was mapped to a CDATA block within a child element in order to preserve whitespace and minimize character escaping.

Let’s take a closer look at the embedded instructions.

   select 'dataset root.Employee' _shp

   select ... from employees where ...

   select 'dataset Customers.Customer relate to root' _shp

   select * from customers where ...

   select 'dataset Orders.Order relate customerID to Customers.customerID' _shp

   select * from orders where ...

   select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp

   select * from [Order Details] where ...

Notice how the selection of each data set (SELECT [fields] FROM [table]) is preceded by an instruction (SELECT ‘dataset...’ _shp).  These instructions tell the shape interpreter what is coming next and where to put it. 

Here’s what the instructions mean:

   select 'dataset root.Employee' _shp

Name the following result set “root” and represent each record within it as an <Employee> element.

   select 'dataset Customers.Customer relate to root' _shp

Name the following result set “Customers” and represent each record within it a <Customer> element.  Make the <Customer> elements appear as child elements of the <Employee> elements produced by the “root” result set.  In this case, there is only one <Employee> element because we are selecting by primary key.  If there were more than one, the <Customer> elements would be repeated as child elements of each.

   select 'dataset Orders.Order relate customerID to Customers.customerID' _shp

Name the following result set “Orders” and represent each record within it as an <Order> element.  Make the <Order> elements appear as child elements of the <Customer> elements produced by the “Customers” result set.  The <Order> elements will be filtered such that only those orders with matching customerID fields appear as children of each <Customer> element.  When rendering to XML, the shape interpreter will not render the customerID attribute of the <Order> element, since it is implicit in the XML structure.

   select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp

Name the following result set “OrderDetails” and represent each record within it as an <Item> element.  Make the <Item> elements appear as child elements of the <Order> elements produced by the “Orders” result set, grouped by orderID.

Note that any Order records that do not match a selected Customer record will not appear in the XML.  Nor will any OrderDetails records appear that do not match a selected Order.  In other words, if the where clauses (in orange) were omitted, the XML produced would be identical, but the amount of data loaded by SQL Server and passed to the interpreter would increase dramatically.

Because this error is not detectable in the output, except through degraded performance, it can be difficult to diagnose.  It is always a good idea to test your shapes in Query Analyzer to verify that the data sets are appropriately filtered.  It is also possible that a future version of the interpreter could detect this condition and produce a warning.

While this shape is intended to be representative of actual, useful shapes, it does not fully exercise the shape grammar, which provides fine controls over the mapping and formatting of records and columns.  Simple options are available to:

a.       wrap record elements in a grouping element.  For example, <Customer> elements could be wrapped by a <Customers> element.

b.      render all columns as child elements

c.       render an individual column as an attribute, child element, free text, CDATA block or nested XML

d.      format numeric and date columns using Excel-like formatting strings

e.       omit a column from the XML output

To get a better sense of the possibilities, let’s examine the full grammar.

Shape Grammar

Syntax

DATASET datasetName.elementName[ WITH options][ RELATE clause[ RELATE clause...]]

RELATE clause:

   RELATE [TO parentDataset

         | childField1[,childField2...] TO parentDataset.parentField1[,parentField2]]

   [ WITH options]

options:

   option[ option...]

option:

   setOption[,setOption...]|[fieldName:fieldOption[,fieldOption]

setOption:

   elements|attributes|showkeys|group[(groupName)]|xmlns(prefix=URI)

fieldOption:

   fieldRepresentation|format("formatString")|omit

fieldRepresentation:

   attrib|element|text|cdata|raw[(indent)]

Arguments

datasetName

      Specifies a name for the subsequent result set that must be unique within all result sets returned by the stored procedure.  If a grouping option is specified, this name is the default element name for the grouping element.

elementName

      Specifies a name for the elements generated for each record in the subsequent result set.

RELATE TO parentDataset

      Indicates that the elements generated for each record in the subsequent result set should be nested beneath each element generated from the result set name parentDataset.  If more than one element is produced from the parent result set, all child elements will be repeated beneath each parent element.

RELATE childField1[,childField2...] TO parentDataset.parentField1[,parentField2]]

      Indicates that the elements generated for each record in the subsequent result set should be nested beneath the elements generated from the result set name parentDataset, and filtered such that only those child records  where the values of the specified child fields match the values of the specified parent fields will produce child elements beneath a given parent element.  If a child record does not match any parent record, it will not produce an XML element.

WITH options

      Specifies options that affect the appearance of the elements generated for each record in the subsequent result set.  Options set on the dataset are overridden by options set in a RELATE clause.  While it is unusual to relate a result set to multiple parents, it is possible, and in such cases each RELATE clause may have separate and distinct options.

setOption:

   attribs

            Indicates that fields should produce XML attributes (this is the default)

   elements

            Indicates that fields should produce XML elements, rather than XML attributes by default

   showkeys

            Indicates that child fields related to parent fields should not be omitted from the child element as they are by default.

   group[(groupName)]

            Indicates that child elements should be wrapped within a grouping element and optionally specifies the name of that grouping element.  The default group element name is the datasetName.

   xmlns(namespace)

            Specifies and XML namespace to appear in each child element.

fieldOption:

   fieldRepresentation

            Specifies the representation of a field within the resulting XML.  The default representation for all fields other than text fields is as an XML attribute.  The default representation for text fields is as a CDATA block within a child element.

      attrib

                  represent the field as an XML attribute of the record element:
      <elementName ... fieldName=”value” …/>
value
will be appropriately escaped for attribute representation

      element

                  represent the field as a child element of the record element:
      <elementName ...>
            …
            <fieldName>value</fieldName>
            …
      </elementName>
value
will be appropriately escaped for text representation

      text

                  represent the field as text within the record element:
      <elementName ...>
            …
            value
            …
      </elementName>
value
will be appropriately escaped for text representation

      cdata

                  represent the field as a CDATA within a child element:
      <elementName ...>
            …
            <fieldName><![CDATA[value]]></fieldName>
            …
      </elementName>
value will be not be escaped

      cdata,nowrap

                  represent the field as a CDATA block within the record element:
      <elementName ...>
            …
            <![CDATA[value]]>
            …
      </elementName>
value will be not be escaped

      raw[(indent)]

                  represent the field as raw xml within a child element:
      <elementName ...>
            …
            <fieldName>
                  value
            </fieldName>
            …
      </elementName>
value will be not be escaped and may optionally be indented a specified amount

      raw[(indent)],nowrap

                  represent the field as raw xml within the record element:
      <elementName ...>
            …
            value
            …
      </elementName>
value will be not be escaped and may optionally be indented a specified amount

   format("formatString")

            Specifies a format with which to represent the field in the output XML.  A tremendous variety of formats are supported, because the DataShape class employs the ToString(format) methods of the underlying data types.  Numbers can be formatted with such complex formats as “#,0.00;(#,0.00);--“ or simple standard formats like “N” (adds commas).  Dates also support custom formats, like “ddd - MMM d, yyyy”, and an extensive set of standard formats.

   omit

            Indicates that the field should be omitted from the output XML.

 

The Shape Interpreter

The shape interpreter analyzes the multiple result sets returned from the stored procedure, parses the instructions according to the grammar defined above, and renders the data accordingly as XML.  The implementation of the interpreter may interest developers wanting to customize or simply understand it, but it is not a requirement for its use.  In fact, its function is thoroughly wrapped by the URL- and command-line invocation methods described in the next section, so feel free to skip ahead.

Our .NET implementation of the interpreter (included in the download) is encapsulated in a DataShape class derived from System.Data.DataSet.  The DataShape class is typically employed in a two-step operation.  First, the data and instructions are read from the data source (a SqlCommand or SqlDataReader), populating the DataSet’s internal DataTable and DataRelation collections.  Then the DataShape is rendered as XML by passing a suitable output (filename, IO.Stream, TextWriter or XmlWriter) to its WriteShapeXml() method.

After writing, the DataShape is typically disposed of, having served its purpose.  It is, however, fully serializable and can be written to and restored from a file (or other destination) if its contents are still of interest.  For instance, it is fairly straightforward to implement a paging system in this manner, without requiring additional trips to the database.

The DataShape class can also be populated manually or with multiple calls to various stored procedures because it exposes and employs the underlying capabilities of the standard DataSet class.

The Shape Invokers

Having all of information necessary to create an XML document contained in the results of our shape procedures, along with a general-purpose interpreter for converting those results to XML, all that is needed is a mechanism to invoke the store procedure, process the results using the interpreter, and output the results.  We created two: one for command line invocations and one for browsers.

Let’s begin with the web-based tool, which we created using ASP.NET.  Essentially, we wanted a concise URL that identifies a stored procedure and allows for the specification of parameter values and a few options.  The two examples above include simple web-based invocation URLs, with and without parameters:

The general format of these URLs is:

The first thing to note is that there is actually no .aspx file necessary to process these requests.  By adding an entry to the httpHandlers section of the web.config file, all .aspx requests are routed to the ShapeHandler class of the RSSWebShapes assembly, which knows how to interpret the URL.

If shapes need to co-exist with standard .aspx pages, it is a simple matter to register a different extension for shapes and substitute that extension for .aspx in shape URLs.  Typically, we use .shp.  Changing the extension does require custom configuration of the virtual directory in IIS.

The ShapeHandler class creates a SqlCommand object for executing a stored procedure named shapeName.  Using a database connection defined in the web.config, the ShapeHandler retrieves the parameter list from the database and populates any command parameters that were provided in the URL.

When passing parameters, note that parameter values should be URL escaped and that parameter names should omit the @ prefix.

The available options are described below:

URL Invoker Options

x[:xsltFileName]

      Specifies an XSLT file name that is used to transform the XML.  The ShapeHandler will search for templates in paths specified in the “ShapeTemplatePaths” app setting of the web.config file.  If an extension is omitted, the default is .xslt.   If the entire name is omitted, a file name of shpName.xslt is used.  If the x option is specified the content type of the response defaults to text/html; to specify a different type use the m option.

m:contentType

      Specifies the contentType of the response.  The default is text/xml unless the x option is used, in which case the default is text/html.

r[:rootName]

      Specifies an element name in which to wrap the output of the shape procedure.  If the option is specified, but no rootName is provided, the default name is “root”.

c:indentChar

      Specifies a character to use when indenting XML child elements.   The default is tab.

i:indentLevel

      Specifies the number of indentChars to use when indenting XML child elements.  The default is 1.  A value of  0 (zero) turns off all XML formatting, including newlines.

o[:outputFile]

      Specifies that the output should be directed to a file rather than the browser.  The output location is determined by the “ShapeOutputPath” app setting in the web.config file.  If no extension is specified, the default is .xml.  If no outputFile is specified, the default is shpName.xml.

The command line invoker, makeShape.exe, supports essentially the same options with the exception of the contentType.  Output is written to the console unless an output file is specified.  The general format of the command line is:

makeShape –s=shapeName [-option[=val]...]] paramA=val paramB=val

The available options are described below:

Command Line Invoker Options

x[=xsltFilePath]

      Specifies the path of an XSLT file that is used to transform the XML.  If an extension is omitted, the default is .xslt.   If the entire name is omitted, a file name of shpName.xslt is used.

r[=rootName]

      Specifies an element name in which to wrap the output of the shape procedure.  If the option is specified, but no rootName is provided, the default name is “root”.

c=indentChar

      Specifies a character to use when indenting XML child elements.   The default is tab.

i=indentLevel

      Specifies the number of indentChars to use when indenting XML child elements.  The default is 1.  A value of  0 (zero) turns off all XML formatting, including newlines.

o[=outputPath]

      Specifies the path of the file to which output should be written rather than the console.  If no extension is specified, the default is .xml.  If no outputPath is specified, the default is shpName.xml.

Shape Testing

The accompanying distribution includes a general-purpose web page (default.htm) for testing stored procedures in general, and shapes in particular.  The page provides three panes: the list pane, the form pane and the results pane.  The list pane shows all of the stored procedures defined in a configured database that match a search criterion, such as “shp%”. 

The form pane presents text boxes for each parameter of a stored procedure selected from the list pane as well as a button to execute it. The form displays default values and data types for each parameter and a checkbox to specify use of the default.  The form pane also allows for the specification of invoker options, such as an XSLT transform or output file name.

The results pane shows the shape results, typically as XML.

Use of the shape testing web page requires the creation of one stored procedure and one user-defined function within the configured database.  See the installation instructions for details.

Conclusion

While there are several documented methods for creating XML from relational data, none offers the combination of simplicity, performance and control that encourages widespread adoption.

With shapes, a developer can use familiar tools and language (T-SQL) to create a single, intuitive stored procedure that produces a potentially complex, precisely controlled XML document that is URL- and ommand-line-addressable. Moreover, the data for the document are gathered with optimal efficiency in a single trip to the database, and assembled into XML by client processes that can be distributed across intermediate servers.

To see shapes in action without performing any installation, visit http://tech.rssgroup.com/shapes.

Currently, the tools are .NET-based and SQL Server centric. The concept and the grammar, however, are thoroughly portable and could support different databases (notably Oracle) and different middleware (e.g. java).  Given sufficient interest, I will establish a project on SourceForge.NET to manage future development.

Installation

To experiment with shapes, I recommend using Microsoft’s standard Northwind database (although if you have a test database with existing stored procedures that can be more interesting).

1.   The first step is to create a directory to host a simple website, let’s say:

      c:\inetpub\wwwroot\shapes

F

2.   Unzip the included files to that directory:

      default.htm                    - presents a frameset used to test shapes

      web.config                          - configures database access

      bin\RSSWebShapes.dll - the shape interpreter

      _xslt\shpList.xslt                 - used by default.htm

3.   Edit the properties of the website using Internet Services Manager:

a.       create a new application for the site if it is using the Default Application

b.      turn off anonymous access – this means the web user’s credentials will be used to establish database connections.

4.   Edit the database connection string in web.config:

  <DatabaseConnectionSets>

    <local        core="server=localhost;database=Northwind;trusted_connection=yes"

    />

  </DatabaseConnectionSets>

5.   Use Query Analyzer to install shpList and some sample shapes.

      _sql\shpList.sql             - T-SQL proc used by default.htm

      _sql\shpSamples.sql            - T-SQL procs for experimentation

6.   Open http://localhost/shapes/

      This site is a generic site for listing and executing stored procedures.  It is especially useful for testing shapes.

Code Download

Total article views: 10420 | Views in the last 30 days: -2
 
Related Articles
ARTICLE

XML Workshop XVI - Shaping the XML results

Continuing on with his series, Jacob Sebastian shows how you can shape the query results to a certai...

ARTICLE

Converting Shaped Recordsets into XML

This article shows you how to take a shaped recordset and convert the results into XML all while pre...

FORUM

Need to select all elements of an XML document from an XML data type column

Just need to select all elements of an XML document from an XML data type column

SCRIPT

Generating combinations 'm elements from n-element set'

This procedure generates a dataset with combinations of elements_to_select taken from number_of_valu...

ARTICLE

Supplement to 'Converting Shaped Recordsets into XML'

Jon Winer further discusses some comparisons using ADO versus the XML DOM to parse shaped recordsets...

Tags
miscellaneous    
xml    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones