SQLServerCentral Article

Simplify the Creation of XML from SQL Server Data

,

Simplify the Creation of XML from SQL Server Data

by Hugh Lynch

CTO, Restricted Stock Systems, Inc.

 

Introduction

Like many developers, my team and I were enchanted by the

possibilities of XML and XSL/XSLT when they emerged a few years ago.  It seemed

we could describe anything using XML and morph it to our hearts’ content using

XSL.  We wove the new technology deep into our designs, building substantial

web applications in which the data is rendered as XML then transformed to HTML

and other formats. 

While I am content with the decision to embrace XML, there

were significant obstacles.  Chiefly, how to create XML from our relational

data in SQL Server?  Existing techniques held early promise, but proved either too

simplistic (FOR XML AUTO) or too cumbersome (FOR XML EXPLICIT, XML Views, etc)

for widespread adoption. 

We wanted something simple and efficient to leverage our existing

tools, experience and the capabilities of SQL Server.  We invented a simple

XML-assembly grammar that we embed in the results returned from stored

procedures, utilizing SQL Server’s ability to return multiple “chained” result

sets from a single command invocation.

Next, we created a general-purpose interpreter (we

implemented COM and .NET versions) to parse the embedded XML-assembly

instructions and render the returned data accordingly as XML.

Finally, we developed two general-purpose invocation

utilities, one for the command line and one web-based (implemented in .NET and ASP.NET,

respectively).  These tools provide simple mechanisms to invoke parameterized stored

procedures, process the results using the interpreter, and output the resulting

XML, either raw or transformed.

The benefits of this approach, which we call “data shapes”

or simply “shapes”, include:

a.       Simplicity

With the grammar defined and the interpreter written, developers simply write

conforming stored procedures to create arbitrarily complex XML documents. 

Unlike some alternatives, complex documents do not require complex procedures –

just select the data to assemble and accompany it with simple text instructions

like “RELATE Employee.CompanyID to Company.CompanyID” (more on the grammar

later).

b.      Efficiency. 

Returning all of the data for a document (sometimes dozens of recordsets) from

a stored procedure keeps database trips to an absolute minimum.  The technique

also delegates the assembly of the XML from the database server to the

interpreter, which can be distributed across middle- or presentation-layer

machines.  Writing XML involves conversion and formatting of native data types

as text, character escaping, and filtering of child recordsets, all of which

consume valuable CPU cycles.

c.       Performance

and Control.  Writing shapes as stored procedures gives the developer all

the benefits of that language:  pre-compiled execution plans, caching, parameterization,

conditional logic, temporary storage, permission controls, nested calls, etc. 

d.      Familiarity. 

While the grammar of shapes is no doubt novel, the tools used to create, edit,

test and analyze them are not.  Shapes can be developed in Query Analyzer and

Visual Studio or any other T-SQL tool.  Their performance can be evaluated

using SQL Profiler, and dissected by examining execution plans.  Because a

shape encapsulates both data selection and assembly instruction in a single

scriptable procedure, it can be easily migrated between development, test and

production environments.

This article provides a complete definition of the shape

grammar as well as functional explanations of the shape interpreter and shape

invoker utilities.  The accompanying download includes the binary and

configuration files necessary to experiment with shapes on your own. 

Installation instructions are listed at the end of the article. 

You can also see shapes in action at http://tech.rssgroup.com/shapes.

Let’s begin with some examples.

Example 1 – Hello, World!

Tradition dictates that all new programming technologies

introduce themselves, so here is a very simple shape listing:

create procedure dbo.shpHello
as
   select 'dataset root.Hello' _shp
   select 'Hello, World!' Msg
go

which can be invoked through ASP.NET, using the URL:

And here are the results it produces:

<?xml version="1.0" encoding="utf-8"?>
<Hello Msg="Hello, World!"/>

While trivial, there are a couple of things to note.  The

shape instruction ('dataset root.Hello') tells the interpreter to name the following result set “root” and to represent each record within it as a <Hello> element.  By default, the interpreter maps the single column of the data set (Msg) to an attribute of the same name in the <Hello> element.

Example 2 – A Realistic Report

Using Microsoft’s standard Northwind database, we will create a shape to return all of the orders created by a specified employee during a specified date range.  The data will be organized by customer.

The following URL invokes the stored procedure dbo.shpEmployeeActivity with three parameters: employeeID, startDate and endDate.

Here’s the listing of the stored procedure.  The embedded instruction statements are again highlighted in blue.  The procedure uses a table variable to store orderIDs.

create procedure dbo.shpEmployeeActivity (
  @employeeID  int
, @startDate   smalldatetime
, @endDate     smalldatetime
as
   set nocount on
   -- create a table variable to hold the set of filtered orderIDs
   declare @nOrders int
   declare @tblOrders table (
      OrderID int primary key)
   insert @tblOrders
   select orderID
    from orders 
    where employeeID = @employeeID
    and orderDate between @startDate and @endDate
   set @nOrders = @@rowcount
   set nocount off
   select 'dataset root.Employee' _shp
   select EmployeeID, LastName, FirstName, Title, ...
    from Employees
    where employeeID=@employeeID
   if @nOrders>0
    begin
      select 'dataset Customers.Customer relate to root' _shp
      select * 
       from customers
       where customerID in 
         (select customerID from orders where orderID in 
            (select orderID from @tblOrders))
      select 'dataset Orders.Order relate customerID to Customers.customerID' _shp
      select *
       from orders
       where orderID in (select orderID from @tblOrders)

      select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp
      select *
       from [Order Details]
       where orderid in (select orderID from @tblOrders)
   end
go

And here are the results (wider date ranges produce a lot more data):

<?xml version="1.0" encoding="utf-8"?>
<Employee EmployeeID="6" LastName="Suyama" FirstName="Michael"...>
   <Notes><![CDATA[Michael is a graduate of...]]></Notes>
   <Customer CustomerID="GOURL" CompanyName="Gourmet Lanchonetes"...>
      <Order OrderID="10423" EmployeeID="6" OrderDate="1/23/1997"...>
         <Item ProductID="31" UnitPrice="10.0000" Quantity="14" Discount="0" />
         <Item ProductID="59" UnitPrice="44.0000" Quantity="20" Discount="0" />
      </Order>
   </Customer>
   <Customer CustomerID="LAMAI" CompanyName="La maison d'Asie"...>
      <Order OrderID="10425" EmployeeID="6" OrderDate="1/24/1997"...>
         <Item ProductID="55" UnitPrice="19.2000" Quantity="10" Discount="0.25" />
         <Item ProductID="76" UnitPrice="14.4000" Quantity="20" Discount="0.25" />
      </Order>
   </Customer>
</Employee>

As in the prior example, records were mapped to elements and columns were mapped to attributes of the same name, except for the SQL text-typed column “Notes”, which was mapped to a CDATA block within a child element in order to preserve whitespace and minimize character escaping.

Let’s take a closer look at the embedded instructions.

   select 'dataset root.Employee' _shp

   select ... from employees where ...

   select 'dataset Customers.Customer relate to root' _shp

   select * from customers where ...

   select 'dataset Orders.Order relate customerID to Customers.customerID' _shp

   select * from orders where ...

   select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp

   select * from [Order Details] where ...

Notice how the selection of each data set (SELECT [fields] FROM

)

is preceded by an instruction (SELECT

‘dataset...’ _shp).  These instructions tell the shape interpreter what

is coming next and where to put it. 

Here’s what the instructions mean:

   select

'dataset root.Employee' _shp

Name the following result set “root” and represent each

record within it as an <Employee> element.

   select

'dataset Customers.Customer relate to root' _shp

Name the following result set “Customers” and represent each

record within it a <Customer> element.  Make the <Customer>

elements appear as child elements of the <Employee> elements produced by

the “root” result set.  In this case, there is only one <Employee>

element because we are selecting by primary key.  If there were more than one,

the <Customer> elements would be repeated as child elements of each.

   select

'dataset Orders.Order relate customerID to Customers.customerID' _shp

Name the following result set

“Orders” and represent each record within it as an <Order> element.  Make

the <Order> elements appear as child elements of the <Customer>

elements produced by the “Customers” result set.  The <Order> elements

will be filtered such that only those orders with matching customerID fields

appear as children of each <Customer> element.  When rendering to XML,

the shape interpreter will not render the customerID attribute of the

<Order> element, since it is implicit in the XML structure.

   select

'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp

Name the following result set “OrderDetails” and represent each

record within it as an <Item> element.  Make the <Item> elements

appear as child elements of the <Order> elements produced by the “Orders”

result set, grouped by orderID.

Note that any Order records that do not match a selected

Customer record will not appear in the XML.  Nor will any OrderDetails records

appear that do not match a selected Order.  In other words, if the where

clauses (in orange) were omitted, the XML produced would be identical, but the

amount of data loaded by SQL Server and passed to the interpreter would

increase dramatically.

Because this error is not detectable in the output, except

through degraded performance, it can be difficult to diagnose.  It is always a

good idea to test your shapes in Query Analyzer to verify that the data sets

are appropriately filtered.  It is also possible that a future version of the

interpreter could detect this condition and produce a warning.

While this shape is intended to be representative of actual,

useful shapes, it does not fully exercise the shape grammar, which provides

fine controls over the mapping and formatting of records and columns.  Simple

options are available to:

a.       wrap

record elements in a grouping element.  For example, <Customer> elements

could be wrapped by a <Customers> element.

b.      render

all columns as child elements

c.       render

an individual column as an attribute, child element, free text, CDATA block or

nested XML

d.      format

numeric and date columns using Excel-like formatting strings

e.       omit

a column from the XML output

To get a better sense of the possibilities, let’s examine

the full grammar.

Shape Grammar

Syntax

DATASET datasetName.elementName[

WITH options][ RELATE clause[ RELATE clause...]]

RELATE clause:

   RELATE [TO parentDataset

         | childField1[,childField2...]

TO parentDataset.parentField1[,parentField2]]

   [ WITH options]

options:

   option[

option...]

option:

   setOption[,setOption...]|[fieldName:fieldOption[,fieldOption]

setOption:

   elements|attributes|showkeys|group[(groupName)]|xmlns(prefix=URI)

fieldOption:

   fieldRepresentation|format("formatString")|omit

fieldRepresentation:

   attrib|element|text|cdata|raw[(indent)]

Arguments

datasetName

      Specifies

a name for the subsequent result set that must be unique within all result sets

returned by the stored procedure.  If a grouping option is specified, this name

is the default element name for the grouping element.

elementName

      Specifies

a name for the elements generated for each record in the subsequent result set.

RELATE TO parentDataset

      Indicates

that the elements generated for each record in the subsequent result set should

be nested beneath each element generated from the result set name parentDataset

If more than one element is produced from the parent result set, all child

elements will be repeated beneath each parent element.

RELATE childField1[,childField2...]

TO parentDataset.parentField1[,parentField2]]

      Indicates

that the elements generated for each record in the subsequent result set should

be nested beneath the elements generated from the result set name parentDataset,

and filtered such that only those child records  where the values of the

specified child fields match the values of the specified parent fields will

produce child elements beneath a given parent element.  If a child record does

not match any parent record, it will not produce an XML element.

WITH options

      Specifies

options that affect the appearance of the elements generated for each record in

the subsequent result set.  Options set on the dataset are overridden by

options set in a RELATE clause.  While it is unusual to relate a result set to

multiple parents, it is possible, and in such cases each RELATE clause may have

separate and distinct options.

setOption:

   attribs

            Indicates

that fields should produce XML attributes (this is the default)

   elements

            Indicates

that fields should produce XML elements, rather than XML attributes by default

   showkeys

            Indicates

that child fields related to parent fields should not be omitted from the child

element as they are by default.

   group[(groupName)]

            Indicates

that child elements should be wrapped within a grouping element and optionally

specifies the name of that grouping element.  The default group element name is

the datasetName.

   xmlns(namespace)

            Specifies

and XML namespace to appear in each child element.

fieldOption:

   fieldRepresentation

            Specifies

the representation of a field within the resulting XML.  The default

representation for all fields other than text fields is as an XML attribute. 

The default representation for text fields is as a CDATA block within a child

element.

      attrib

                  represent

the field as an XML attribute of the record element:

      <elementName ... fieldName=”value” …/>

value will be appropriately escaped for attribute representation

      element

                  represent

the field as a child element of the record element:

      <elementName ...>

            …

            <fieldName>value</fieldName>

            …

      </elementName>

value will be appropriately escaped for text representation

      text

                  represent

the field as text within the record element:

      <elementName ...>

            …

            value

            …

      </elementName>

value will be appropriately escaped for text representation

      cdata

                  represent

the field as a CDATA within a child element:

      <elementName ...>

            …

            <fieldName><![CDATA[value]]></fieldName>

            …

      </elementName>

value will be not be escaped

      cdata,nowrap

                  represent

the field as a CDATA block within the record element:

      <elementName ...>

            …

            <![CDATA[value]]>

            …

      </elementName>

value will be not be escaped

      raw[(indent)]

                  represent

the field as raw xml within a child element:

      <elementName ...>

            …

            <fieldName>

                  value

            </fieldName>

            …

      </elementName>

value will be not be escaped and may optionally be indented a specified

amount

      raw[(indent)],nowrap

                  represent

the field as raw xml within the record element:

      <elementName ...>

            …

            value

            …

      </elementName>

value will be not be escaped and may optionally be indented a specified

amount

   format("formatString")

            Specifies

a format with which to represent the field in the output XML.  A tremendous

variety of formats are supported, because the DataShape class employs the

ToString(format) methods of the underlying data types.  Numbers can be

formatted with such complex formats as “#,0.00;(#,0.00);--“ or simple standard

formats like “N” (adds commas).  Dates also support custom formats, like “ddd -

MMM d, yyyy”, and an extensive set of standard formats.

   omit

            Indicates

that the field should be omitted from the output XML.

 

The Shape Interpreter

The shape interpreter analyzes the multiple result sets

returned from the stored procedure, parses the instructions according to the

grammar defined above, and renders the data accordingly as XML.  The implementation

of the interpreter may interest developers wanting to customize or simply

understand it, but it is not a requirement for its use.  In fact, its function

is thoroughly wrapped by the URL- and command-line invocation methods described

in the next section, so feel free to skip ahead.

Our .NET implementation of the interpreter (included in the

download) is encapsulated in a DataShape class derived from

System.Data.DataSet.  The DataShape class is typically employed in a two-step

operation.  First, the data and instructions are read from the data source (a

SqlCommand or SqlDataReader), populating the DataSet’s internal DataTable and

DataRelation collections.  Then the DataShape is rendered as XML by passing a

suitable output (filename, IO.Stream, TextWriter or XmlWriter) to its WriteShapeXml()

method.

After writing, the DataShape is typically disposed of,

having served its purpose.  It is, however, fully serializable and can be

written to and restored from a file (or other destination) if its contents are

still of interest.  For instance, it is fairly straightforward to implement a

paging system in this manner, without requiring additional trips to the

database.

The DataShape class can also be populated manually or with

multiple calls to various stored procedures because it exposes and employs the

underlying capabilities of the standard DataSet class.

The Shape Invokers

Having all of information necessary to create an XML

document contained in the results of our shape procedures, along with a

general-purpose interpreter for converting those results to XML, all that is

needed is a mechanism to invoke the store procedure, process the results using

the interpreter, and output the results.  We created two: one for command line

invocations and one for browsers.

Let’s begin with the web-based tool, which we created using

ASP.NET.  Essentially, we wanted a concise URL that identifies a stored

procedure and allows for the specification of parameter values and a few

options.  The two examples above include simple web-based invocation URLs, with

and without parameters:

The general format of these URLs is:

The first thing to note is that there is actually no .aspx file necessary to process these requests.  By adding an entry to the httpHandlers section of the web.config file, all .aspx requests are routed to the ShapeHandler class of the RSSWebShapes assembly, which knows how to interpret the URL.

If shapes need to co-exist with standard .aspx pages, it is a simple matter to register a different extension for shapes and substitute that extension for .aspx in shape URLs.  Typically, we use .shp.  Changing the extension does require custom configuration of the virtual directory in IIS.

The ShapeHandler class creates a SqlCommand object for executing a stored procedure named shapeName.  Using a database connection defined in the web.config, the ShapeHandler retrieves the parameter list from the database and populates any command parameters that were provided in the URL.

When passing parameters, note that parameter values should be URL escaped and that parameter names should omit the @ prefix.

The available options are described below:

URL Invoker Options

x[:xsltFileName]

      Specifies

an XSLT file name that is used to transform the XML.  The ShapeHandler will

search for templates in paths specified in the “ShapeTemplatePaths” app setting

of the web.config file.  If an extension is omitted, the default is .xslt.   If

the entire name is omitted, a file name of shpName.xslt is used.  If the x

option is specified the content type of the response defaults to text/html; to

specify a different type use the m option.

m:contentType

      Specifies

the contentType of the response.  The default is text/xml unless the x option

is used, in which case the default is text/html.

r[:rootName]

      Specifies

an element name in which to wrap the output of the shape procedure.  If the

option is specified, but no rootName is provided, the default name is “root”.

c:indentChar

      Specifies a character to use

when indenting XML child elements.   The default is tab.

i:indentLevel

      Specifies

the number of indentChars to use when indenting XML child elements.  The

default is 1.  A value of  0 (zero) turns off all XML formatting, including

newlines.

o[:outputFile]

      Specifies

that the output should be directed to a file rather than the browser.  The

output location is determined by the “ShapeOutputPath” app setting in the

web.config file.  If no extension is specified, the default is .xml.  If no outputFile

is specified, the default is shpName.xml.

The command line invoker, makeShape.exe, supports

essentially the same options with the exception of the contentType.  Output is

written to the console unless an output file is specified.  The general format

of the command line is:

makeShape –s=shapeName [-option[=val]...]] paramA=val paramB=val

The available options are described below:

Command Line Invoker Options

x[=xsltFilePath]

      Specifies the path of an XSLT file that is used to transform the XML.  If an extension is omitted, the default is .xslt.   If the entire name is omitted, a file name of shpName.xslt is used.

r[=rootName]

      Specifies an element name in which to wrap the output of the shape procedure.  If the option is specified, but no rootName is provided, the default name is “root”.

c=indentChar

      Specifies a character to use when indenting XML child elements.   The default is tab.

i=indentLevel

      Specifies the number of indentChars to use when indenting XML child elements.  The default is 1.  A value of  0 (zero) turns off all XML formatting, including newlines.

o[=outputPath]

      Specifies the path of the file to which output should be written rather than the console.  If no extension is specified, the default is .xml.  If no outputPath is specified, the default is shpName.xml.

Shape Testing

The accompanying distribution includes a general-purpose web page (default.htm) for testing stored procedures in general, and shapes in particular.  The page provides three panes: the list pane, the form pane and the results pane.  The list pane shows all of the stored procedures defined in a configured database that match a search criterion, such as “shp%”. 

The form pane presents text boxes for each parameter of a stored procedure selected from the list pane as well as a button to execute it. The form displays default values and data types for each parameter and a checkbox to specify use of the default.  The form pane also allows for the specification of invoker options, such as an XSLT transform or output file name.

The results pane shows the shape results, typically as XML.

Use of the shape testing web page requires the creation of one stored procedure and one user-defined function within the configured database.  See the installation instructions for details.

Conclusion

While there are several documented methods for creating XML from relational data, none offers the combination of simplicity, performance and control that encourages widespread adoption.

With shapes, a developer can use familiar tools and language (T-SQL) to create a single, intuitive stored procedure that produces a potentially complex, precisely controlled XML document that is URL- and ommand-line-addressable. Moreover, the data for the document are gathered with optimal efficiency in a single trip to the database, and assembled into XML by client processes that can be distributed across intermediate servers.

To see shapes in action without performing any installation, visit http://tech.rssgroup.com/shapes.

Currently, the tools are .NET-based and SQL Server centric. The concept and the grammar, however, are thoroughly portable and could support different databases (notably Oracle) and different middleware (e.g. java).  Given sufficient interest, I will establish a project on SourceForge.NET to manage future development.

Installation

To experiment with shapes, I recommend using Microsoft’s standard Northwind database (although if you have a test database with existing stored procedures that can be more interesting).

1.   The first step is to create a directory to host a simple website, let’s say:

      c:\inetpub\wwwroot\shapes

F

2.   Unzip the included files to that directory:

      default.htm                    - presents a frameset used to test shapes

      web.config                          - configures database access

      bin\RSSWebShapes.dll - the shape interpreter

      _xslt\shpList.xslt                 - used by default.htm

3.   Edit the properties of the website using Internet

Services Manager:

a.       create a new application for the site if it is using the Default Application

b.      turn off anonymous access – this means the web user’s credentials will be used to establish database connections.

4.   Edit the database connection string in web.config:

  <DatabaseConnectionSets>

    <local        core="server=localhost;database=Northwind;trusted_connection=yes"

    />

  </DatabaseConnectionSets>

5.   Use Query Analyzer to install shpList and some sample shapes.

      _sql\shpList.sql             - T-SQL proc used by default.htm

      _sql\shpSamples.sql            - T-SQL procs for experimentation

6.   Open http://localhost/shapes/

      This site is a generic site for listing and executing stored procedures.  It is especially useful for testing shapes.

Code Download

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating