Simplify the Creation of XML from SQL Server Data
by Hugh Lynch
CTO, Restricted Stock Systems, Inc.
Introduction
Like many developers, my team and I were enchanted by the
possibilities of XML and XSL/XSLT when they emerged a few years ago. It seemed
we could describe anything using XML and morph it to our hearts’ content using
XSL. We wove the new technology deep into our designs, building substantial
web applications in which the data is rendered as XML then transformed to HTML
and other formats.
While I am content with the decision to embrace XML, there
were significant obstacles. Chiefly, how to create XML from our relational
data in SQL Server? Existing techniques held early promise, but proved either too
simplistic (FOR XML AUTO) or too cumbersome (FOR XML EXPLICIT, XML Views, etc)
for widespread adoption.
We wanted something simple and efficient to leverage our existing
tools, experience and the capabilities of SQL Server. We invented a simple
XML-assembly grammar that we embed in the results returned from stored
procedures, utilizing SQL Server’s ability to return multiple “chained” result
sets from a single command invocation.
Next, we created a general-purpose interpreter (we
implemented COM and .NET versions) to parse the embedded XML-assembly
instructions and render the returned data accordingly as XML.
Finally, we developed two general-purpose invocation
utilities, one for the command line and one web-based (implemented in .NET and ASP.NET,
respectively). These tools provide simple mechanisms to invoke parameterized stored
procedures, process the results using the interpreter, and output the resulting
XML, either raw or transformed.
The benefits of this approach, which we call “data shapes”
or simply “shapes”, include:
a. Simplicity.
With the grammar defined and the interpreter written, developers simply write
conforming stored procedures to create arbitrarily complex XML documents.
Unlike some alternatives, complex documents do not require complex procedures –
just select the data to assemble and accompany it with simple text instructions
like “RELATE Employee.CompanyID to Company.CompanyID” (more on the grammar
later).
b. Efficiency.
Returning all of the data for a document (sometimes dozens of recordsets) from
a stored procedure keeps database trips to an absolute minimum. The technique
also delegates the assembly of the XML from the database server to the
interpreter, which can be distributed across middle- or presentation-layer
machines. Writing XML involves conversion and formatting of native data types
as text, character escaping, and filtering of child recordsets, all of which
consume valuable CPU cycles.
c. Performance
and Control. Writing shapes as stored procedures gives the developer all
the benefits of that language: pre-compiled execution plans, caching, parameterization,
conditional logic, temporary storage, permission controls, nested calls, etc.
d. Familiarity.
While the grammar of shapes is no doubt novel, the tools used to create, edit,
test and analyze them are not. Shapes can be developed in Query Analyzer and
Visual Studio or any other T-SQL tool. Their performance can be evaluated
using SQL Profiler, and dissected by examining execution plans. Because a
shape encapsulates both data selection and assembly instruction in a single
scriptable procedure, it can be easily migrated between development, test and
production environments.
This article provides a complete definition of the shape
grammar as well as functional explanations of the shape interpreter and shape
invoker utilities. The accompanying download includes the binary and
configuration files necessary to experiment with shapes on your own.
Installation instructions are listed at the end of the article.
You can also see shapes in action at http://tech.rssgroup.com/shapes.
Let’s begin with some examples.
Example 1 – Hello, World!
Tradition dictates that all new programming technologies
introduce themselves, so here is a very simple shape listing:
create procedure dbo.shpHello
as
select 'dataset root.Hello' _shp
select 'Hello, World!' Msg
go
which can be invoked through ASP.NET, using the URL:
And here are the results it produces:
<?xml version="1.0" encoding="utf-8"?>
<Hello Msg="Hello, World!"/>
While trivial, there are a couple of things to note. The
shape instruction ('dataset root.Hello') tells the interpreter to name the following result set “root” and to represent each record within it as a <Hello> element. By default, the interpreter maps the single column of the data set (Msg) to an attribute of the same name in the <Hello> element.
Example 2 – A Realistic Report
Using Microsoft’s standard Northwind database, we will create a shape to return all of the orders created by a specified employee during a specified date range. The data will be organized by customer.
The following URL invokes the stored procedure dbo.shpEmployeeActivity with three parameters: employeeID, startDate and endDate.
Here’s the listing of the stored procedure. The embedded instruction statements are again highlighted in blue. The procedure uses a table variable to store orderIDs.
create procedure dbo.shpEmployeeActivity (
@employeeID int
, @startDate smalldatetime
, @endDate smalldatetime
as
set nocount on
-- create a table variable to hold the set of filtered orderIDs
declare @nOrders int
declare @tblOrders table (
OrderID int primary key)
insert @tblOrders
select orderID
from orders
where employeeID = @employeeID
and orderDate between @startDate and @endDate
set @nOrders = @@rowcount
set nocount off
select 'dataset root.Employee' _shp
select EmployeeID, LastName, FirstName, Title, ...
from Employees
where employeeID=@employeeID
if @nOrders>0
begin
select 'dataset Customers.Customer relate to root' _shp
select *
from customers
where customerID in
(select customerID from orders where orderID in
(select orderID from @tblOrders))
select 'dataset Orders.Order relate customerID to Customers.customerID' _shp
select *
from orders
where orderID in (select orderID from @tblOrders)
select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp
select *
from [Order Details]
where orderid in (select orderID from @tblOrders)
end
go
And here are the results (wider date ranges produce a lot more data):
<?xml version="1.0" encoding="utf-8"?>
<Employee EmployeeID="6" LastName="Suyama" FirstName="Michael"...>
<Notes><![CDATA[Michael is a graduate of...]]></Notes>
<Customer CustomerID="GOURL" CompanyName="Gourmet Lanchonetes"...>
<Order OrderID="10423" EmployeeID="6" OrderDate="1/23/1997"...>
<Item ProductID="31" UnitPrice="10.0000" Quantity="14" Discount="0" />
<Item ProductID="59" UnitPrice="44.0000" Quantity="20" Discount="0" />
</Order>
</Customer>
<Customer CustomerID="LAMAI" CompanyName="La maison d'Asie"...>
<Order OrderID="10425" EmployeeID="6" OrderDate="1/24/1997"...>
<Item ProductID="55" UnitPrice="19.2000" Quantity="10" Discount="0.25" />
<Item ProductID="76" UnitPrice="14.4000" Quantity="20" Discount="0.25" />
</Order>
</Customer>
</Employee>
As in the prior example, records were mapped to elements and columns were mapped to attributes of the same name, except for the SQL text-typed column “Notes”, which was mapped to a CDATA block within a child element in order to preserve whitespace and minimize character escaping.
Let’s take a closer look at the embedded instructions.
select 'dataset root.Employee' _shp
select ... from employees where ...
select 'dataset Customers.Customer relate to root' _shp
select * from customers where ...
select 'dataset Orders.Order relate customerID to Customers.customerID' _shp
select * from orders where ...
select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp
select * from [Order Details] where ...
Notice how the selection of each data set (SELECT [fields] FROM [table])
is preceded by an instruction (SELECT
‘dataset...’ _shp). These instructions tell the shape interpreter what
is coming next and where to put it.
Here’s what the instructions mean:
select
'dataset root.Employee' _shp
Name the following result set “root” and represent each
record within it as an <Employee> element.
select
'dataset Customers.Customer relate to root' _shp
Name the following result set “Customers” and represent each
record within it a <Customer> element. Make the <Customer>
elements appear as child elements of the <Employee> elements produced by
the “root” result set. In this case, there is only one <Employee>
element because we are selecting by primary key. If there were more than one,
the <Customer> elements would be repeated as child elements of each.
select
'dataset Orders.Order relate customerID to Customers.customerID' _shp
Name the following result set
“Orders” and represent each record within it as an <Order> element. Make
the <Order> elements appear as child elements of the <Customer>
elements produced by the “Customers” result set. The <Order> elements
will be filtered such that only those orders with matching customerID fields
appear as children of each <Customer> element. When rendering to XML,
the shape interpreter will not render the customerID attribute of the
<Order> element, since it is implicit in the XML structure.
select
'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp
Name the following result set “OrderDetails” and represent each
record within it as an <Item> element. Make the <Item> elements
appear as child elements of the <Order> elements produced by the “Orders”
result set, grouped by orderID.
Note that any Order records that do not match a selected
Customer record will not appear in the XML. Nor will any OrderDetails records
appear that do not match a selected Order. In other words, if the where
clauses (in orange) were omitted, the XML produced would be identical, but the
amount of data loaded by SQL Server and passed to the interpreter would
increase dramatically.
Because this error is not detectable in the output, except
through degraded performance, it can be difficult to diagnose. It is always a
good idea to test your shapes in Query Analyzer to verify that the data sets
are appropriately filtered. It is also possible that a future version of the
interpreter could detect this condition and produce a warning.
While this shape is intended to be representative of actual,
useful shapes, it does not fully exercise the shape grammar, which provides
fine controls over the mapping and formatting of records and columns. Simple
options are available to:
a. wrap
record elements in a grouping element. For example, <Customer> elements
could be wrapped by a <Customers> element.
b. render
all columns as child elements
c. render
an individual column as an attribute, child element, free text, CDATA block or
nested XML
d. format
numeric and date columns using Excel-like formatting strings
e. omit
a column from the XML output
To get a better sense of the possibilities, let’s examine
the full grammar.
Shape Grammar
Syntax
DATASET datasetName.elementName[
WITH options][ RELATE clause[ RELATE clause...]]
RELATE clause:
RELATE [TO parentDataset
| childField1[,childField2...]
TO parentDataset.parentField1[,parentField2]]
[ WITH options]
options:
option[
option...]
option:
setOption[,setOption...]|[fieldName:fieldOption[,fieldOption]
setOption:
elements|attributes|showkeys|group[(groupName)]|xmlns(prefix=URI)
fieldOption:
fieldRepresentation|format("formatString")|omit
fieldRepresentation:
attrib|element|text|cdata|raw[(indent)]
Arguments
datasetName
Specifies
a name for the subsequent result set that must be unique within all result sets
returned by the stored procedure. If a grouping option is specified, this name
is the default element name for the grouping element.
elementName
Specifies
a name for the elements generated for each record in the subsequent result set.
RELATE TO parentDataset
Indicates
that the elements generated for each record in the subsequent result set should
be nested beneath each element generated from the result set name parentDataset.
If more than one element is produced from the parent result set, all child
elements will be repeated beneath each parent element.
RELATE childField1[,childField2...]
TO parentDataset.parentField1[,parentField2]]
Indicates
that the elements generated for each record in the subsequent result set should
be nested beneath the elements generated from the result set name parentDataset,
and filtered such that only those child records where the values of the
specified child fields match the values of the specified parent fields will
produce child elements beneath a given parent element. If a child record does
not match any parent record, it will not produce an XML element.
WITH options
Specifies
options that affect the appearance of the elements generated for each record in
the subsequent result set. Options set on the dataset are overridden by
options set in a RELATE clause. While it is unusual to relate a result set to
multiple parents, it is possible, and in such cases each RELATE clause may have
separate and distinct options.
setOption:
attribs
Indicates
that fields should produce XML attributes (this is the default)
elements
Indicates
that fields should produce XML elements, rather than XML attributes by default
showkeys
Indicates
that child fields related to parent fields should not be omitted from the child
element as they are by default.
group[(groupName)]
Indicates
that child elements should be wrapped within a grouping element and optionally
specifies the name of that grouping element. The default group element name is
the datasetName.
xmlns(namespace)
Specifies
and XML namespace to appear in each child element.
fieldOption:
fieldRepresentation
Specifies
the representation of a field within the resulting XML. The default
representation for all fields other than text fields is as an XML attribute.
The default representation for text fields is as a CDATA block within a child
element.
attrib
represent
the field as an XML attribute of the record element:
<elementName ... fieldName=”value” …/>
value will be appropriately escaped for attribute representation
element
represent
the field as a child element of the record element:
<elementName ...>
…
<fieldName>value</fieldName>
…
</elementName>
value will be appropriately escaped for text representation
The shape interpreter analyzes the multiple result sets
returned from the stored procedure, parses the instructions according to the
grammar defined above, and renders the data accordingly as XML. The implementation
of the interpreter may interest developers wanting to customize or simply
understand it, but it is not a requirement for its use. In fact, its function
is thoroughly wrapped by the URL- and command-line invocation methods described
in the next section, so feel free to skip ahead.
Our .NET implementation of the interpreter (included in the
download) is encapsulated in a DataShape class derived from
System.Data.DataSet. The DataShape class is typically employed in a two-step
operation. First, the data and instructions are read from the data source (a
SqlCommand or SqlDataReader), populating the DataSet’s internal DataTable and
DataRelation collections. Then the DataShape is rendered as XML by passing a
suitable output (filename, IO.Stream, TextWriter or XmlWriter) to its WriteShapeXml()
method.
After writing, the DataShape is typically disposed of,
having served its purpose. It is, however, fully serializable and can be
written to and restored from a file (or other destination) if its contents are
still of interest. For instance, it is fairly straightforward to implement a
paging system in this manner, without requiring additional trips to the
database.
The DataShape class can also be populated manually or with
multiple calls to various stored procedures because it exposes and employs the
underlying capabilities of the standard DataSet class.
Having all of information necessary to create an XML
document contained in the results of our shape procedures, along with a
general-purpose interpreter for converting those results to XML, all that is
needed is a mechanism to invoke the store procedure, process the results using
the interpreter, and output the results. We created two: one for command line
invocations and one for browsers.
Let’s begin with the web-based tool, which we created using
ASP.NET. Essentially, we wanted a concise URL that identifies a stored
procedure and allows for the specification of parameter values and a few
options. The two examples above include simple web-based invocation URLs, with
and without parameters:
The first thing to note is that there is actually no .aspx file necessary to process these requests. By adding an entry to the httpHandlers section of the web.config file, all .aspx requests are routed to the ShapeHandler class of the RSSWebShapes assembly, which knows how to interpret the URL.
If shapes need to co-exist with standard .aspx pages, it is a simple matter to register a different extension for shapes and substitute that extension for .aspx in shape URLs. Typically, we use .shp. Changing the extension does require custom configuration of the virtual directory in IIS.
The ShapeHandler class creates a SqlCommand object for executing a stored procedure named shapeName. Using a database connection defined in the web.config, the ShapeHandler retrieves the parameter list from the database and populates any command parameters that were provided in the URL.
When passing parameters, note that parameter values should be URL escaped and that parameter names should omit the @ prefix.
The command line invoker, makeShape.exe, supports
essentially the same options with the exception of the contentType. Output is
written to the console unless an output file is specified. The general format
of the command line is:
Command Line Invoker Options
x[=xsltFilePath]
Specifies the path of an XSLT file that is used to transform the XML. If an extension is omitted, the default is .xslt. If the entire name is omitted, a file name of shpName.xslt is used.
r[=rootName]
Specifies an element name in which to wrap the output of the shape procedure. If the option is specified, but no rootName is provided, the default name is “root”.
c=indentChar
Specifies a character to use when indenting XML child elements. The default is tab.
i=indentLevel
Specifies the number of indentChars to use when indenting XML child elements. The default is 1. A value of 0 (zero) turns off all XML formatting, including newlines.
o[=outputPath]
Specifies the path of the file to which output should be written rather than the console. If no extension is specified, the default is .xml. If no outputPath is specified, the default is shpName.xml.
Shape Testing
The accompanying distribution includes a general-purpose web page (default.htm) for testing stored procedures in general, and shapes in particular. The page provides three panes: the list pane, the form pane and the results pane. The list pane shows all of the stored procedures defined in a configured database that match a search criterion, such as “shp%”.
The form pane presents text boxes for each parameter of a stored procedure selected from the list pane as well as a button to execute it. The form displays default values and data types for each parameter and a checkbox to specify use of the default. The form pane also allows for the specification of invoker options, such as an XSLT transform or output file name.
The results pane shows the shape results, typically as XML.
Use of the shape testing web page requires the creation of one stored procedure and one user-defined function within the configured database. See the installation instructions for details.
Conclusion
While there are several documented methods for creating XML from relational data, none offers the combination of simplicity, performance and control that encourages widespread adoption.
With shapes, a developer can use familiar tools and language (T-SQL) to create a single, intuitive stored procedure that produces a potentially complex, precisely controlled XML document that is URL- and ommand-line-addressable. Moreover, the data for the document are gathered with optimal efficiency in a single trip to the database, and assembled into XML by client processes that can be distributed across intermediate servers.
To see shapes in action without performing any installation, visit http://tech.rssgroup.com/shapes.
Currently, the tools are .NET-based and SQL Server centric. The concept and the grammar, however, are thoroughly portable and could support different databases (notably Oracle) and different middleware (e.g. java). Given sufficient interest, I will establish a project on SourceForge.NET to manage future development.
Installation
To experiment with shapes, I recommend using Microsoft’s standard Northwind database (although if you have a test database with existing stored procedures that can be more interesting).
1. The first step is to create a directory to host a simple website, let’s say:
c:\inetpub\wwwroot\shapes
F
2. Unzip the included files to that directory:
default.htm - presents a frameset used to test shapes
web.config - configures database access
bin\RSSWebShapes.dll - the shape interpreter
_xslt\shpList.xslt - used by default.htm
3. Edit the properties of the website using Internet
Services Manager:
a. create a new application for the site if it is using the Default Application
b. turn off anonymous access – this means the web user’s credentials will be used to establish database connections.
4. Edit the database connection string in web.config:
<DatabaseConnectionSets>
<local core="server=localhost;database=Northwind;trusted_connection=yes"
/>
</DatabaseConnectionSets>
5. Use Query Analyzer to install shpList and some sample shapes.
_sql\shpList.sql - T-SQL proc used by default.htm
_sql\shpSamples.sql - T-SQL procs for experimentation
6. Open http://localhost/shapes/
This site is a generic site for listing and executing stored procedures. It is especially useful for testing shapes.
Code Download