Stairway to XML

Stairway to XML: Level 3 - Working with Typed XML

,

In Level 2 of the Stairway to XML series, you were introduced to the XML data type and shown how to use it to create columns, variables, and parameters that can store XML data. As you saw, you simply specify the XML data type when you define the object, as you would any other type. However, the examples in that level told only part of the story. SQL Server actually supports two kinds of XML objects: typed and untyped.

What distinguishes the two is whether the XML column, variable, or parameter is associated with a specific schema collection, a database entity (like a table or stored procedure) that specifies the structure and data types that an XML document must adhere to. If a database object is associated with a collection, it is considered typed, otherwise it is untyped. The XML columns, variables, and parameters defined in the Level 2 examples are all untyped XML objects because no schema collections are associated with them. In this level, you’ll learn how to work with schema collections and how to create typed database objects.

The XML Schema Collection

An XML schema collection is made up of one or more XML Schema Definition (XSD) schemas that are used to validate XML data stored in a typed XML object. The XSD schemas contain the actual formatting information that defines the structure and data types an XML instance must use when saved to a typed XML object. As a result, before you can associate an XML schema collection with an XML object, the collection must exist as an entity within the database. In other words, you must specifically create the collection before you can reference it.

To create a schema collection, you use the CREATE XML SCHEMA COLLECTION statement to specify a collection name and define at least one XSD schema. In the T-SQL code shown in Listing 1, I create the ClientDB database and then add the ClientInfoCollection XML schema collection to the database. The collection includes a single XSD schema (enclosed in single quotes after the AS keyword).

USE master
GO
IF DB_ID('ClientDB') IS NOT NULL
DROP DATABASE ClientDB
GO
CREATE DATABASE ClientDB
GO
USE ClientDB
GO
CREATE XML SCHEMA COLLECTION ClientInfoCollection AS 
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
xmlns="urn:ClientInfoNamespace" 
targetNamespace="urn:ClientInfoNamespace" 
elementFormDefault="qualified">
  <xsd:element name="People">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Person" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="FirstName" type="xsd:string" minOccurs="1" maxOccurs="1" />
              <xsd:element name="LastName" type="xsd:string" minOccurs="1" maxOccurs="1" />
              <xsd:element name="FavoriteBook" type="xsd:string" minOccurs="0" maxOccurs="5" />
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:integer" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>'
GO

Listing 1: Creating an XML schema collection in the ClientDB database

Before I try to explain what’s going on here, let me qualify my remarks. The rules governing XSD schemas are quite complex. I can provide an overview of what I’ve done here, but a full explanation of the XSD syntax is beyond the scope of this level—or even this series. However, you can find detailed information about the various schema elements at w3schools.com or preferably read Jacob Sebastian's book 'The Art of XSD'

In the meantime, let’s look at the CREATE XML SCHEMA COLLECTION statement in more detail. As you can see in Listing 1, I provide a name for the collection (ClientInfoCollection) after the COLLECTION keyword.

I follow the name with the AS keyword and an XSD schema definition, enclosed in single quotes. The definition begins with the <xsd:schema> element, which provides the details necessary to establish this code as an XSD schema. In this case, I’ve included four attributes in the <xsd:schema> element, which are defined in Table 1.

AttributeDescription
xmlns (first instance)Specifies the source namespace (http://www.w3.org/2001/XMLSchema) for the elements and data types used in the schema. The :xsd that follows the xmlns attribute name is an alias used throughout the schema to reference the source namespace.
xmlns (second instance)Specifies the schema’s default namespace (urn:ClientInfoNamespace). This is the namespace that XML instances must reference when stored in a typed XML object. The urn: that precedes the actual namespace name refers to Uniform Resource Name and is used to denote this as a location-independent resource identifier, as opposed to a Uniform Resource Locator (URL), such as the one specified in the first instance of the xmlns attribute.
targetNamespaceSpecifies the namespace (urn:ClientInfoNamespace) associated with the elements defined in the schema. This namespace corresponds to the one specified in the second instance of the xmlns attribute.
elementFormDefaultSpecifies that all elements in the XML instance associated with the schema must be qualified with the default namespace, either explicitly or implicitly.

Table 1: The attributes defined in the <xsd:schema> element

After I specify the opening <xsd:schema> tag, I define the actual structure that will be applied to the XML instances stored in the typed XML objects. I begin by using an <xsd:element> element to define the XML document root, which is called People, as determined by the name attribute. Notice that both the <xsd:schema> element and the <xsd:element> element are preceded by the xsd: alias, which refers to the first namespace defined in <xsd:schema>. All element definitions in the schema must include the xsd: preface.

Much of the remaining XSD structure is devoted to delineating the child elements and attributes that make up the root element. In fact, you can find the root’s closing tag, </xsd:element>, on the second-to-last line of the XSD definition. Everything between and including the opening and closing tags define the structure that each XML instance in an XML object must conform to.

The next element in the XSD schema is <xsd:complexType>, which indicates that the current element can contain other elements or attributes. When you include the <xsd:complexType> element in an element that will include child elements, you must also include an indicator element that determines the order of the child elements and, in some cases, the number of those elements. In this situation, I’m using the <xsd:sequence> indicator element to specify that child elements must appear in a specific order. Although order isn’t a factor in this particularly situation, the <xsd:sequence> indicator is the most appropriate option of the available indicators.

After the <xsd:sequence> element, I add a child element named Person, as indicated by the name attribute. Notice that the element also includes the minOccurs attribute, which indicates that the XML instance must include at least one <Person> element, and the maxOccurs attribute, which has a value of unbounded, indicating that there is no limit to the number of <Person> elements that the XML instance can contain.

Because the <Person> element can itself include child elements, it too is followed by the <xsd:complexType> element and the <xsd:sequence> indicator element. The first of the child elements is called FirstName. The type attribute, which indicates the data type of the element value, is set to xsd:string. The minOccurs attribute indicates that at least one instance of the element is required, and the maxOccurs attribute indicates that no more than one instance of the element is permitted.

The second of the child elements is named LastName, and it is configured like the <FirstName> element. Note, however, that because the <xsd:sequence> element is included, the XML instance must specify these elements in their defined order.

The last of the <Person> child elements is the one named FavoriteBook. Notice that this element requires no minimum instances but limits the number of maximum instances to five.

There’s one more item in the <Person> element worth noting—an attribute named id. The attribute is defined with the xsd:int data type and is required with each <Person> element in the XML instance, as indicated by the use attribute, which is set to required.

That about covers this schema. As you can see, it’s structured like an XML document, with <xsd:schema> as the root element. Again, I’ve provided only a high-level overview of what makes up an XSD schema. And the example I’ve used is a very simply one. If you want to see more complex schemas, check out the AdventureWorks sample database. It includes several examples. Now let’s look at what you can do with a schema.

Typed XML

Once you’ve added your schema collection to the database, you can create typed XML columns, variables, and parameters. (You can also breathe a sigh or relief knowing that the hardest part is behind you.)

When defining a typed XML object, you specify the XML data type as you do for untyped objects, but you also add the name of the schema collection, enclosed in parentheses. For example, in Listing 2, I declare the @xml variable, assign the XML data type to the variable, and specify the ClientInfoCollection schema collection (created in Listing 1).

DECLARE @xml XML(ClientInfoCollection)
SET @xml = 
  '<?xml version="1.0" encoding="UTF-8"?>
  <!-- A list of current clients -->
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Person>
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
    </Person>
  </People>'
SELECT @xml

Listing 2: Defining a typed XML variable based on the ClientInfoCollection schema collection

As you can see, to create a typed XML object, you need only add the schema collection name. Where things can get tricky is when you insert data into that object. Once you’ve typed an XML object, your XML instance must conform to the format defined by the XSD schema included in the specified collection. In addition, you must ensure that your XML instance references the default namespace specified in the schema.

If you refer back to Listing 2, you’ll see that I include the xmlns attribute in the <People> element, the XML document’s root element. The xmlns attribute refers specifically to the default namespace defined in the XSD schema. Now any elements I include in my XML instance will apply to that namespace.

Because I’ve created a typed variable, I must also ensure that my XML document conforms to the format defined in the XSD schema. That means, for example, my <People> element must contain at least one instance of the <Person> child element, and each instance of the <Person> element must contain one instance of the <FirstName> element, one instance of the <LastName> element, and zero to five instances of the <FavoriteBook> element. In addition, each <Person> element must include an id attribute.

When I executed the statements in Listing 2, I was able to successfully assign my XML instance to the @xml variable, which means that the XML instance had been properly formatted, as evidenced by the value returned by the SELECT statement (shown in Listing 3).

<!-- A list of current clients -->
<People xmlns="urn:ClientInfoNamespace">
  <Person id="1234">
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
  </Person>
  <Person id="5678">
    <FirstName>Jane</FirstName>
    <LastName>Doe</LastName>
    <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
  </Person>
</People>

Listing 3: The XML instance stored in the @xml variable

As you would expect, the XML instance is returned in its entirety, including the xmlns attribute and its reference to the XSD default namespace.

A typed XML object ensures that your XML instance is correctly formatted. If it’s not, SQL Server will return an error. For example, the code shown in Listing 4 declares a typed XML variable, but this time tries to assign an improperly formatted XML document to the variable.

DECLARE @xml XML(ClientInfoCollection)
SET @xml = 
  '<?xml version="1.0" encoding="UTF-8"?>
  <!-- A list of current clients -->
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <FirstName>John</FirstName>
      <MiddleInit>T</MiddleInit>
      <LastName>Doe</LastName>
    </Person>
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
    </Person>
  </People>'

Listing 4: Assigning an improperly formatted XML instance to a typed XML variable

In this case, the first instance of the <Person> element includes the <MiddleInit> child element, which is not specified in the XSD schema. As a result, when I try to run the SET statement, SQL Server returns the error shown in Listing 5.

Msg 6965, Level 16, State 1, Line 2
XML Validation: Invalid content. Expected element(s): '{urn:ClientInfoNamespace}LastName'. Found: element '{urn:ClientInfoNamespace}MiddleInit' instead. Location: /*:People[1]/*:Person[1]/*:MiddleInit[1].

Listing 5: Error returned as a result of an improper element

As the error indicates, the XML parser expected the <LastName> element to follow the <FirstName> element, but instead found <MiddleInit>, which of course doesn’t belong there.

But non-defined elements are not the only problem. Because the XSD schema defines the <Person> element as a complex type and qualifies the type with the <xsd:sequence> element, introducing the child elements in the wrong order also causes the parser to generate an error. For example, in Listing 6, I switch the <FirstName> and <LastName> elements in the first instance of the <Person> element.

DECLARE @xml XML(ClientInfoCollection)
SET @xml = 
  '<?xml version="1.0" encoding="UTF-8"?>
  <!-- A list of current clients -->
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <LastName>Doe</LastName>
      <FirstName>John</FirstName>
    </Person>
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
    </Person>
  </People>'
SELECT @xml

Listing 6: Listing elements in the wrong order in an XML instance

Again, the XML parser chokes when it doesn’t get what it expects. When the order is changed, SQL Server returns the error shown in Listing 7. The XML parser expects the <FirstName> element but instead gets the <LastName> element, so an error is returned.

Msg 6965, Level 16, State 1, Line 2
XML Validation: Invalid content. Expected element(s): '{urn:ClientInfoNamespace}FirstName'. Found: element '{urn:ClientInfoNamespace}LastName' instead. Location: /*:People[1]/*:Person[1]/*:LastName[1].

Listing 7: Error returned as a result of specifying elements in an incorrect order

Although storing data as typed XML might seem like more trouble than it’s worth, it does in fact help to optimize queries, data storage, and data modification. So if you have an XSD schema available, it’s worth adding it to a collection and implementing typed XML objects.

XML Documents and Fragments

There’s one other aspect of working with typed XML that’s worth mentioning. SQL Server distinguishes between XML documents and XML fragments. A document is an XML instance that has one root element, as you saw in the examples above. An XML fragment does not have this restriction. The XML must still be well formed, but it can have multiple root elements.

By default, SQL Server lets you store both documents and fragments in an XML object. However, if the object is typed, you can specify that it store only documents.

The XML data type provides two options that let you define how to store fragments and documents. When you specify the name of the schema collection, you can also specify either the CONTENT or DOCUMENT option. CONTENT is the default, and therefore permits either fragments or documents. If you want, you can include CONTENT when defining your XML object. For example, in Listing 8, I added the CONTENT keyword when declaring my variable.

DECLARE @xml XML(CONTENT ClientInfoCollection)
SET @xml = 
  '<?xml version="1.0" encoding="UTF-8"?>
  <!-- A list of current clients -->
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Person>
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
    </Person>
  </People>'
SELECT @xml

Listing 8: Accepting XML documents or fragments in an XML object

Notice that the addition of the CONTENT keyword is the only thing that changed. The XML instance will be saved to the variable as in previous examples, and the SELECT statement will return the full XML instance, as expected. In this case, that XML is considered a document because it has only one root element. However, I could have just as easily saved a well-formed fragment to the variable.

If I want to limit my XML instances to documents only, I can instead specify the DOCUMENT option, as shown in Listing 9.

DECLARE @xml XML(DOCUMENT ClientInfoCollection)
SET @xml = 
  '<?xml version="1.0" encoding="UTF-8"?>
  <!-- A list of current clients -->
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Person>
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
    </Person>
  </People>'
SELECT @xml

Listing 9: Limiting an XML object to XML documents

Again, making this change presents no problems because the XML instance is a well-formed document. But suppose I try to pass in an XML fragment when I specify the DOCUMENT option, as I do in Listing 10.

DECLARE @xml XML(DOCUMENT ClientInfoCollection)
SET @xml = 
  '<?xml version="1.0" encoding="UTF-8"?>
  <!-- A list of current clients -->
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Person>
  </People>
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
    </Person>
  </People>'
SELECT @xml

Listing 10: Passing an XML fragment into an XML object configured for documents

Notice that my XML instance is now a fragment that includes two instances of the <People> element at the root. Not surprisingly, the XML parser doesn’t like such behavior. When I run the SET statement, SQL Server returns the error shown in Listing 11.

Msg 6901, Level 16, State 1, Line 2
XML Validation: XML instance must be a document.

Listing 11: Error generated by SQL Server as a result of the XML fragment

Of course, this is easily fixed by changing the DOCUMENT option to CONTENT, as I do in Listing 12. I still pass in an XML fragment, but now the XML parser happily accepts it

DECLARE @xml XML(CONTENT ClientInfoCollection)
SET @xml = 
  '<?xml version="1.0" encoding="UTF-8"?>
  <!-- A list of current clients -->
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Person>
  </People>
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <FavoriteBook>Crime &amp; Punishment</FavoriteBook>
    </Person>
  </People>'
SELECT @xml

Listing 12: Passing an XML fragment into an XML object configured for fragments or documents

As expected, the SELECT statement returns the XML fragment in its entirety. I could have just as easily added more instances of the <People> element and the XML parser would have accepted it, as long as the fragment is still well formed and adheres to the formatting structure defined in the XSD schema.

Conclusion

Working with typed XML objects—and their associated schema collections—gets to be a bit more complicated than working with simple untyped XML objects. If you plan to save data as typed XML, you should have a basic understanding of how to implement and work with XSD schemas. However, creating the XML objects themselves is still a very basic process, whether those objects are typed or untyped. In the next level, you’ll learn how to incorporate XML objects in other database entities, such as views, functions, and computed columns. Until then, you might want to dig deeper into XSD schemas. There’s a lot to learn there, and the better you understand them, the more effectively you’ll be able to store typed XML.

This article is part of the parent stairway Stairway to XML

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating