Stairway to XML

Stairway to XML: Level 4 - Querying XML Data

,

In the last two Levels of the Stairway to XML series, you learned how to use the XML data type when defining columns, variables, and parameters. You also learned the difference between typed and untyped XML objects and how you can associate typed objects with an XML schema collection. As you have seen, the XML data type makes it relatively easy to store XML documents and fragments.

However, in those Levels, we merely scratched the surface when it comes to understanding the full power of the XML data type. It turns out that the type supports five methods—query(), value(), exist(), nodes(), and modify()—that let you query and manipulate the elements and attributes within the instances stored in the XML objects. In this Level, we’ll cover the two most common methods used to query XML data: query() and value(). In the Levels that follow, we’ll cover the others.

Each XML method requires, at a minimum, one argument that is an XQuery expression. XQuery is a powerful scripting language used to access XML data. The language contains the functions, operators, variables, values, and other elements necessary to create complex expressions. SQL Server supports a subset of the XQuery language that you use to create the expressions you pass into the XML methods. With these expressions, you can identify very specifically the components in the XML instances you want to retrieve or modify.

Note

Because XQuery is such a complex language, we can touch upon only some of its components in this Level. For a more thorough understanding of XQuery and how it’s implemented in SQL Server, see the MSDN XQuery language reference.

Now let’s get started with the query() and value() methods. In this Level, we look at a number of examples that use them to access XML data. The examples are based on a database and table I created on a local instance of SQL Server 2008 R2. The Transact-SQL in Listing 1 creates the test environment necessary to run these examples. The environment includes the ClientDB database, the ClientInfoCollection XML schema collection, and the ClientInfo table.

USE master;
GO
IF DB_ID('ClientDB') IS NOT NULL
DROP DATABASE ClientDB;
GO
CREATE DATABASE ClientDB;
GO
USE ClientDB;
GO
IF OBJECT_ID('ClientInfoCollection') IS NOT NULL
DROP XML SCHEMA COLLECTION ClientInfoCollection;
GO
CREATE XML SCHEMA COLLECTION ClientInfoCollection AS 
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
xmlns="urn:ClientInfoNamespace" 
targetNamespace="urn:ClientInfoNamespace" 
elementFormDefault="qualified">
  <xsd:element name="People">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Person" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="FirstName" type="xsd:string" minOccurs="1" maxOccurs="1" />
              <xsd:element name="LastName" type="xsd:string" minOccurs="1" maxOccurs="1" />
              <xsd:element name="FavoriteBook" type="xsd:string" minOccurs="0" maxOccurs="5" />
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:integer" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>';
GO
IF OBJECT_ID('ClientInfo') IS NOT NULL
DROP TABLE ClientInfo;
GO
CREATE TABLE ClientInfo
(
  ClientID INT PRIMARY KEY IDENTITY,
  Info_untyped XML,
  Info_typed XML(ClientInfoCollection)
);
INSERT INTO ClientInfo (Info_untyped, Info_typed)
VALUES
(
  '<?xml version="1.0" encoding="UTF-8"?>
  <People>
    <Person id="1234">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Person>
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
    </Person>
  </People>',
  '<?xml version="1.0" encoding="UTF-8"?>
  <People xmlns="urn:ClientInfoNamespace">
    <Person id="1234">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Person>
    <Person id="5678">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
    </Person>
  </People>'
);

Listing 1: Setting up the test environment for the examples in this Level

As Listing 1 shows, the ClientInfo table includes a typed XML column (Info_typed) and an untyped XML column (Info_untyped). The typed column is associated with the ClientInfoCollection XML schema collection, which includes only one XSD schema. The schema’s namespace is urn:ClientInfoNamespace. For details about working with XML schema collections and typed database objects, refer back to Level 2.

The XML query() Method

Perhaps the simplest of the XML methods to understand and use is the query() method. The method is most often used to return an instance of untyped XML that is a subset of the targeted XML data. To use the method, you specify the XML database object, the method itself, and the XQuery expression enclosed in parentheses and single quotes, as shown in the following syntax:

db_object.query('xquery_exp')

When calling the query() method, you replace the db_object placeholder with the name of the actual database object and replace the xquery_exp placeholder with the XQuery expression.

Let’s look at an example that demonstrates how this work. In the SELECT statement shown in Listing 2, I use the query() method to retrieve data from the Info_untyped column. I first specify the column name, a period, and the method name. Then, enclosed in parentheses and single quotes, I add the XQuery expression (/People).

SELECT Info_untyped.query('/People')
  AS People_untyped
FROM ClientInfo;

Listing 2: Using the query() method to return the <People> element

In this case, the XQuery expression is as about simple as it can get. I am essentially telling the query() method to return the <People> element and all its contents (the child elements, attributes, and their values). To do so, as you can see, I needed only to specify the word People, preceded by a forward slash. Listing 3 shows the results returned by the SELECT statement.

<People>
  <Person id="1234">
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
  </Person>
  <Person id="5678">
    <FirstName>Jane</FirstName>
    <LastName>Doe</LastName>
  </Person>
</People>

Listing 3: The results returned by the /People XQuery expression

If I want to instead retrieve the <People> element from the typed column, I need to modify XQuery expression to include a reference to the namespace specified in the XSD schema defined in the XML schema collection. Listing 4 shows how I modified the statement to retrieve data from the typed XML column.

SELECT Info_typed.query(
  'declare namespace ns="urn:ClientInfoNamespace";
  /ns:People') AS People_typed
FROM ClientInfo;

Listing 4: Using the query() method on a typed XML column

Notice that the query expression is broken into two sections, separated with a semicolon. The first section declares the namespace. It begins with the declare namespace keywords, followed by a namespace alias, in this case ns. The alias is followed with an equal sign and then the namespace itself (urn:ClientInfoNamespace), enclosed in double quotes.

The second section of the XQuery expression is similar to the expression used for an untyped column, except that the element name is first proceeded by the namespace alias and a colon (ns:). For typed columns, your element names must be fully qualified, that is, they must reflect the schema they’re associated with, and that is done by specifying the associated namespace. However, instead of typing the entire namespace for each element, you can instead use an alias, which is what I’ve done.

When you run the SELECT statement, it returns the results shown in Listing 5.

<People xmlns="urn:ClientInfoNamespace">
  <Person id="1234">
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
  </Person>
  <Person id="5678">
    <FirstName>Jane</FirstName>
    <LastName>Doe</LastName>
  </Person>
</People>

Listing 5: The results returned by an XQuery expression used for a typed column

As you can see, the results returned from the typed column are nearly identical to those returned by the untyped column, except for now the results include a reference to the namespace.

You may have noticed that the last two examples return what is essentially the entire XML instance saved to the each column. Although the examples are useful for demonstrating how the query() method works, using the method to return the entire instance is not very useful because you can do that without using the method at all. The key, of course, is to refine the XQuery expression to return more specific data.

Suppose for example, instead of retrieving the entire XML document, we instead want to return each instance of the <Person> element, along with its child elements and attributes. For the untyped column, we would need to modify our XQuery expression by adding /Person to our path name so that our expression reads /People/Person, as shown in Listing 6.

SELECT 
  Info_untyped.query(
    '/People/Person') AS People_untyped,
  Info_typed.query(
    'declare namespace ns="urn:ClientInfoNamespace";
    /ns:People/ns:Person') AS People_typed
FROM ClientInfo;

Listing 6: Retrieving the <Person> elements from the XML instance

What I’ve done here is to further qualify the path name within the expression by adding the second element. Now only the <Person> elements are returned and not the <People> element. Listing 7 shows the results returned for the untyped column. Notice that only the two instances of the <Person> element have been returned.

<Person id="1234">
  <FirstName>John</FirstName>
  <LastName>Doe</LastName>
</Person>
<Person id="5678">
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
</Person>

Listing 7: The <Person> elements returned from the untyped column

If you refer back to Listing 6, you’ll see that for the typed column, the expression includes the ns: alias prefix before each element within the path name. Listing 8 shows the results returned for that column. This time, the namespace reference is included with each instance of the <Person> element.

<ns:Person xmlns:ns="urn:ClientInfoNamespace" id="1234">
  <ns:FirstName>John</ns:FirstName>
  <ns:LastName>Doe</ns:LastName>
</ns:Person>
<ns:Person xmlns:ns="urn:ClientInfoNamespace" id="5678">
  <ns:FirstName>Jane</ns:FirstName>
  <ns:LastName>Doe</ns:LastName>
</ns:Person>

Listing 8: The <Person> elements returned from the typed column

Now suppose we want to return a specific instance of the <Person> element. One way we can do this is to further qualify the XQuery expression by adding a reference to the id attribute and a specific attribute value. In the example shown in Listing 9, I’ve added a reference to the id attribute for both the typed and untyped columns.

SELECT 
  Info_untyped.query(
    '/People/Person[@id=1234]') AS People_untyped,
  Info_typed.query(
    'declare namespace ns="urn:ClientInfoNamespace";
    /ns:People/ns:Person[@id=5678]') AS People_typed
FROM ClientInfo;

Listing 9: Retrieving data for a specific <Person> element

The first thing to notice is that I’ve enclosed the attribute reference in brackets. In addition, I preceded the attribute name with an at (@) sign and followed it with an equal and then provided the attribute value. For the untyped column, I used the value 1234. As a result, the XML returned from that column includes only the <Person> element whose id value equals 1234, as shown in Listing 10.

<Person id="1234">
  <FirstName>John</FirstName>
  <LastName>Doe</LastName>
</Person>

Listing 10: The <Person> element with an id value of 1234

For the typed column, I used the value 5678 in the attribute reference. Notice, however, that I do not need to include the namespace alias prefix along with the attribute name. Referencing the namespace in the element name is enough.

Note

When specifying a specific value in your XQuery expression, as I do for the id attribute value, string values should be enclosed in double quotes. However, the rules for numeric values are somewhat different. For untyped columns, you can also specify numeric values in double quotes or without the quotes, but when working with typed columns, you must conform to the schema, which in this case specifies the id attribute as an INT value. Consequently, you cannot enclose the value in quotes. If you do, you’ll receive an error when running your statement.

Not surprisingly, the XML returned from the typed column includes only the <Person> element whose id value equals 5678, as shown in Listing 11.

<ns:Person xmlns:ns="urn:ClientInfoNamespace" id="5678">
  <ns:FirstName>Jane</ns:FirstName>
  <ns:LastName>Doe</ns:LastName>
</ns:Person>

Listing 11: The <Person> element with an id value of 5678

We can even further refine our XQuery expression by adding another child element to the path name, in this case the <FirstName> element. For the untyped column, we simply add /FirstName to the expression, and for the typed column, we add /ns:FirstName. Listing 12 shows what our SELECT statement now looks like.

SELECT 
  Info_untyped.query(
    '/People/Person[@id=1234]/FirstName') AS People_untyped,
  Info_typed.query(
    'declare namespace ns="urn:ClientInfoNamespace";
    /ns:People/ns:Person[@id=5678]/ns:FirstName') AS People_typed
FROM ClientInfo;

Listing 12: Retrieving the <FirstName> element for a specific <Person> element

The XML returned from the untyped column now includes only the <FirstName> child element of the <Person> element whose id value is 1234, as shown in Listing 13.

<FirstName>John</FirstName>

Listing 13: The <FirstName> child element for the <Person> element with an id value of 1234

The XML returned from the typed column now includes only the <FirstName> child element of the <Person> element whose id value is 5678, as shown in Listing 14. Notice that, even at this level, the namespace is included in the returned value.

<ns:FirstName xmlns:ns="urn:ClientInfoNamespace">Jane</ns:FirstName>

Listing 14: The <FirstName> child element for the <Person> element with an id value of 5678

Another way you can reference a specific element within an XQuery expression is to specify the element’s position number, relative to other instances of that element. For example, our source data includes two instances of the <Person> element. The first instance of that element is implicitly assigned the number 1 and the second instance the number 2. In the SELECT statement in Listing 15, I use the numbers 1 and 2 to reference those instances.

SELECT 
  Info_untyped.query(
    '/People/Person[1]/FirstName') AS People_untyped,
  Info_typed.query(
    'declare namespace ns="urn:ClientInfoNamespace";
    /ns:People/ns:Person[2]/ns:FirstName') AS People_typed
FROM ClientInfo;

Listing 15: Using instance numbers to reference instances of the <Person> element

Notice that instead of specifying an attribute reference, I specify the value [1] for the untyped column and the value [2] for the typed column. That means, in the case of the untyped column, the [1] indicates that the first instance of the <Person> element should be returned, or rather, the <FirstName> element of that instance, and for the typed column, the [2] indicates that the second instance of the <Person> element should be returned. Although I’ve used the numerical references, the SELECT statement in Listing 15 returns the same results as the statement in Listing 12.

The XML value() Method

As handy as the query() method can be, there might be times that you want to retrieve a specific element or attribute value, rather than returning an XML element. That’s where the value() method comes in. The method not only retrieves a specific value, but does so as a specified data type. For this reason, when you call the value() method, you must pass in two arguments—the XQuery expression and the Transact-SQL data type—as shown in the following syntax:

db_object.value('xquery_exp', 'sql_type')

Notice that you call the value() method in much the same way you call the query() method. The only difference is the second argument, in which you specify the data type. For example, the SELECT statement shown in Listing 16 retrieves the <FirstName> value from the XML and returns it with the VARCHAR data type.

SELECT 
  Info_untyped.value(
    '(/People/Person[1]/FirstName)[1]', 
    'varchar(20)') AS Name_untyped,
  Info_typed.value(
    'declare namespace ns="urn:ClientInfoNamespace";
    (/ns:People/ns:Person[2]/ns:FirstName)[1]',
    'varchar(20)') AS Name_typed
FROM ClientInfo;

Listing 16: Retrieving the <FirstName> values from the XML instances

As the listing shows, you first specify the XQuery expression, followed by a comma, and then the data type. Like the XQuery expression, the data type must be enclosed in single quotes. That part should be fairly straightforward. What is not so straightforward is the XQuery expression itself. Although for both the typed and untyped columns the expressions are much the same as their counterparts in Listing 15, there is a significant difference. Each expression is enclosed in parentheses and following by [1]. The parentheses ensure that the expression is treated as a single unit to which the [1] can be applied.

The [1] means that the first instance of the returned instances is the instance that the expression should use. For example, suppose your XQuery expression returns multiple <Person> elements. Surrounding the expression with parentheses and adding the [1] indicates that the first instance of <Person> should be used. Note, however, that even if your expression returns only one instance, you must still include the [1] because a singleton value is required by the value()method, and the [1] ensures that only one value can be returned.

In Listing 16, I specify [1] after the XQuery expression for both the untyped and typed columns. And because I’ve used the value() method in both cases, the SELECT statement returns only the first names of the two people listed in the XML documents, as shown in Listing 17.

Name_untyped         Name_typed
-------------------- --------------------
John                 Jane

Listing 17: The <FirstName> values for the two <Person> elements

In some cases, you can eliminate the internal numerical identifier after a specific element and use only the outer one to identify the XML element. If you do this, however, you must make sure your outer reference identifies the correct instance. For example, in Listing 18, I removed the numerical references associated with the <Person> element and then modified the expression for the typed column by changing the final [1] to [2].

SELECT 
  Info_untyped.value(
    '(/People/Person/FirstName)[1]', 
    'varchar(20)') AS Name_untyped,
  Info_typed.value(
    'declare namespace ns="urn:ClientInfoNamespace";
    (/ns:People/ns:Person/ns:FirstName)[2]',
    'varchar(20)') AS Name_typed
FROM ClientInfo;

Listing 18: Retrieving the first and second instances of <FirstName>

Because there are two instances of the <FirstName> element, you can use the final numeric qualifier to distinguish which instance you want to return. The statement returns the same results as the statement in Listing 16.

You can also use the same strategy to return an attribute value. For example, the SELECT statement in Listing 19 retrieves the values of each instance of the id attribute and assigns the INT data type to the returned values.

SELECT 
  Info_untyped.value(
    '(/People/Person/@id)[1]', 
    'int') AS Name_untyped,
  Info_typed.value(
    'declare namespace ns="urn:ClientInfoNamespace";
    (/ns:People/ns:Person/@id)[2]',
    'int') AS Name_typed
FROM ClientInfo;

Listing 19: Retrieving the two instances of the id attributes

As you would expect, the SELECT statement returns only the attribute values, as shown in Listing 20.

Name_untyped         Name_typed
-------------------- --------------------
1234                 5678

Listing 20: The returned values for the two id attributes

In addition to defining path names in your XQuery expressions, you can incorporate XQuery functions that let you further refine your query and manipulate data. For instance, XQuery supports the count() function, which provides a count of the number of instances returned by an expression. In Listing 21, I use the count() function to return the number of <Person> elements in the XML document in each XML column.

SELECT 
  Info_untyped.value(
    'count(/People/Person)', 
    'int') AS Number_untyped,
  Info_typed.value(
    'declare namespace ns="urn:ClientInfoNamespace";
    count(/ns:People/ns:Person)',
    'int') AS Number_typed
FROM ClientInfo;

Listing 21: Using the count() function to retrieve the number of <Person> elements

For each XQuery expression, I specify the count() function, followed by the path name, which is enclosed in parenthesis. Because the count() function itself returns a singleton value, I do not have to tag the [1] onto the data path, even though I’m using the value() method. Listing 22 shows the results returned for each XML column. As you would expect, the value 2 is returned in both cases.

Number_untyped Number_typed
-------------- ------------
2              2

Listing 22: The number of <Person> elements in each XML column

Another example of an XQuery function is concat(), which lets you concatenate two or more values from an XML document. To use the function, you specify each segment that you want to concatenate as an argument to the function, as demonstrated in Listing 23.

SELECT 
  Info_untyped.value(
    'concat((/People/Person/FirstName)[2], " ", 
      (/People/Person/LastName)[2])', 
    'varchar(25)') AS FullName
FROM ClientInfo;

Listing 23: Using the concat() function to concatenate values

In this case, I’m passing three arguments into the concat() function, which I enclose in parentheses and separate with commas. The first and third arguments are basic XQuery expressions that are themselves qualified with parentheses and a numerical tag to indicate which element instance to return, exactly the sort of expression you would expect to pass to the value() method. The second argument is merely a blank space, enclosed in double quotes. The space will be inserted between the first and last names. Listing 24 shows the results that the statement returns.

FullName
-------------------------
Jane Doe

Listing 24: Returning the full name from the second instance of the <Person> element

The first and last names have been concatenated into a single value. Both names come from the second instance of the <Person> element.

Conclusion

As you’ve seen in this Level, you can use the query() method to retrieve a subset of data from an XML instance, and you can use the value()method to retrieve individual element and attribute values from an XML instance. In the next Level, we’ll cover the exist() and nodes() methods. Although these methods are also used to query XML data, the results they return are not simple XML instances or values. In fact, the methods are often used in conjunction with the query() and value()methods because of the type of data they return.

In the Level that follows the next one, we’ll review the modify() method, the only XML method that lets you manipulate XML data. But keep in mind that, as stated earlier, XQuery expressions can get far more complicated than what I’ve demonstrated so far or will be demonstrating, so I recommend you review the XQuery Language Reference if you plan to write many XQuery expressions. Also note that we’ll be using the same test environment in the next Level, so you might want to keep that around, if you’re so inclined.

This article is part of the parent stairway Stairway to XML

Rate

5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (2)

You rated this post out of 5. Change rating