In a previous post I talked about performing "inner joins" in XQuery. The basic idea is that the inner join is simply a special case of a Cartesian product, or "cross join". In this post I'm going to briefly look at another common SQL-style join condition that can be duplicated to some degree in XQuery. Specifically I'm going to give an example of a retrieving nodes from a tuple stream that don't match the nodes in a second tuple stream.
In this post we'll duplicate about 50% of the SQL "left outer join" functionality. For those who aren't familiar with SQL left outer joins, they can be thought of as an inner join between two tables unioned together with the rows of the left-hand table that have no corresponding rows in the right-hand table.
For this example I'm going to borrow some slightly modified XML data from the previous post:
DECLARE @xml xml;-- Create sample XML documentSET @xml = N'<authors> <author id = "1">Fabio Claudio Ferracchiati</author> <author id = "2">Hugo Kornelis</author> <author id = "3">Rob Walters</author> <author id = "4">Lara Rubbelke</author> <author id = "5">Adam Machanic</author> <author id = "6">Michael Coles</author> <author id = "7">Robin Dewson</author> <author id = "8">Jan D. Narkiewicz</author> <author id = "9">Robert Rae</author></authors><books> <book title = "Pro T-SQL 2005 Programmer's Guide"> <isbn>159059794X</isbn> <author>6</author> </book> <book title = "Accelerated SQL Server 2008"> <isbn>1590599691</isbn> <author>3</author> <author>6</author> <author>7</author> <author>1</author> <author>8</author> <author>9</author> </book> <book title = "Pro SQL Server 2008 XML"> <isbn>1590599837</isbn> <author>6</author> </book></books>';-- Perform outer joinSELECT @xml.query('for $author in /authors/author where fn:empty($author[@id = /books/book/author]) return <author> { $author } </author>');
This sample is available as a download file here.
The XML data consists of a list of authors and a list of books. The query uses a FLWOR expression to retrieve all authors who don't have a corresponding book node. Here's how it works:
The results are shown below:
<author> <author id="2">Hugo Kornelis</author></author><author> <author id="4">Lara Rubbelke</author></author><author> <author id="5">Adam Machanic</author></author>
As you can see, the three authors with id's of (2, 4, 5) were returned because they have no corresponding book nodes in the XML data. This technique, combined with the previously discussed inner join technique, can be used to simulate SQL style outer joins in XQuery.
Part of the promise of XML is that it makes modeling hierarchical data relatively painless. The hierarchical structure of XML automatically provides context to your data. Consider the following simple example:
<General name="Grant"> <Colonel name="Sanders"> <Major name="Payne"> <Captain name="Calamity"> <Sergeant name="Bilko"> <Private name="Benjamin" /> </Sergeant> </Captain> </Major> </Colonel></General>
As you can see, the hierarchical structure is implicit in the structure of the XML. The General is above the Colonel is above the Major, and so on. Organizational charts, military chains of command, and other similar structures are standard fare for hierarchical modeling. Needless to say, the AdventureWorks database has a standard organizational hierarchy modeled as relational data which you'll probably see used in a whole lot of samples all over the place.
But AdventureWorks also includes a more detailed, but often-overlooked, real-world example of hierarchy. The AdventureWorks Bill of Materials (BOM) hierarchy is modeled as an adjacency list hierarchy; one in which the child nodes maintain links to their parent nodes in the same table. You can query the Bill of Materials using the dbo.uspGetBillOfMaterials procedure like this:
EXEC dbo.uspGetBillOfMaterials 749, '2001-01-01'
The results are a "flat" relational listing of the raw materials used to build the "Road-150 Red, 62" bike.
The results do provide the parent and child component ID numbers, as well as the level of the component in the BOM. But they do not convey the hierarchical information without some sort of reconstruction on your part - probably performed in the UI for display purposes. All of the information required to reconstruct the hierarchical structure of the BOM is available in this result set, but it's not readily apparent. You can use FOR XML PATH and the xml data type to reconstruct the hierarchical structure of the BOM with a simple recursive function like this:
CREATE FUNCTION dbo.GetRecursiveBom(@ProductID int, @level int = 0)RETURNS xmlASBEGIN DECLARE @x xml; SET @x = ( SELECT bom.ComponentID AS "@id", p.ProductNumber AS "@number", @level AS "@level", p.Name AS "name", p.Color AS "color", p.ListPrice AS "list-price", bom.PerAssemblyQty AS "quantity", p.Size AS "size", p.SizeUnitMeasureCode AS "unit-of-measure", ( SELECT CASE WHEN @level < 5 THEN dbo.GetRecursiveBom(bom.ComponentID, @level + 1) ELSE '' END ) FROM Production.BillOfMaterials bom INNER JOIN Production.Product p ON bom.ComponentID = p.ProductID WHERE @ProductID = CASE WHEN @level = 1 THEN bom.ComponentId ELSE bom.ProductAssemblyID END FOR XML PATH (N'item'), TYPE ); RETURN @x;ENDGO
This function calls itself recursively to retrieve a BOM (up to 5 levels deep) in XML format. You can call the function like this:
SELECT dbo.GetRecursiveBom (749, 0);
And the resulting XML looks like the image below:
As you can see the item nodes are nested within each other, with each component acting as a container for its constitutent components. The code for this function is available as a download file here: GetRecursiveBom.sql.txt
In a future post we'll look at how to create a "recursive" XML schema to validate the XML produced by this function.
SQL Server has a handy option for loading XML data directly from the file system, the OPENROWSET function. To load XML files into SQL Server, simply declare an xml data type variable and assign it the value of the OPENROWSET function with the BULK option. Here's a quick example that uses the state-list.xml file (attached to this post, or available here):
DECLARE @xml XML;-- Use OPENROWSET to read an XML file from the file systemSELECT @xml = BulkColumnFROM OPENROWSET(BULK 'c:\state-list.xml', SINGLE_BLOB) TempXML-- View the resultSELECT @xml;
The first OPENROWSET parameter is the filename to load, preceded by the BULK keyword. The second parameter defines the type of the file. Valid values are SINGLE_CLOB for character large object, SINGLE_NCLOB for Unicode character large objects, and SINGLE_BLOB for binary large object. Microsoft recommends always setting this parameter to SINGLE_BLOB. Also make sure that you alias the OPENROWSET (in this example I used "TempXML" as an alias). The single column returned is named BulkColumn.
The only real caveats here are:
Another "little thing": the path to the filename is from the perspective of the server. So if you're logged into a remote server from a client computer, "C:\" is the root directory on the server not the client's local C: drive.
Also note that you cannot use a variable in place of the filename. You have to give OPENROWSET a string literal. To get around this, you can use dynamic SQL to construct your OPENROWSET function call if necessary. **Always be cautious when using dynamic SQL!***
In the next post we'll discuss some xml data type-related functionality.
I've decided to kick this blog off with a double-feature. In the last post I gave an example of shredding XML query plans with the nodes() method. In this one we'll discuss the nodes() method in a little more detail.
In a previous post on the Pseudo-Random blog, I recommended that everyone who's still using OPENXML should switch over to the xml data type's nodes() method. The problems I mentioned included:
I don't want to use up this entire blog entry complaining about OPENXML, so I'll cut it short here. From the above you get the idea that OPENXML is an inferior method of shredding XML data. So what's the alternative? Glad you asked. The xml data type includes several built-in methods, including one method designed just for shredding XML data: the nodes() method. Basically this method works like this:
The sample query below demonstrates the nodes() method. In the sample we shred the XML document into relational rows, and query the context node ('.') to return each relational row.
DECLARE @xml XML;-- Character representation of the XMLSET @xml = '<capitals> <state name="Alabama" abbreviation="AL" capital="Montgomery" /> <state name="Alaska" abbreviation="AK" capital="Juneau" /> <state name="Arizona" abbreviation="AZ" capital="Phoenix" /></capitals>';-- Retrieve each individual node from the relational result setSELECT Node.query('.')FROM @xml.nodes('/capitals/state') TempXML (Node);
Once the xml data type instance is shredded into relational rows of the xml data type, you can use the other xml data type methods like query() and value() to extract nodes and scalar values from the rows. As I previously said, the xml rows returned by the nodes() method are restricted in their functionality. You can't cast them to other data types; you can only use the xml data type methods on them to extract their contents. We can use the value() method in our example to extract each state's name, abbreviation, and capital from the nodes() result set, as shown below.
DECLARE @xml XML;-- Character representation of the XMLSET @xml = '<capitals> <state name="Alabama" abbreviation="AL" capital="Montgomery" /> <state name="Alaska" abbreviation="AK" capital="Juneau" /> <state name="Arizona" abbreviation="AZ" capital="Phoenix" /></capitals>';-- Retrieve scalar values from the relational result setSELECT Node.value('@name', 'varchar(100)') AS Name, Node.value('@abbreviation', 'varchar(2)') AS Abbreviation, Node.value('@capital', 'varchar(100)') AS CapitalFROM @xml.nodes('/capitals/state') TempXML (Node);
As you can see, this syntax is fairly intuitive, especially when compared to the equivalent OPENXML syntax. The results are shown below.
In the next post we'll talk about loading XML data from files from the file system directly into SQL Server.
Hi everyone, in this blog I plan to discuss the new SQL Server 2005 and SQL Server 2008 XML features. While I have several topics I plan to cover already, if you have specific SQL Server + XML related questions, feel free to contact me via this blog and I'll try to answer, or at least help you get a little closer to the answer.
I decided to kick this off with a simple example to demonstrate the utility of XML in SQL Server. SQL Server 2008 and 2005 store cached XML query plans that are accessible via the sys.dm_exec_query_plan dynamic management function. This function takes a query plan handle as an argument and returns the XML query plan, along with some other metadata.
While I was at the PASS Conference in Denver, I threw together a quick example to demonstrate shredding the cached XML query plans on SQL 2005. I expanded it a little bit to include some other information from the sys.dm_exec_sql_text dynamic management function and the sys.dm_exec_cached_plans dynamic management view.
This sample query uses a few of the new features available starting with SQL 2005: the CROSS APPLY operator, Common Table Expressions (CTEs), dynamic management views/functions, and the ROW_NUMBER() windowing function.
The most important feature we're using here is the xml data type and its nodes() and value() methods. Since the XML query plans are stored as xml data type data, we are using the nodes() method to shred the XML data nodes into rows. Then we apply the value() method to each of these new rows to extract individual query plan operators and operator information from the plan. In this case we pull the physical operator name ("Nested Loops", "Filter", etc.) and the estimated subtree cost for each operator. As a bonus we are grabbing the initial SQL statement that is the basis for the plan from the sys.dm_exec_sql_text dynamic management view. That way we can easily relate the query plan, and its individual operators, back to the source SQL statement.
This is just one example of the capabilities of the xml data type. In upcoming posts I'll describe more precisely how the xml data type methods and XQuery work.