Basics of XML and SQL Server, Part 2: Shredding XML

  • This is a silly question but curiousity is getting the best of me. Am I the only one that got 499 results? Since we're talking about the S&P 500, I'm assuming that 500 records would be present but my result set was 499.

  • XQuery is usually faster than OpenXML when shredding small XML documents, but slower when shredding large documents.

    In some recent testing I found that with a document only about 30 columns wide and 150 rows long, that if I ran the same shred in both XQuery and OpenXml in one query, the XQuery took 100% of the query cost. I was blown away by the performance hit. The query plan clearly shows that sql server creates a table per XQuery operation, and then has to inner join all 30 tables to get you your result table. After reading up on it I was able to spend hours adding some complicated middle steps to the XQuery version that dramatically sped it up. Now it only took 96% of the query cost! Needless to say, I will just use openXml in almost all situations now. The only time XQuery is worth it is when you're dealing with messy user data where you might have ampersands and other non-allowable characters in the text, the xml datatype happily swallows a whole slew of errors that make openXml blow up.

  • I got the data file from EODDATA.

    http://www.eoddata.com/

    You will have to ask them.

  • Stan Kulp-439977 (3/15/2012)


    I got the data file from EODDATA.

    http://www.eoddata.com/

    You will have to ask them.

    As far as I know the S&P may only track 499 and use "500" because it just sounds better. I wanted to make sure your outlined process wasn't truncating the import for some strange reason. 🙂

  • My bet is that during the course of a year they lose some through mergers, bankruptcies, etc. and have to replace them. It may take a while to find suitable replacements.

    I played with that data quite a bit, and I have no reason to believe that the 499 count is wrong.

  • Stan Kulp-439977 (3/15/2012)


    My bet is that during the course of a year they lose some through mergers, bankruptcies, etc. and have to replace them. It may take a while to find suitable replacements.

    I played with that data quite a bit, and I have no reason to believe that the 499 count is wrong.

    I agree with that. I just wanted to confirm that someone else was also getting the 499 result set. Thanks for the follow-up and very nice article by the way. I don't usually walk through a lot of the steps in these articles but I followed this one all the way through (which is why I was concerned about the 499 results).

    Again, thanks for the follow-up.

  • I'll look for the article..

  • I think this is one of the best XML primers I've seen. Thank you for such a great article!

  • It took me a while before I could read the article past where we generate XML with attributes and then transform it using a style sheet.

    Maybe I'm missing something, but wouldn't it have been easier to use

    SELECT DISTINCT * FROM [dbo].[sp500hst] AS StockData WHERE [Date] = '20090821' ORDER BY [Ticker]

    FOR XML PATH('StockData'),ROOT('StockMarketData')

    instead of

    SELECT DISTINCT * FROM [dbo].[sp500hst] AS StockData WHERE [Date] = '20090821' ORDER BY [Ticker]

    FOR XML AUTO,ROOT('StockMarketData')

    followed by a transform using external processes, which, while interesting, doesn't really add much to the discussion about XML in SQL?

    All my XML work uses elements not attributes (not my choice, if that matters to anyone), which may explain why this jumped out at me.

  • I'm sure your way is more efficient, but I was not really focused on the most efficient way to do things when writing this tutorial.

    I was interested in demonstrating how to do things in as illustrative and understandable a manner as possible.

    I broke it into stages just to illustrate it better, and to prepare for the next article in which I automate the transforms in SSIS.

    For me, learning how to do something new is the hard part. Once I understand it, I can figure out how to improve it on my own.

  • Stan Kulp-439977 (3/18/2012)


    ...I was interested in demonstrating how to do things in as illustrative and understandable a manner as possible.

    I broke it into stages just to illustrate it better...

    That's exactly why I found it a bit distracting to have a whole section called 'Transform the XML document's attributes into elements' in the middle of an otherwise interesting article; While the transform stuff is interesting in its own right; it's a *bit* off-topic and didn't really help explain 'XML Shredding', which was the core of your article.

    I'm looking forward to the next installment... I have some XML transforming stuff I do at a particular customer using Task Manager, I'm interested to see if there'll be any merit in using SSIS instead 🙂

    /Ryan

  • This article was a great help.

    My relational mindset is stubbornly resisting the hierarchical nature of xml and I was having a rough time with even the simplest operations. This article helped me shred a fairly complex xml document that I've been having issues with. My only problem now is that it only seems to be bring in the first instance of a node sequence (ie - if a productID can have multiple itemIDs, my shred only brings in the first ItemID). I suspect this is where the "cross apply" operator comes into play and I'm sure I'll figure it out with some practice.

    I'll never be an xml guru, but with the information in this article I no longer consider it an achilles heel. Thanks!

  • ** Update**

    I went through the example as presented and even though it worked it generated error messages (SQL 2012).

    I changed the script to what I thought it should be and the errors went away.

    Thanks for taking the time to write the article. I understand the material better than I did yesterday after reading about this topic in one of my books. 🙂

    ** End Update **

    Did you make a mistake in the openxml example? :w00t:

    I ask this because you have the following:

    EXEC sp_xml_prepareDocument @hdoc OUTPUT, @Doc

    So now I have to access the xml data using @hdoc right?

    But you do it through @idoc in the following statement.

    FROM Openxml (@iDoc,'//StockMarketData/StockData ')

    You refer the reader to a Microsoft example which is:

    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

    OPENXML (@idoc, '/ROOT/Customer',1)

    So it would seem that your openxml statement should be:

    FROM Openxml (@hdoc, '//StockMarketData/StockData' @iDoc)

    Am I correct or am I missing something because I do not understand how you can access the xml data using

    @idoc which you initialize to a value of 1? Struggling a little with this.

Viewing 13 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply