Efficient use of dedicated columns vs. XML

Question

Efficient use of dedicated columns vs. XML

tonyp-1059590

Grasshopper

Points: 15
More actions
August 20, 2009 at 6:50 pm

#214022

I'd love to get some feedback from anyone else who's dealt with this issue and/or can give me an idea on how to quantify it...
We store a large bit of data as XML - it comes in from external sources, and winds up in our database. This XML data includes at least two points of latitude and longitude. They can have values, or be null.
Now, this data is retrieved using a view, and this view is joined with another table. An example of one of the fields in the view is:
CASE WHEN SOD.OtherData.value('(/otherData/gps/lat)[1]', 'decimal(18,10)') = 0.0
THEN NULL ELSE SOD.OtherData.value('(/otherData/gps/lat)[1]', 'decimal(18,10)')
END AS StartLatitude
The other columns are extracted similarly from the XML, along with one additional column that calls a udf to calculate the distance between the two points.
My contention is this isn't a highly efficient way to access this data. There must be some parsing overhead for this information (which, by the way, can be returned in result sets of hundreds, or even thousands of rows).
My thinking is that this information should be extracted once, stored in a table, and then the table queried. My developers have said "it's not that much overhead to query this XML data."
So the question is - is XML data storage (btw, this is SQL 2005) efficient enough that querying and joining to data within an XML field does not provide significant overhead? Or should data that's repeatedly queried against be extracted with a one-time process, and stored statically in traditional columns.
Any thoughts on what's better and/or best practices in general for using this XML data? (I should mention that once written, the XML is never itself changed - so once I have two points, the distance is not going to change).
Thank you,
tony

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

As usual, the answer is "it depends".

If the xml file is not deeply nested and has just a few elements, performance difference over one or the other shouldn't be a big issue.

The more complex the xml structure gets and/or the more elements you have in your xml the more decreases performance (even with an xml index on the column).

I'm not sure if you've talked to an experienced SQL 2K5 programmer or if the programmer is more familiar with .NET, JAVA, C# or the like. I'd expect the latter...

Please take the time to read the following article[/url], where pretty much the same issue you're talking about is discussed. But before making any conclusion just by the content of the article make sure you follow the related discussion. Especially Paul's post .

My personal thought: If I use xml data in joins or any SQL code, I'd extract the data into relational tables. If it's just to hold data for the application, I'd leave it as xml.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 2

As a general rule of thumb , to quote Michael Rys (MS XML SqlServer

program manager)

http://www.itwriting.com/sqlxml.php

"If your data fits the relational model, you’re probably better off probably still taking the XML and shredding into relational form, because all the data really is relational data and you want to process it that way.
"On the other hand, if you actually have a collection of values, if you have object structures of things that you want to treat almost as an atomic item, but which you might want to query to find individual values, then having the ability to manage that in an accessible way inside the database is good. Look at arrays. If you have data that fits into an array paradigm, like a series of measurements for example, you might want to manage that series of measurements as an unique whole. Trying to do that design on the relational level is more problematic than if you can store it within one of these non first normal datatypes."

Clear Sky SQL
My Blog[/url]