I can't imagine ever wanting to use the XML data type. The only time I'd ever store XML of any sort in the database is if there is some kind of real requirement (regulation) to store the exact messages used in an exchange. In that case it gets stored as a varchar(max). Data stored in the XML data type is not guaranteed to be the SAME as the original XML document/message, so for most regulatory/audit requirements it is not sufficient.
If there is any need to use the data contained within the XML documents, beyond simple storage and retrieval, it gets "shredded" and stored in a properly normalized structure.
Hi all, thanks for the feedback!
To DC - One of the advantages of the XML data type is that you can easily shred your XML documents into relational data using the XML data type's built-in methods like .query() and .nodes(), for relational storage. Even if you don't want to store data using the XML data type itself, it's still useful for manipulating XML data being passed in and out.
And of course XML is an excellent choice for semi-structured data which might not fit well into the relational model. Even our good friends over at ANSI have seen the need to store, retrieve, and manipulate XML in SQL databases (the ANSI SQL/XML standard was approved and is currently under revision AFAIK).
It all depends on what you want to do with it. For instance, when combined with some of SQL Server's other new features like HTTP Endpoints, etc., you can potentially "eliminate the middle man" and send XML documents directly to SQL Server for shredding via Web Methods. Done right, this could potentially eliminate network traffic, cut down on development time, and eliminate potential points of failure in the system.
Consider sending a 100 MB XML document to a .NET app, an app who's sole purpose is to shred the XML and send it to SQL Server. You have a potential point of failure between the front end and the .NET app in the middle, and another potential point of failure between the .NET app and the SQL Server. You also have the additional overhead of opening and maintaining a SQL Server connection while transferring the data from the .NET app to SQL Server, and the additional bandwidth involved in effectively sending the data over your network twice (once from the front end to the .NET app and once from the .NET app to SQL Server).
By setting up an HTTP Endpoint and sending the XML data directly to SQL Server for shredding you eliminate one potential point of failure, remove the additional overhead of opening and maintaining a connection, and only have to transfer your XML data once. Of course this is just one example, which might or might not be applicable to your particular situation.
Also keep in mind that not all data is best represented relationally. SQL is great for highly structured, highly "regular" data. XML is often better at representing semi-structured and hierarchical data (granted CTE's in SQL do help a bit with the hierarchical data).
Now there you go...
Alarms should go off when you say something like "not all data is best represented relationally." Semi structured data is an oxymoron. Granted, it is an oft-repeated oxymoron, but an oxymoron none the less.
And "hierarchical" data is something that is very rare. Sure, most data CAN be viewed hierarchically, and we, as humans, often find it very useful to categorize things. The problem is that once you start to manage your data in hierarchical structures, you have limited the ability of other users to view the same data in different hierarchies. The process of categorizing data into hierarchies (taxonomy) necessarily implies assigning more "value" to some attributes and less to others; this is one of the main problems that lead to the demise (almost) of hierarchical DBMS's. And if more practitioners weren't totally ignorant of history, XML would have been relegated to forever be a bit player, instead of being a "major feature" in SQL Server or any other DBMS.
To qualify as "hierarchical" the data must be able to be accurately represented in one and only one hierarchy. Otherwise, the hierarchy is merely a mechanism imposed upon the data for a specific purpose, and thus should NOT be managed in hierarchial structures (XML etc...)
To point to XML as a means of eliminating network traffic or storage space is the very definition of irony, although I'm sure that in this case it was purely unintentional.
DC - Why should alarms go off? There is a lot of data in the world that is better represented in a hierarchical structure than in having a relational structure imposed upon it. Consider the simple example of HTML. Obviously an HTML page could have a relational structure imposed upon it, but that would be awfully painful (look at the source code of this page and imagine it in relational format, for example).
For a more business-oriented example, consider ERP with its Bills of Material and Bills of Labor. I've seen detailed BOMs that can go very, very deep indeed. And I've seen the pain caused by trying to force a relational format on those BOMs (particularly in older versions of SQL Server where CTEs were not available).
As far as BOMs are concerned, and most hierarchical structures, there is more "value" (in one form or another) assigned as you move up the tree. For instance, the $ value of a bicycle in a BOM increases as you add both tires to the frame. And as you go up another level in the BOM, the $ value increases again when you add pedals. In a standard business Org. Chart, the "value" you speak of might be measured in units of "authority", "accountability", or some other metric. This is not a function of XML or even hierarchical databases themselves, it's standard practice when representing data hierarchically using any method, so I'm not really sure what you're driving at. And yes, one of the features of XML is that position is important, as it can provide context to the data, as opposed to the SQL model which is order-independent.
Do you have a reference for all hierarchical data being able to be represented in "one and only one hierarchy"? I would need a reference for that, as I can easily show a hierarchy like the following:
Bishop |----Priest |-----Nun
And I could represent the same data in a second hierarchy:
Pope |----Bishop |-----Priest |-----Nun
So I'm not seeing what you're getting at there. One subset of data represented in two different hierarchies.
And of course, using the SQL model I either have to assign them all the same attributes (i.e., Pointy Hat Color, Habit Size, etc.) even though for many of them it will be NULL (Nuns don't have Pointy Hats and the Pope doesn't wear a Habit), or create a separate table for each of Pope, Bishop, Priest, Nun representing their attributes. And of course another table for Altar Boy, one for Church Janitor, etc.
Finally, in the example I gave, I'm talking about transferring a 100 MB XML file from the front end to the .NET app in the middle. Further, after it's shredded we'll be transmitting maybe another 100 MB of network traffic from the .NET app to the SQL Server (depending on how much of the XML data we actually decide to keep). If you eliminate the .NET app in the middle, you've just cut your network traffic in half. I suppose I'm missing the irony of essentially cutting your network traffic in half...
"And I could represent the same data in a second hierarchy"
Thank you for making my point. In order to represent data stored in a hierarchical structure, in a different hierarchy, redundancy is required.
Data stored in relational structures can be represented in many different hierarchies. It is much more difficult, if not impossible, to do with data stored in hierarchical structures. Granted, representing complex hierarchies is...well...complex, unless you store it that way. But in so doing you limit the data's usability for other purposes, and you loose the ability to declare any but the most rudimentary constraints on it.
So, now you've flushed data independence and data integrity down the drain... Let me think... What was the purpose of a DBMS in the first place?...
If you need any further proof that hierarchical structures are very poor for general data management, just look at all the problems with file systems. The organization of file systems is notoriously difficult because what makes sense for one person or group, makes no sense for others. So what happens? Multiple file structures are created to support the different useful views of the data. Then files are duplicated all over the place, many are fogotten and very quickly, no one has any idea which files are current etc...
Hierarchical data management is a mess. Those who are smart would avoid it.