If you're using typed XML in SQL Server 2008, does the XML you store need to have every node that's in the XSD?

Christian Bahnsen
I'm experimenting with the xml data type in SQL Server 2008. I just want to confirm that, when using typed XML, the document you store in an xml column doesn't need to have every node described in the XSD. In other words, as long as the XML document conforms to the XSD, you can have as many or as few nodes as necessary.

A somewhat related question: in typed XML, does SQL Server store all the tags, or does it have a more efficient internal mechanism for storing the document?

Thanks in advance for any help.

Chris
GSquared
You don't need all the nodes. XML, by default, doesn't contain nodes with NULL values, so the engine just assumes those are NULL if they're missing.

Untyped XML can actually take more storage than the exact same data as varchar(max).

I recently imported a fairly large XML file into a table, so I tested storage size with it by keeping it as raw text (varchar(max)) in one table and as XML in another table, then checking the Disk Usage by Table report for the database. The raw text took 680 KB allocated and 664 KB data, while the XML took 1,128 KB allocated and 1,088 KB data.

I don't have time right now to set up an XSD for that table, so I can't test typed XML yet. Sounds like you have what you need to test that. Try text, untyped XML (no XSD, but stored in an XML datatype column), and typed XML (XML datatype with a declared XSD). See what sizes you get with each of those.

Christian Bahnsen
Thanks for confirming what I already suspected re the nodes question.

Regarding the storage question, a client has an old Access app I've been asked to refactor. The app has a table with almost two hundred columns, most of them sparsely populated if populated at all. I've been weighing whether to keep the table, as ugly as it is, or switch over to storing the data as XML. I know SQL Server lets you flag columns as SPARSE, but I still object on principle to having such a wide table.

XML would let me store just the nodes/fields that are actually used, but it seems like there's a lot more overhead in storing and extracting the data. Storing all the tags could take up more room than the data they surround. I like the flexibility XML offers, but it's so verbose.
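
To put a rough number on the tag-overhead worry, here is a minimal Python sketch (not SQL Server code) that serializes only the populated fields of a record as XML text and compares markup characters with data characters. The field names and the `to_sparse_xml` and `markup_overhead` helpers are hypothetical. It measures the textual form only; SQL Server keeps xml columns in an internal binary representation, so real on-disk sizes still have to be measured in the database.

```python
import xml.etree.ElementTree as ET


def to_sparse_xml(record, root_tag="row"):
    """Serialize only the populated (non-None) fields of a record."""
    root = ET.Element(root_tag)
    for name, value in record.items():
        if value is not None:
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def markup_overhead(record, root_tag="row"):
    """Return (data_chars, markup_chars) for the sparse XML text.

    Escaped characters such as &amp; are counted as markup here.
    """
    xml_text = to_sparse_xml(record, root_tag)
    data_chars = sum(len(str(v)) for v in record.values() if v is not None)
    return data_chars, len(xml_text) - data_chars


# Hypothetical wide-table row: two populated fields, two empty ones.
record = {"Color": "red", "Weight": 12, "Notes": None, "Vendor": None}
data_chars, markup_chars = markup_overhead(record)
print(to_sparse_xml(record))
print(f"data: {data_chars} chars, markup: {markup_chars} chars")
```

With short values like these, the tags dominate the text, which is the verbosity concern in a nutshell. Longer values or shorter element names shift the ratio.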

I'll try your suggestion to see if typed XML is stored more efficiently.

Thanks again.
Arthur Olcot
"In other words, as long as the xml document conforms to the XSD, you can have as many or as few nodes as necessary."


That is the crucial thing: the XML document has to comply with the schema. Bear in mind that if the schema defines nodes as mandatory, via the minOccurs attribute (or by omitting it entirely, in which case it defaults to 1), then you need to ensure that those nodes exist in the XML, and that they appear at least as many times as the schema requires if minOccurs > 1. Likewise, the maxOccurs attribute, which also defaults to 1, caps the number of repetitions of a node.

So yes, you can have as many or as few nodes as you like, but only within the limits the schema itself sets. You cannot omit a node unless its minOccurs attribute is set accordingly (that is, to 0).
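
The occurrence rules above can be sketched in a few lines of Python. This is a simplified model for illustration, not how SQL Server validates typed XML: it reads minOccurs and maxOccurs from a small, hypothetical XSD (the `Item`, `Name`, `Phone`, and `Tag` elements are made up) and only counts child elements. It ignores element order, data types, and unknown elements, all of which a real validator also checks.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: Name is mandatory (both attributes default to 1),
# Phone is optional, Tag may appear zero to three times.
XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Phone" type="xs:string" minOccurs="0"/>
        <xs:element name="Tag" type="xs:string" minOccurs="0" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""".strip()


def occurrence_rules(xsd_text):
    """Map child element name -> (minOccurs, maxOccurs), both defaulting to 1."""
    rules = {}
    for sequence in ET.fromstring(xsd_text).iter(XS + "sequence"):
        for child in sequence.findall(XS + "element"):
            low = int(child.get("minOccurs", "1"))
            high = child.get("maxOccurs", "1")
            # maxOccurs="unbounded" means no upper limit.
            rules[child.get("name")] = (low, None if high == "unbounded" else int(high))
    return rules


def occurrence_errors(xml_text, rules):
    """List every child whose count falls outside its allowed range."""
    root = ET.fromstring(xml_text)
    errors = []
    for name, (low, high) in rules.items():
        count = len(root.findall(name))
        if count < low:
            errors.append(f"{name}: found {count}, need at least {low}")
        elif high is not None and count > high:
            errors.append(f"{name}: found {count}, allowed at most {high}")
    return errors


rules = occurrence_rules(XSD_TEXT)
print(occurrence_errors("<Item><Name>a</Name></Item>", rules))
print(occurrence_errors("<Item><Phone>555</Phone></Item>", rules))
```

The first document leaves out the optional Phone and Tag nodes and passes the count check. The second omits the mandatory Name node and is reported.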

XML is quite expensive to work with, especially if you need to query the contents of the XML on a regular basis. XML indexes are, in my opinion, very expensive in terms of storage because of the way they are constructed internally, so they need to be carefully considered. Have you considered a hybrid approach? If you are going to need to query any of the columns in that wide table on a regular basis, I would recommend persisting them as columns still, whilst any other columns that are not going to be queried very often can be put into an XML instance.

I help look after terabytes of XML where I work, and I do like working with it, but it isn't cheap in SQL Server. The XML type is great for storing structured data quite simply for middle-tier apps to consume, where SQL Server does little more than a simple SELECT on the whole XML blob. If you intend SQL Server to query the contents of the XML, or shred it entirely, on a regular basis, then you will incur high performance costs compared to having a nice set of tables.
Christian Bahnsen
Thanks for your reply.

You're absolutely correct about required columns (minOccurs).

I've been leaning toward the hybrid approach, viz., storing frequently used/searched fields as columns, then using an XML column to persist the sparsely used fields.
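
As a rough illustration of that hybrid split, here is a minimal Python sketch of what an application layer might do before writing a row: keep the frequently searched fields as ordinary column values and pack only the populated remaining fields into one XML string. The `HOT_COLUMNS` set, the field names, and the `extras` root element are all hypothetical, and the database side (table design, any XSD) is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical: fields that are searched often and stay as real columns.
HOT_COLUMNS = {"CustomerId", "Status"}


def split_record(record, hot_columns=HOT_COLUMNS):
    """Split one wide row into (column values, XML text for the rest)."""
    columns = {k: v for k, v in record.items() if k in hot_columns}
    extras = ET.Element("extras")
    for name, value in record.items():
        # Empty fields are simply left out of the XML.
        if name not in hot_columns and value is not None:
            ET.SubElement(extras, name).text = str(value)
    return columns, ET.tostring(extras, encoding="unicode")


def merge_record(columns, extras_xml):
    """Rebuild a flat dict from the columns plus the XML text."""
    merged = dict(columns)
    for child in ET.fromstring(extras_xml):
        merged[child.tag] = child.text  # values come back as strings
    return merged


wide_row = {"CustomerId": 7, "Status": "open", "FaxNumber": None, "Nickname": "Bud"}
columns, extras_xml = split_record(wide_row)
print(columns)
print(extras_xml)
print(merge_record(columns, extras_xml))
```

Values read back from the XML are untyped strings in this sketch; with typed XML the schema would carry the type information instead.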