Home Forums Programming XML if you're using typed xml in sql server 2008, does the xml you store need to have every node that's in the XSD? RE: if you're using typed xml in sql server 2008, does the xml you store need to have every node that's in the XSD?

  • In other words, as long as the xml document conforms to the XSD, you can have as many or as few nodes as necessary.

    That is the crucial thing, as long as the xml document complies with the schema. You need to bear in mind that if the schema has nodes that are defined as mandatory, via the attribute minOccurs (or omitting it entirely which defaults it to 1) then you will need to ensure that those nodes exist in the XML (and for the number times as the schema defines if minOccurs > 1). Additionally through the attribute maxOccurs you may not exceed the number of repetitions of that node defined which again defaults to 1.

    So yes you can have as many or as few nodes but the number of those nodes is controlled by the schema itself. you cannot omit nodes unless the minOccurs attribute is set accordingly.

    XML is quite expensive to work with, especially if you need to query the contents of the XML on a regular basis. XML indexes in my opinion are very expensive with regard to storage due to the way they are constructed internally so they need to be carefully considered. have you considered a hybrid approach? If you are going to need to query any of the columns in that wide table on a regular basis I would recommend persisting them as columns still, whilst any other columns that are not going to be queried very often can be put into an XML instance.

    I help look after TB's of XML where I work and I do like working with it, but it isn't cheap in SQL server. The XML type is great at storing structured data quite simply for middle-tier apps to consume where the SQL server does little more than a simple select statement on the whole XML blob. If you intend the SQL server to query the contents of the XML or shred it entirely on a regular basis then you will incur high performance costs compared to having a nice set of tables.