RE: Is XML the Answer? – SQLServerCentral

October 7, 2003 at 9:28 am

XML addresses two major needs: transferring data from one place to another and storing "sparse" data. The usefulness of XML in transferring data has been addressed well in the previous comments so I won't go into it.

But let's talk about storing "sparse" data. You have a sparse data situation when most of your records have a lot of missing fields. As an example, think about people's addresses. It used to be that a person's address meant a street address, but now it could be an email address, a web site address, or any number of different phone numbers. In my contacts list I have a whole bunch of people with just their street addresses and a whole bunch with just their email address. So the question is how to store this information economically.

If I use an RDB, one choice is to have a StreetAddress table and an EmailAddress table, both related to NameTable. Some of the people won't have an EmailAddress record and some won't have a StreetAddress record. It's trivial to get a listing of people with street addresses, and it's trivial to get a listing of people with email addresses. But what if I want a listing of everyone with whichever addresses they have? Then I have to start doing outer join queries, which are always a mess to get right. If I throw in telephone numbers I just add to the mess.

XML can solve the problem by providing a single Address document that is subdivided into StreetAddress elements (which in turn can be subdivided into street, city, state, and postal code elements), EMailAddress elements (which might use attributes to distinguish between business email and personal email), and Telephone elements (possibly also divided into home phone, work phone, cell phone, pager, etc.). If, for a particular person, some (or most) of the data is missing, then those elements are simply left out of the Address document. Nothing extraordinary has to happen to create my contact list. XSL and XQuery easily deal with the lack of data.
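To make the Address document idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree rather than XSL or XQuery. The element and attribute names (Contacts, Person, Street, City, State, PostalCode, type) and the sample people are my own assumptions for illustration, not a fixed schema. The point is only that a missing element needs no special handling: it just isn't there.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: Ann has only a street address,
# Bob has only an email address and a phone number.
CONTACTS_XML = """
<Contacts>
  <Person name="Ann">
    <Address>
      <StreetAddress>
        <Street>12 Elm St</Street>
        <City>Springfield</City>
        <State>IL</State>
        <PostalCode>62701</PostalCode>
      </StreetAddress>
    </Address>
  </Person>
  <Person name="Bob">
    <Address>
      <EMailAddress type="business">bob@example.com</EMailAddress>
      <Telephone type="cell">555-0100</Telephone>
    </Address>
  </Person>
</Contacts>
"""


def contact_lines(xml_text):
    """Return one line per person, listing whatever address data exists."""
    root = ET.fromstring(xml_text)
    lines = []
    for person in root.findall("Person"):
        parts = []
        # A street address is optional; find() returns None when it is absent.
        street = person.find("Address/StreetAddress")
        if street is not None:
            tags = ("Street", "City", "State", "PostalCode")
            fields = [street.findtext(tag) for tag in tags]
            parts.append(", ".join(f for f in fields if f))
        # findall() returns an empty list when there are no such elements.
        for email in person.findall("Address/EMailAddress"):
            parts.append(f"{email.get('type', 'email')} email: {email.text}")
        for phone in person.findall("Address/Telephone"):
            parts.append(f"{phone.get('type', 'phone')} phone: {phone.text}")
        lines.append(f"{person.get('name')}: " + "; ".join(parts))
    return lines


if __name__ == "__main__":
    for line in contact_lines(CONTACTS_XML):
        print(line)
```

Adding a new kind of address (say, a pager number) means adding one more element to the documents that need it; the people who don't have one are unaffected.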

Addresses are just a simple example of this kind of situation. Think about all the ways there are to catalog a book (author names, available languages, editions, reprints, etc.) and imagine having to create an RDB that handles every one of them. It's possible, but the schema will be incredibly complex. (Go look at the MODS XML schema being defined by the Library of Congress to see the gory details.) These are the situations that XML handles better. No one is ever going to produce an efficient XML implementation of an accounting system, but on the other hand there are clearly many situations where RDBs are too rigidly structured to provide an efficient repository mechanism.