RE: For SQL Server, XML Is One Answer

Mr or Mrs. 500

October 23, 2003 at 12:36 am

quote:
In response to your article which argues that XML is a good way to manage data,

My article is primarily concerned with illustrating how XML can be used to COMMUNICATE, not manage.

quote:
The relational model of data is based on sound scientific theory. What is the XML "model" based on?

XML is not a model. It is a method of communicating data that is scalable, self-consistent, highly flexible, and highly readable.

If you disagree that XML is for communication, say so. However, to attack XML on the basis that it is not good for data management is to attack the grace and beauty of the Atlantic Bottlenose Dolphin for not being able to ride a bicycle.

quote:
For data transmission, ANY agreed upon format will work

There are simple, common-sense rules built into XML that do not come with CSV.

--The ability to nest records in a readable manner.

--The ability to specify and validate data format and types outside of the application code. (XML Schemas)

--The ability to specify that additional data or data fields can be transmitted, at any level, without affecting the receiving application. (Open specifications)

--The ability to specify character encoding (useful for languages that are not based on the Latin-1 character set).

Even X12 EDI doesn't do all of these things.
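To make a few of those points concrete, here is a minimal sketch in Python using only the standard library. The document and its element names (customer, order, item, loyaltyTier) are made up for illustration. It shows nested records, a declared character encoding, and a reader that keeps working when the sender adds a field it does not know about. Schema validation is not shown, since the standard library does not include a schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: every element and attribute name here is invented.
# The XML declaration states the encoding, so the parser decodes the
# non-ASCII name correctly without any hint from the application.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<customer id="42">
  <name>Zo\xc3\xab</name>
  <loyaltyTier>gold</loyaltyTier>
  <order number="1001">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
</customer>"""


def read_orders(xml_bytes):
    """Return the customer name and a dict of order number -> item list.

    The reader only asks for the elements it knows about, so an extra
    field such as <loyaltyTier> is simply ignored.
    """
    root = ET.fromstring(xml_bytes)
    name = root.findtext("name")
    orders = {}
    for order in root.findall("order"):
        items = [(i.get("sku"), int(i.get("qty"))) for i in order.findall("item")]
        orders[order.get("number")] = items
    return name, orders


name, orders = read_orders(DOC)
print(name, orders)
```

The same reader would work unchanged if the sender dropped loyaltyTier or added another field next to it.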

CSV is only a text format. Parse rules are not standardized. Every application is different. Therefore, each parser must be written again, with different rules in mind. This is unproductive at best.

If you have a parser already, or the rules are CERTAIN never to change... write your own parser. (You live in a more certain world than I do).

quote:
(who says a CSV can't contain complete transactions?)

CSV is flat. Like the results of a single SELECT statement, any joins result in data being repeated from the "master" record for each instance of the "detail". The data that is part of the master record is not distinct from the data from the detail record. Without additional documentation, it is impossible to tell which fields belong to which relations. What if you have one master table and two detail tables? What if you have a chain, with a relation that goes from table A to table B to table C (customer -> order header -> order item)? CSV shows none of this. XML shows all of it.
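A small Python sketch of the customer -> order header -> order item chain, with made-up column and element names. The flat CSV repeats the customer on every row and gives no hint which columns belong to which relation. The nested XML built from the same rows states each master record once and lets the hierarchy be read directly from the structure. Note that the grouping logic has to be told which columns form each level, which is exactly the "additional documentation" the CSV itself does not carry.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical result of joining customer, order header and order item.
FLAT = """customer,order_no,sku,qty
Acme,1001,A1,2
Acme,1001,B7,1
Acme,1002,C3,5
"""


def count_customer_repeats(csv_text, customer):
    """Count how many rows carry the same master (customer) value."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for r in rows if r["customer"] == customer)


def to_nested_xml(csv_text):
    """Rebuild the hierarchy. The column-to-level mapping is supplied by
    the programmer; nothing in the CSV file says so."""
    root = ET.Element("customers")
    customers, orders = {}, {}
    for r in csv.DictReader(io.StringIO(csv_text)):
        cust = customers.get(r["customer"])
        if cust is None:
            cust = ET.SubElement(root, "customer", {"name": r["customer"]})
            customers[r["customer"]] = cust
        key = (r["customer"], r["order_no"])
        order = orders.get(key)
        if order is None:
            order = ET.SubElement(cust, "order", {"number": r["order_no"]})
            orders[key] = order
        ET.SubElement(order, "item", {"sku": r["sku"], "qty": r["qty"]})
    return root


repeats = count_customer_repeats(FLAT, "Acme")
tree = to_nested_xml(FLAT)
print(repeats)
print(ET.tostring(tree, encoding="unicode"))
```

In the CSV the customer appears on all three rows. In the XML it appears once, with two orders and three items nested beneath it.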

Additionally, there is no STANDARD method for dealing with master records that have no detail relationships. Will your CSV record have fewer fields than its neighbors, or will there be blank fields (e.g. 1,2,,,,,,)? Are these fields blank or null? Should the detail record be created with null values? Blank values? Not created at all?

These questions still come up with XML, but FAR less often. The standard rules of XML are what make it stronger than CSV.
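Here is a short Python sketch of that ambiguity, using made-up data. The CSV reader hands back empty strings for padded fields and simply fewer fields for a short row, so "blank", "null" and "no detail record" have to be told apart by a private agreement between sender and receiver. In the XML version, an element that is present but empty is distinguishable from one that is absent. Treating absence as null is only one possible convention, assumed here for illustration, which is why the questions still come up with XML, just less often.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two ways a sender might write a master record with no detail data.
padded = next(csv.reader(io.StringIO("1,2,,,\n")))
ragged = next(csv.reader(io.StringIO("1,2\n")))

# Padded fields arrive as empty strings; nothing says blank versus null.
print(padded)  # ['1', '2', '', '', '']
print(ragged)  # ['1', '2']

# Hypothetical XML equivalents: element names are invented.
with_empty_note = ET.fromstring('<master id="1"><note/></master>')
without_note = ET.fromstring('<master id="1"/>')

# findtext returns "" for an element that exists but is empty,
# and None (the default) when the element is not there at all.
present = with_empty_note.findtext("note")
absent = without_note.findtext("note")
print(repr(present), repr(absent))
```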

quote:
Why choose one that is as bloated as XML? Just because it is readable? Seems like pretty flimsy reasoning to me...

As I've pointed out, CSV records that represent master-detail relationships have data repeated from one row to the next. If I want to be picky, I'd say that CSV is bloated, and that we should all use X12 (EDI). Now THAT'S an efficient format!

I agree that XML is not an efficient mechanism for storing data for long periods of time. Why is it bloated? I can only conjecture. Let's just agree that its value does not come from its efficient use of bandwidth or hard drive space.

Its value comes from simple rules that are easy to enforce, which makes it the FIRST data communication format that offers a readable, scalable, flexible mechanism for cross-platform communication.

This is why it is widely supported already. This is why XML will stick around. For data communication, CSV pales by comparison.