For SQL Server, XML Is One Answer

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/nmalik/forsqlserverxmlisoneanswer.asp

  • In response to your article which argues that XML is a good way to manage data, I would simply make one statement and ask one simple question.

    The relational model of data is based on sound scientific theory. What is the XML "model" based on?

    Do a bit of research and you will find that XML is trying to build "scyscrapers on shack foundations."

    For data transmission, ANY agreed upon format will work (who says a CSV can't contain compete transactions?) Why choose one that is as bloated as XML? Just because it is readable? Seems like pretty flimsy reasoning to me...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I don't see this article as arguing for XML as a data management tool, in fact I see it as agreeing that XML is not a management tool, but a communication tool.

    I do agree that any agreed upon format will work and that XML is bloated, but people everywhere have jumped on XML as the agreed upon format of choice. Why is that?

    I am sure that there are many reasons, but I think that flexibility and readability are at the top of the list. Sure CSV could probably do everything that XML can do, but at what cost? I know after working with XML, CSV and other flat file input types, I'll choose XML everytime for the ease at which I can parse out complex relationships. Readability, while I agree not important for the machine, does save developer hours trying to debug and develop. I would say that XML has been the communication tool of choice because it is simple and it works and it is the easiest most powerful way at the moment to communicate data. Does it have it's shortfalls, yes but as the author pointed out, it is useful for certain purposes.

    I think that this discussion is great, and that all developers need to think about the consequences of the tools and methods that they choose, and understand the shortfalls of the latest buzzword technology.

  • quote:


    I don't see this article as arguing for XML as a data management tool...


    That is exactly what he is arguing when he says that allowing the XML data type in SQL Server is "the right thing to do."

    By allowing XML into your SQL Server databases, you are effectively relegating SQL Server to the role of XML staging area. A database is only as "relational" as its logical design. You will be throwing out the relational model's benefits and trusting in the "XML model" whatever that is... Which is the reason for my question earlier. I challenge anyone to give a satisfactory answer that doesn't rely on generalizations, platitudes and marketing-speak!

    Sure XML can be used as a rather cheap (not inexpensive) shortcut, but eventually you, or rather your company will pay for the shortcuts you take in the form of system overhead, maintainability, and worst of all, data chaos.

    It is the height of irony that XML proponents insist that XML is "more flexible" than a properly designed relational database. Inflexibility was one of the major reasons that the hierarchical database has gone the way of the Dodo...only to be reborn in the form of XML. "Those who will not learn from the past..."

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Let me start by saying that XML has been overhyped, and MS is guilty of much of that. Since the big XML push of several years back, the first time I found a practical use for it was this year as way of writing out some meta-data that needed to be re-parsed. For this purpose, it saved me countless hours of writing a custom parser. (The application was to write out the definition and data from Access tables to XML so that they could be stored in a Version Control system - XML provided a great way to write these files and also recreate the tables easily from the XML.)

    That said, I do think the self-documenting nature of XML is a big plus. Having a universal format for data exchange is nice in theory but practically, I'm not sure how 'universal' it really is. As dcpeterson has pointed out, you can use any format as long as the recipient and sender know what they are doing.

    Overall, I think XML is one more tool in the toolbox, but is not the great end-all be-all of data exchange and storage.

  • quote:


    Since the big XML push of several years back, the first time I found a practical use for it was this year as way of writing out some meta-data that needed to be re-parsed. For this purpose, it saved me countless hours of writing a custom parser.


    Repeatedly, I see "parseability" as one of the stated benefits of XML; people use XML because it's easy to support, and because there is a large base of written code that does much of the "legwork" of interpreting and manipulating XML files. But these are not benefits of XML per se; these are benefits of vendors' support for XML, and had vendors invested the amount of time, money, and effort that they've spent on XML on other, more-efficient file formats, then we could derive the same benefits from those file formats. We wouldn't have to write parsers, as they'd already have been written; we would have easy ways to recreate tables from these other file formats, easy ways of translating from these formats into other formats, we would have formats that are just as readable, just as useable, and just as easy to develop with, but formats that are far more efficient as a means of data transfer and storage.

    I use XML, I know how easy it is to use, and I also know how it easy it is to understand; I enjoy the ability to interface with SQL Server via XML. But I also realize that these things have nothing specifically to do with XML. It's easy to use because vendors have made it easy to use; its readability and understandability are not unique to XML. All of these benefits, I believe, result from its base of support, and really, that is the only benefit that XML has: its broad base of support.

    The question is, "why does it have that broad base of support?" As Mr. Peterson has pointed out, it isn't founded on mathematical or logical models, and has yet to be proven to be a sound data model. As a language for data transfer, it is bloated, as even its proponents will agree. Readability is nice, but again, that's not unique to XML; in fact, one might point out that it still lacks somewhat in the readability department or there would be no reason to transform it into other formats for end users. And again, any other benefits I see result from its existing support; e.g. it's faster to develop an XML based solution, not because XML is a better language, but because parsers such as MSXML exist, web services exist that are built on XML, etc. Time and again, I fail to find any foundation for XML that is based on anything other than a sort of "bandwagon": vendor X has support to it, and that makes it easy for us, so let's use it too.

    If that is its only foundation, then we have no way of proving that it will indeed be here for a long time to come, unless we assume that the opinions, fads, and interests of the computer industry are fairly stable and seldom change; I've yet to see the computer industry prove that to be the case.

  • quote:


    quote:


    I don't see this article as arguing for XML as a data management tool...


    That is exactly what he is arguing when he says that allowing the XML data type in SQL Server is "the right thing to do."

    By allowing XML into your SQL Server databases, you are effectively relegating SQL Server to the role of XML staging area.


    That would have to be the most interesting reading of my article I've seen. If you read the article again, I did not argue for an XML data type in SQL. I offered one reason why Microsoft was right to add XML _features_ to SQL Server. In my mind, that includes the ability to get a resultset in XML and to bulk load XML into tables.

    I must admit to being confused by your anger. Your criticisms do not appear directed at the position that I took.

    To be clear:

    Relational databases are based on a well understood methodology for using of set mathematics to manage and organize data. For situations where data needs to be stored in a field-by-field manner, and retrieved using expression logic against these fields, RDBMS systems are the appropriate answer.

    Are we on the same side now?

    XML is, first and foremost, a scalable, self consistent data communication mechanism. As such, data that is being communicated usually has to be stored (for audit, history, or simply transactional purposes). The fact that XML documents spend a portion of their existence on a hard drive does not mean that many rational people are proposing the utter replacement of Relational databases with XML. That's a huge leap, and one that I am not advocating.

    I do value all opinions. However, I will be more able to address your sentiments if you respond to things that I actually said.

  • quote:


    In response to your article which argues that XML is a good way to manage data,


    My article is primarily concerned with illustrating how XML can be used to COMMUNICATE, not manage.

    quote:


    The relational model of data is based on sound scientific theory. What is the XML "model" based on?


    XML is not a model. It is a method of communicating data that is scalable, self consistent, highly flexible, and highly readable.

    If you disagree that XML is for communication, say so. However, to attack XML on the basis that it is not good for data management is to attack the grace and beauty of the Atlantic Bottlenose Dolphin for not being able to ride a bicycle.

    quote:


    For data transmission, ANY agreed upon format will work


    There are simple, common sense rules built in to XML that do not come with CSV.

    --The ability to nest records in a readable manner.

    --The ability to specify and validate data format and types outside of the application code. (XML Schemas)

    --The ability to specify that additional data or data fields can be transmitted, at any level, without affecting the receiving application. (Open specifications)

    --The ability to specify character encoding (useful for languages that are not based on the Latin-1 character set).

    Even X12 EDI doesn't do all of these things.

    CSV is only a text format. Parse rules are not standardized. Every application is different. Therefore, each parser must be written again, with different rules in mind. This in unproductive at best.

    If you have a parser already, or the rules are CERTAIN never to change... write your own parser. (You live in a more certain world than I do).

    quote:


    (who says a CSV can't contain compete transactions?)


    CSV is flat. Like the results of a single SELECT statement, any joins result in data being repeated from the "master" record for each instance of the "detail". The data that is part of the master record is not distinct from the data from the detail record. Without additional documentation, it is impossible to tell what fields belong to what relations. What if you have one master table and two detail tables? What if you have a chain, with a relation that goes from table A to table B to table C (customer -> order header -> order item). CSV shows none of this. XML shows all of it.

    Additionally, there is no STANDARD method for dealing with master records that have no detail relationships. Will your CSV record have fewer fields than it's neighbors, or will there be blank fields (e.g. 1,2,,,,,,). Are these fields blank or null? Should the detail record be created with null values? blank values? not created at all?

    These questions still come up with XML, but FAR less often. The standard rules of XML are what make it stronger than CSV.

    quote:


    Why choose one that is as bloated as XML? Just because it is readable? Seems like pretty flimsy reasoning to me...


    As I've pointed out, CSV records that represent master-detail relationships have data repeated from one row to the next. If I want to be picky, I'd say that CSV is bloated, and that we should all use X12 (EDI). Now THAT'S an efficient format!

    I agree that XML is not an efficient mechanism for storing data for long periods of time. Why is it bloated? I can only conjecture. Let's just agree that it's value does not come from it's efficient use of bandwidth or hard drive space.

    It's value comes from simple rules that are easy to enforce, which makes it the FIRST data communication format that offers a readable, scalable, flexible mechanism for cross platform communication.

    This is why it is widely supported already. This is why XML will stick around. For data communication, CSV pales by comparison.

  • hm...not that I want to say, I'm fed up with such discussions...

    But they remind me of

    - Which programming language is superior?

    - Which RDBMS is best?

    - to NULL or NOT?

    ...

    I haven't seen yet one answer I can fully identify with to the above questions. So my bottom line on this all is the best language, system, philosophy or whatsoever is the one that lets you accomplish your tasks efficiently.

    ...and that depends from individual to individual.

    Frank

    http://www.insidesql.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Just to add an additional topic to the discussion...

    The problem with XML is that it defines a notation you have to adhere to.

    So, suppose all of the applications in the world can 'talk' XML, you still need to agree upon a certain schema to make those applications talk. That is what EDI has done years and years ago.

    So the problem of interoperability still remains. If I give you an XML file, and you don't know the schema and the meaning of each node in the tree, your still as puzzled as you would be when I send you a CSV.

    But, then again, the readily available parsers do come in handy. You're pretty sure the XML doc adheres to the agreed schema when you 'open' it. You have an easy method of extracting certain values (just think XPath). You can do transformations between different formats, ... Saves your average developper / vendor a lot of time. So yes, XML has its merits.

    BTW. I do use XML and have used it quite extensively before. If I need to do any cross application communication, I will certainly stick with XML.

  • And now to the topic again...

    I do believe adding XML support to SQL Server is good. It gives you an additional interface to the system.

  • Someone asked about the theory behind XML. I'm not aware of any formal/mathematical theory, but I guess there is one related to the hierarchical, tree-structure of the way the data is stored.

    Here's a few links that may be helpful. XML is a standard developed and maintained by the w3c organization at

    http://www.w3.org/XML/

    According the orginal working document on XML,

    "The design goals for XML are:

    XML shall be straightforwardly usable over the Internet.

    XML shall support a wide variety of applications.

    XML shall be compatible with SGML.

    It shall be easy to write programs which process XML documents.

    The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

    XML documents should be human-legible and reasonably clear.

    The XML design should be prepared quickly.

    The design of XML shall be formal and concise.

    XML documents shall be easy to create.

    Terseness is of minimal importance. "

    To understand where XML has come from, you have to look at SGML. Here's a history:

    http://xml.coverpages.org/general.html#hist

    hope that helps

    dave

  • quote:


    I must admit to being confused by your anger. Your criticisms do not appear directed at the position that I took.


    There is no anger on my part; frustration at the total lack of understanding? YES, but not anger.

    quote:


    There are simple, common sense rules built in to XML that do not come with CSV.

    --The ability to nest records in a readable manner.

    --The ability to specify and validate data format and types outside of the application code. (XML Schemas)

    --The ability to specify that additional data or data fields can be transmitted, at any level, without affecting the receiving application. (Open specifications)

    --The ability to specify character encoding (useful for languages that are not based on the Latin-1 character set).

    Even X12 EDI doesn't do all of these things.


    X12 EDI does not do these things because these are NOT functions of a transport mechanism. I don't need the transport mechanism to enforce any kind of integrity because it is enforced at the sender and at the reciever and the data is not modified en route.

    You seem to not understand that as soon as you start to take on these functions you have a data model (good or otherwise) and you are performing data management. XQUERY, XML Schemas and all these other wonderous capabilities of XML that you seem so excited about ARE DATA MANAGEMENT functions!!! Now, if you are managing data with XML, once again, what is the sound logical foundation you are using?

    I'll not argue the endless virtues of CSV files. It was just the most familiar example to give for demonstration purposes. But of course, any file used for data transmission MUST be accompanied by COMPLTETE and PRECISE documentation. In my article I argue, and common sense will show, that XML tags are not sufficient documentation in an of themselves. So, if your "self describing" format can't describe itself FULLY, why bother? In this context the tags become superfluous.

    As for your example of the Dolphin. Cute, but totally inappropriate. You and others are constantly saying "XML is just for sharing data" and then turning around without even knowing touting its benefits as a data model.

    I am trying to help people see:

    1. The relational model is the only viable choice available today for data management.

    2. XML is a bad choice for data management. And that many of the "most exciting" things being done with XML ARE data management functions.

    3. XML isn't even a very good choice for data transmission. XML's self documenting features are wishful thinking at best and it's bloated.

    As mdburr stated, all the benefits of XML are not due to XML itself, they are due to the bandwagon effect. And while there is something to be said for standards, XML is a bad one all the way around, so why support it?

    Edited by - dcpeterson on 10/24/2003 10:31:51 AM

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Is there any reason to compare XML to CSV?

    Write

    <root><![CDATA[

    ..contents of your CSV file..

    ]]></root>

    and you have XML file.

    Yes, need to escape "]]>".

    Yes, need to add some funcionality to your CSV parser to skip first and last rows.

    But yes, you can include all the family of CSV files, etc. This will need a little more logic from your parser.

    But someone wrote and test this, and name it XML.

  • Bottom line...XML support provided by SQL DBMS's, I could care less. If developers want to struggle with the syntax of some FOR XML clause, who care's? If that SQL statement, like any other, consumes too many resources, it will be throttled.

    Now, one argument or beef with the vendors is, developers want to store the XML within the DBMS. Why would ANYONE want to do this? Because it is not currently provided, they store it in a VARCHAR, create an ID attribute, and call it a database. What a lousy way to manage data. And why would anyone want to go to the expense of purchasing an SQL DBMS to only have it tag and stuff a text bucket? And now, the vendors, Microsoft included, want to include a native XML data type to make this stuffing...what? more functional?

    That is what the relational model is all about: identifying and constraining the elements and relations of the sets of data. XML can not even begin to come close to addressing these storage, constraint, quality, and retrieval requirements.

    Make no mistake about it: many would be developers without any formal training, and many who do, want to transport these XML documents directly into the DBMS. Once this happens, not only will the XML not be able to fulfill the requirements of a flexible, reusable, data model, the DBMS will be rendered useless to this task as well. All knowledge of the set elements of the data will have been consumed and badly digested into the XML stream stored either as a text BLOB/CLOB or some idiotic XML data type.

    The very idea of an XML data type is an oxymoron. Data types are one piece of the DOMAIN construct who's sole purpose is to constrain and add quality to the data storage/management. How is XML constrained? A schema? But that's a DBMS. How thick can some of you people be?

    Time to go to class, grow up, and get a real education. These do-it-yourself in 21-day books you've bought are doing us all a disservice.

    Edited by - iapetus on 10/24/2003 10:27:39 AM

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply