Is XML the Answer?

  • I got an email from someone pointing out that some of the arguments I made about XML are very much like those made against 3GL programming when it was new: "It's bloated and inefficient," etc. He pointed out that while those arguments were true on their face, they ultimately lost out because of the other benefits that modern programming languages brought to the table, and that many of the problems came from improper use and lack of standards rather than from any inherent weakness in 3GL programming. My response is below.

    I see your point, to a point. There are some similarities in both the pro and con arguments when the dialogue is limited to "bloat", but that is actually the least important of the arguments against XML. It is the easiest to demonstrate and to understand, and it is not unimportant, but the main problem with XML is that it is hierarchical. As such it is not suited to data management at all, and it has some pretty severe limitations for data transport. XML actually imposes a greater burden on communication than is otherwise necessary.

    Normally when you want to send or receive data, both the sender and receiver must agree on what the data means and on a format. XML requires both of those elements, but it also imposes another, much more subtle (and more difficult) requirement, which can be summed up by saying: Hierarchies are not neutral.

    When you arrange anything in a hierarchy you are--by definition--declaring that some attributes are more "important" than others. The problem is that while that may be true for one purpose, it is not likely to hold up as the data is shared among people and organizations with different priorities. Let me demonstrate. Let's take a music collection. It's common to see music categorized by "Genre" and then "Artist", but since so many artists cross over genres, wouldn't it make sense to put Artist first? What about "Album Name", "Year Released", "Record Label", or even something like "Playing time"? The answer is a definite maybe; it depends on how you want to search, and that is the problem right there. Your data is now organized to facilitate one known use, but it is very clumsy to use in any other way. Or what if an item doesn't really fit into a single category? In these cases it often becomes a "coin toss" or, more often, a political tug of war. You can't organize the data to support multiple uses without duplication. This is why hierarchical databases were so notoriously difficult to build.
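To make the music-collection point concrete, here is a minimal Python sketch. The artist and album names are placeholders invented for illustration, and the helper names (`albums_from_year`, `flatten`, `group_by`) are hypothetical, not from any library. It files a tiny collection by genre and then artist, and shows that a question the tree was not built for means walking every branch, while flat rows can be regrouped by any attribute.

```python
from collections import defaultdict

# Hypothetical collection filed the "usual" way: Genre -> Artist -> albums.
# "Artist A" crosses genres, so it has to appear under two branches.
by_genre = {
    "Country": {
        "Artist A": [{"album": "First", "year": 1968}],
    },
    "Rock": {
        "Artist A": [{"album": "Second", "year": 2002}],
        "Artist B": [{"album": "Third", "year": 1968}],
    },
}


def albums_from_year(tree, year):
    """Answer a question the hierarchy was not built for: walk every branch."""
    found = []
    for genre, artists in tree.items():
        for artist, albums in artists.items():
            for entry in albums:
                if entry["year"] == year:
                    found.append((genre, artist, entry["album"]))
    return found


def flatten(tree):
    """Turn the tree into flat rows in which no attribute outranks another."""
    return [
        {"genre": genre, "artist": artist, "album": entry["album"], "year": entry["year"]}
        for genre, artists in tree.items()
        for artist, albums in artists.items()
        for entry in albums
    ]


def group_by(rows, key):
    """Group flat rows by any attribute; the same function serves every question."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["album"])
    return dict(groups)


rows = flatten(by_genre)
print(albums_from_year(by_genre, 1968))
print(group_by(rows, "artist"))
print(group_by(rows, "year"))
```

Note that the tree needs a purpose-written walk for each new question, and that "Artist A" is already duplicated across branches. The flat rows answer "by artist" and "by year" with the same function.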

    True hierarchies do exist in the real world, but they are much less common than many OO programmers seem to realize. The OO programming paradigm has not solved any of the problems it was sold as a cure for, in large part (I think) because it is too strongly hierarchical. This tends to impose object models that are far too inflexible and complex, and then when it comes time to interface with the data layer, OO programmers complain that the data model doesn't mirror the object model that took so long to agree upon.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • As far as I can tell the advantages of XML are:

    • It can get through port 80 in a firewall without scaring network admins
    • It is geek readable so where the originator couldn't be bothered to document it at least you can guess what it is

    A purpose-built binary format will always be quicker, lighter and faster, but the problem is how to expose Company A's proprietary binary format to Company B?

    I hate XML with a passion but I just don't see a viable alternative.

  • "I hate XML with a passion but I just don't see a viable alternative"

    Like I said above, YAML (http://www.yaml.org) seems like a decent alternative for data transfer, .ini files, etc.

  • David Poole's response (two responses above) is spot-on and funny as hell. Brilliant observation!! XML carries all the logical baggage of EDI... You still have to get on the phone and work out the protocol with the other guys. IT people tend to have a strange sense that the world is a perfect place and that a perfect protocol will solve all IT problems...

    This twisted view is usually lodged with an analyst/techno geek who never stayed up all night at Bank America fixing a program that stumbled in the middle of an 8 million record transmission of data from the Fed because of a 3GL program on the input side that couldn't properly edit the data keyed in by a $9.00 an hour night proof clerk. Oh, and by the way, wrap XML around that much data and you won't have to stay up all night fixing it; it will be all night transmitting!

    David left out one other benefit of XML: putting it on your resume will add 2 points to your qualification score in some of the automated resume scanning programs at big companies.


  • "Like I said above, YAML (http://www.yaml.org) seems like a decent alternative for data transfer, .ini files, etc."


    It is a decent substitute for XML as it's very human readable, but that's not usually the purpose of a data transmission format. YAML appears to still have the overhead of at least one label per element throughout the data.  The basic concept of XML and YAML is fine, but both have way too much overhead for effective high-speed transmission because of all the tags and labels.  If either were modified to have a preamble record that says "all of the data is in the following format" (hierarchical or otherwise), and the preamble appeared only once, then you'd be on to something for batch transfers of large amounts of data.
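A rough way to see the per-element label overhead is to serialize the same batch both ways and compare sizes. This is only a sketch under stated assumptions: the field names and values are made up, the "preamble" is modelled as a single header line in a delimited file, and `as_xml` and `as_preamble` are hypothetical helper names. The exact ratio depends entirely on how long the tag names are relative to the values.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical batch: field names and values are made up for illustration.
fields = ["account", "amount", "posted"]
records = [(str(1000 + i), "19.95", "2007-01-31") for i in range(1000)]


def as_xml(fields, records):
    """Label every value on every row, the way a typical XML export does."""
    root = ET.Element("batch")
    for record in records:
        row = ET.SubElement(root, "row")
        for name, value in zip(fields, record):
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")


def as_preamble(fields, records):
    """State the layout once in a header line, then send bare delimited rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)  # the one-time "preamble record"
    writer.writerows(records)
    return buffer.getvalue()


xml_text = as_xml(fields, records)
flat_text = as_preamble(fields, records)

# Character counts for the same 1000 records, and the ratio between them.
print(len(xml_text), len(flat_text), round(len(xml_text) / len(flat_text), 1))
```

With short values and descriptive tag names, the labelled form comes out several times larger than the header-once form, and the gap widens as tag names get longer.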

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK, I finally went out to yaml.org and took a look at the specification.  Here is a direct quote from the introduction:

    YAML™ (rhymes with "Camel") is a human-friendly, cross language, Unicode based data serialization language designed around the common native data structures of agile programming languages...

    There are hundreds of different languages for programming, but only a handful of languages for storing and transferring data. Even though its potential is virtually boundless, YAML was specifically created to work well for common use cases such as: configuration files, log files, interprocess messaging, cross-language data sharing, object persistence and debugging of complex data structures. When data is easy to view and understand, programming becomes a simpler task. (Emphasis added)

    By the highlighted portions of the quote it is painfully evident that the YAML folks know nothing more about data management than those who came up with XML.

    They are advocating for data "persistence" to mirror the data structures of "agile" programming languages. 

    Translation: Since our stupid programming languages are stuck in the trees with the monkeys, and we can't seem to think outside of trees, let's go back to the good old days when our persistence mechanism was tree-based too!

    Of course YAML's potential is "virtually boundless" and since it is our hammer, the world MUST be made of nails!

    Granted, YAML does seem to address the XML bloat to some degree, but that is the least important, if most obvious, criticism of XML.  Of course these guys either don't get or want to ignore the more important arguments...


  • You're absolutely right, it does only address the bloat issue of XML (which is a major issue, the one issue that gets it shot down most of the time). However, a lot of the time, for political/coolness/no good reason, you have to send metadata along with your data. In that case, YAML is better than XML languages.

    That said, sometimes there are very good reasons to send along metadata with your data. For example, in the clinical trial world, there is such variation from one trial to another that there is no way to come up with any kind of meaningful standard across trials. In that case, you do want to transfer your metadata along with your data.

    And don't forget, even if you are forced to use XML languages or YAML, there is no reason why you can't use relational structures. That's how I coped when idiots forced me to use XML.
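For what "relational structures inside XML" might look like, here is one possible sketch. The element and attribute names (`dataset`, `table`, `row`, `subject_id`, and so on) are invented for illustration, as are the helpers `load_tables` and `join`. The idea is to carry flat tables whose rows point at each other by key, instead of nesting one entity inside another.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: element and attribute names are invented for this sketch.
# Each <table> holds flat rows; rows refer to each other by key, not by nesting.
payload = """
<dataset>
  <table name="subject">
    <row subject_id="1" site="North"/>
    <row subject_id="2" site="South"/>
  </table>
  <table name="measurement">
    <row subject_id="1" test="pulse" value="72"/>
    <row subject_id="2" test="pulse" value="80"/>
    <row subject_id="1" test="weight" value="70"/>
  </table>
</dataset>
"""


def load_tables(xml_text):
    """Read each <table> into a list of plain dicts, one dict per <row>."""
    root = ET.fromstring(xml_text.strip())
    return {
        table.get("name"): [dict(row.attrib) for row in table.findall("row")]
        for table in root.findall("table")
    }


def join(left, right, key):
    """A plain nested-loop join on a shared key column."""
    return [{**a, **b} for a in left for b in right if a[key] == b[key]]


tables = load_tables(payload)
joined = join(tables["subject"], tables["measurement"], "subject_id")
by_site = [(row["site"], row["test"], row["value"]) for row in joined]
print(by_site)
```

Because neither table is nested inside the other, the receiver can just as easily start from measurements and look up subjects as the other way around.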

  • Thank you, Don, for this article.  I started looking at XML casually a few years before you wrote this and was immediately struck by the bloat.  I experimented with it a bit by exporting tables into XML and just couldn't see how this could be a solution to much of anything.  If you need to transfer data between disparate systems, print out a schema and make it some sort of delimited format: we've been moving data like that for years with almost total success.  There may be issues with embedded objects or peculiar data types, but those won't transfer easily regardless.

    It just wasn't a solution to any problems that I was aware of.

    Anyway, I'm forwarding this to some of my co-workers.  I don't know of any pending XML projects on our horizon, but I'm only in my fourth week of my new gig.  Let's just hope that if that TLA ever rears its ugly head that we can continue to nip it in the bud!

    -----
    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • Hey!  I used to be able to hold up an 80 column punch card to the light and read it!  I was proud of that former ability!

     


  • Well now, it has been 4 years since this discussion. XML's foothold continues to expand.

    Do the comments and concerns here still hold up?

    Has anyone changed their stance? Or are you more confident in your initial assumptions?

    There is a little bit of a troll here, as I continue to find XML as frustrating as I initially believed. But nonetheless I think it is important to continually reevaluate. Also, it is a mental distraction as I take a break from the tedious task of trying to extract some tables into a MISMO-compliant XML file. (This task alone has convinced me that XML as a transport layer is not worth it. The effort to build a compliant file in the correct layout is turning into a monumental task.)


    Kindest Regards,

    Frederick Goodrum

  • It's what I call the "technology of the future" that will always remain so. It's mostly just used for trivial purposes.

  • Yes... I've changed my stance on XML, Frederick.  I hate it even more simply because of its similarity to bloat-ware... but the folks in OPS just love what it does to bandwidth usage when trying to transmit a gig of raw data, only to find that XML is going to take 16 gig for the same thing, which gives them justification to increase their hardware and dedicated line budgets.  Our vendors love it because it gives them the excuse to sell more hard disk space and faster switches and routers.  And our night operators love it because it gives them something to watch all night instead of the quick little BCP and Bulk Inserts that we used to do.

    --Jeff Moden



  • Both of you have put a smile on my face. I'm reminded of a company's tagline: The future - Today.

    I find it fascinating that several years ago, when I showed concern over the bloat, the response was that disk space was 'cheap' and bandwidth will continue to exceed capacity. A pittance to pay for readability.

    So I troll with more questions. Are the end users happier now that they can read the data?

    I say trolling not out of a need to upset or cause trouble. I am simply having a rough time since 'biting the bullet' and I suppose I am looking for kindred spirits. For 20 years, I've slapped together interprogram comm utilities in COBOL, then C, then C++, then VB, now C#. All the while the data layer has been a no-brainer. The coding was inconsequential. The heavy lifting was typically done working with the business user, mapping business processes and their associated data elements. That has all changed.

    I use the real estate and mortgage XML 'standard' as my reference. The MISMO DTD is massively complex, to the point of making the business process (which was a large, complex business process in itself) an utterly monumental task. Perhaps this is not the pristine example people wish to use to demonstrate, but it is a real-world task. Now, after spending/wasting four days just trying to create a compliant mortgage application XML file, I am seriously considering starting an all-out campaign for the destruction of XML, at least as a common-use concept (not tool).

    I love the demonstrations, the books, manuals, and the exquisite examples showing how easy it is to use. What I've seen so far are additional programs to purchase (some quite expensive) that provide additional layers of work for me and lots of additional failure points. And still someone forgot to tell me that the 634 assorted nested data elements have to be mapped from my SQL tables to a structure that, quite frankly, boggles my mind. Now, when I do get my head wrapped around most of the concepts, I find the element names range up to 40 characters in length, all uppercase, with underscores as make-believe spaces. That's a lot of typing for something that is already typed! <groan>.

     

    Sorry to vent, and I apologize for sucking you into my little world. But today I continue to ask. It has now been over 4 years since we were given this godsend. To me it looks like this thread, unfortunately, has taken a back seat to 'blind acceptance'. The naysayers stand firm in their conviction. Those who heralded the great awakening have moved on to greener pastures. --- Yes Virginia, the Circus has left town and all we have is a broom.

     

     


    Kindest Regards,

    Frederick Goodrum

  • I wrote an article on this very subject a while back:

    http://www.sqlservercentral.com/columnists/shirsch/whatisxml.asp

    It hasn't been 4 years, it's been 10 years since XML was created.

    Among the myriad problems with using XML languages, you hit on a major one: the hierarchical data model. While we think in hierarchies, the relational model is much more practical.

    Also, it seems as if the data modelers went wild with your schema. That's a problem that's independent of XML or any markup language, but it is one that is encouraged by XML.

    Some data modelers, in their quest to be accurate, don't seem to think of the people who would actually use their model.

  • I've managed to avoid XML thus far, but it's lurking around the corner for me. I've got two laboratory database systems, and apparently I'll have to use XML to move data from one to the other. I'm looking forward to working with something different, but I'm not really looking forward to it being XML.

    I think that my problem is (a) the bloat and (b) the redundancy of definition. If I'm moving data back and forth with you, we're going to know what each other's schema and requirements are, and we can make adjustments to file formats to accommodate. I know what I'm sending, you know what you're receiving. So why do I have to label every frelling single data element that's going across?
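As a sketch of the "I know what I'm sending, you know what you're receiving" arrangement: the schema below is a hypothetical one that both sides would have agreed on ahead of time (the column names, converters and sample values are all made up, and `read_rows` is an illustrative helper, not a library function). The file itself carries values only, and the receiver applies the schema to get names and types back.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical schema agreed between sender and receiver ahead of time:
# column order, names, and a converter for each column.
SCHEMA = [
    ("sample_id", int),
    ("analyte", str),
    ("result", Decimal),
    ("collected", date.fromisoformat),
]

# The transmitted file carries values only; nothing in it is labelled.
incoming = "101,sodium,140.5,2007-03-01\n102,potassium,4.1,2007-03-02\n"


def read_rows(text, schema):
    """Apply the agreed schema to each unlabelled row; reject rows of the wrong width."""
    rows = []
    for line_no, values in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(values) != len(schema):
            raise ValueError(
                f"line {line_no}: expected {len(schema)} fields, got {len(values)}"
            )
        rows.append(
            {name: convert(value) for (name, convert), value in zip(schema, values)}
        )
    return rows


rows = read_rows(incoming, SCHEMA)
print(rows[0])
```

The trade-off is that a file like this is meaningless without the agreed schema, which is exactly the situation described above: both parties already have it.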

    Maybe I'm just becoming more of a curmudgeon the longer that I stay in this biz, but I just don't see the win here. Give me a columnar or CSV file any day!

