Is XML the Answer?

  • JJ B (5/2/2008)


    I certainly would rather use CSV for transfering data/communicating between databases/servers any time just because it is a million times easier and faster to create, human-read, and code-read.

    So, I use CSV when I can.

    BUT how would you solve the following problem without XML?

    ............You have to send data to another agency. The data is not only completely outside your domain, but will be put into another database tool (Sybase). Most of the data would work great for a CSV format. However, one of the fields is a Varchar(MAX) field. Data in this field ranges anywhere from 1 page to 20 pages of text, all fully punctuated with every imaginable type of character.

    Could you really use CSV for this? What else would you use? I considered creating a separate text file for each row of each Varchar(MAX) field, but the programming effort of managing all those files seemed to be a lot more than the programming effort of XML, which seemed to easily handle this requirement.

    What would you do? (Honest question.)

    As someone who deals with this every day, I can tell you that XML is almost there in ease of use, but when it gets there, it'll be a wonderful solution to my problems.

    I agree and disagree with the author here. Is it bloated compared to other ways of storing data? Yes. Is it trying to do too much, be all things to all people? Maybe. But please don't discount what it does do, because it does do it well.

    My department of my company is a data shop. We get data in, analyze it, and send it out all the time. Most of the time we get flat files (CSV or other delimited), and let me tell you, finding the rogue commas in data fields to fix data import is no fun task. Also, as the above author wrote, how do you then send carriage returns, commas, pipes, etc? Big deal for us. (and don't forget cross-platform issues)

    The other wonderful thing about XML is it basically forces a level of metadata documentation. If you don't think that's a a big deal, you haven't worked at a small company at the mercy of a larger company for data feeds. A large data feed shop really could care less if they lose our business, but we absolutely need theirs. So if we complain too much about needing more information, they either a) ignore it, or b) label us too much trouble and walk away. But with XML, I know that at least I'm getting field names and a relationship representation also.

    I've only found one situation where I used an XML field type purposefully in an application situation (describing metadata; it worked well), but one application that maybe you haven't considered is archival. If I send a 3rd-party an XML file, I can keep an archive of it in an XML field in a table. Yes, I could just store that in a folder on the network, but this way, I can easily query it from SQL Server directly to run stats of what we've submitted. Very useful.

  • Stephan: Thanks for the reply. If I understand you correctly, you are suggesting that I could create a delimited text file, but instead of using a comma, I could use the grave character? Or maybe you are saying this: the narrative should be in its own file where each narrative record/table row is separated by the grave character?

    I've never heard of the grave character. I'm going to look into it. I'm sure my peers in the other agency would appreciate an alternative to reading my XML files.

    My biggest concern for using the grave technique would be: I know that most database products know how to easily read CSV files. Similarly, there are standard methods for reading XML files, even if it is a pain. Would a database product have built-in (or at least easy coding) support for reading a file where "rows" are separated by the grave character? If lots of special code had to be written to read grave files, I'm not sure the grave solution would any better than the XML files.

    Just some thoughts. Thanks for your thought.

  • "If I understand you correctly, you are suggesting that I could create a delimited text file, but instead of using a comma, I could use the grave character? "

    Yes.

    "Or maybe you are saying this: the narrative should be in its own file where each narrative record/table row is separated by the grave character?"

    Not quite. Maybe the narrative could be in a file structured like PK`Narrative?

    "I've never heard of the grave character. I'm going to look into it. I'm sure my peers in the other agency would appreciate an alternative to reading my XML files."

    Sure. It's the lowercase key next to the "1" on US keyboards (other keyboards, YMMV)

    "My biggest concern for using the grave technique would be: I know that most database products know how to easily read CSV files."

    Most DB products will let you select your own delimiter. Just select ` instead of , ; | etc.

    "Similarly, there are standard methods for reading XML files, even if it is a pain. Would a database product have built-in (or at least easy coding) support for reading a file where "rows" are separated by the grave character? If lots of special code had to be written to read grave files, I'm not sure the grave solution would any better than the XML files."

    If it's at all decent, then yes.

  • Stephen: Thanks for the additional clarification. That helps.

  • From my experience, XML is more or less document-centric.

    It allows easy validation and "ETL" from a document using XSLT.

    If a "document" is only rows & columns, then XML is probably overkill.

    But if it is hierarchical or at least more complex than rows & columns, it

    can be beneficial.

    I also think that the main benefit is not in the database, it is in the

    handling of the data within an application (ever tried parsing a complex

    text file?).

    Extending XML to the database is just a natural evolution.

    Why not store such documents directly in the database? You need to

    consider performance, that's true, but is this really the key factor for

    your specific application? It always depends.

    What I don't consider a use of XML is data exchange in the form of "batch" ETL

    from sytem to system. In contrast, for "document centric" ETL I would consider

    the use of XML but considering doesn't mean also using it in the end.

    Best Regards,

    Chris Büttner

  • Klaar Chris, klaar. My point of view exactly.

    In English "mark up" is something you do to documents, not databases.

  • Stephen Hirsch (5/2/2008)


    ...

    "I've never heard of the grave character. I'm going to look into it. I'm sure my peers in the other agency would appreciate an alternative to reading my XML files."

    Sure. It's the lowercase key next to the "1" on US keyboards (other keyboards, YMMV)

    It's also the unshifted tilde (~), unshifted it's the accent grave. IIRC, it's used in *nix BASH scripting, but I don't remember what for, but apparently it has quite a number of computer uses: http://en.wikipedia.org/wiki/Grave_accent#Computer-related. 😛

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • Sure, in Unix it does. I meant lexically, like in human languages.

    If you're putting Unix commands into the data, then yes, you'd have to find something else. I don't think that that would occur very often, though.

  • The place I work now use XML, a LOT. And a lot of the time in the wrong ways. Luckily we have finally agreed on a standard: Relational data in the database and as results, unless it can be proven that XML is the right/best way to do it. I.e. it is up to the developer in question to convince us during peer design reviews that he needs to use XML.

  • By creating an incredibly bigoted paper, you lost your credibility. When one chooses only that which supports their predetermined conclusion, they do an injustice to a good analysis.

    If one properly chooses selected components of RDBMS, one can make exactly the same kind of claims. For example a RDBMS is not a good way to send information across the internet. Talk about a large footprint. If this is all one considers, one could conclude that RDBMS is something one should avoid.

    BTW do you use Vista? If so then you are heavily invested in XML as it provides the underlying communications for Vista. I don't see a performance penalty. Vista appears to work pretty well.

    You have some good points but you've selectively chosen those things at which XML is not good and reinterpreted others to fit your conclusion.

    I'm amazed this article was printed twice.

    So something so bad performs extremely well in proper real use.

Viewing 10 posts - 136 through 144 (of 144 total)

You must be logged in to reply to this topic. Login to reply