What is XML?

  • Very good article!  Although there's a point I don't agree with: XML is a markup language technology (as its name clearly states), and as such, it can be used for anything we want (yes, including data storage/transfer/formatting/etc., just as HTML can).

    XML is not the problem.  The real problem begins with us (developers) trying to use it for everything.  As an old saying goes: "With a hammer in my hand, every problem looks like a nail."

    IMO, XML has its place somewhere... it's up to us to find that place, and it's most likely different for each of us.  Try to see if the nail has some markings on its head before you start hammering away.  Yes, a hammer will drive the screw in one way or another, but maybe a screwdriver can get the job done faster and with a lot less effort.

  • Hello,

    I didn't need to wait for XML to put a Word document between two ###VERYLONGDOCUMENT### markers and tell my remote client to use those tags to decode it. XML is an extremely marketed language (add some XML stuff to your resume); I still can't see where the revolution is.

    I have not seen any developer using a DTD to dynamically handle different document formats.

    In most cases, they will agree between themselves on what the XML structure will be so that their programs run well. So, what is the difference from a CSV file?

  • "How do you create or consume XML from within your programs?"

    First of all, let's ban the word "consume" from any discussion of data. That includes "web services."

    Oracle has very nice conversion functionality between XML and its relational model. I'd look there first.

    "XML is a markup language technology (as it's name clearly states), and as such, it can be used for anything we want (yes, including data storage/transfer/formatting/etc... just as HTML)"

    No argument there. My point is that XML languages (I feel like the GNU/Linux guy) don't do anything but mark up text well. I can buy a car with dimes. Why would I want to?

  • Just to throw some balance to the discussion, let me be the lonely voice of dissent.

    What the author says about XML could be said about almost any other technology available to store/share data. There is no universal solution that can be adapted to every case. For simple cases, an old-fashioned text file is more than enough, but as the complexity increases so does the solution. Files used by a single application can be whatever a programmer wants and they usually work fine, but as applications begin to interact with one another the simple solutions show their limitations. Interaction implies well-defined interfaces, and XML provides a great way to describe and publish an interface. Not the only way, for sure, but it has distinctive advantages over other ways like COM, IDL, header files, or EDI. If you program in an isolated environment this is a non-issue, but if you need to write programs that consume data from vastly different sources, you need some standard. Enter XML. The wide adoption of this format has made it the de facto standard, and that's something GOOD if you are in the business of data publishing/consumption.

    By claiming that XML is overly complex, the author is ignoring one of its key features, which is the fact that there is a standard, and this fact has made possible the development of many tools and language constructs (most of them freely available) designed specifically to deal with all the complexities of XML. This is what makes XML a good choice for many projects (notice I don't say "for every project"). As an example, compare with the still-popular comma-delimited file: if you are processing information sent in that format, you either trust that each line has the same number of fields or write code to deal with exceptions. Same for the actual data: if the third field has to be a date, you have to write your own code to check for it. In general, you end up writing code that is custom-made for a particular file, and if the file format changes, you usually need to change your code. Compare that with the XML solution: once you agree on a particular schema (including a DTD), all the grunt work is already coded in most XML frameworks; you only worry about processing the data.
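
    To make the comparison concrete, here is a rough sketch of the difference. It's only an illustration: Python and the lxml library are just a convenient choice, and the file names and field layout ("feed.xml", "feed.dtd", "feed.csv", five fields with a date in the third) are made up.

```python
# Sketch: for XML, a validator does the structural grunt work; for CSV,
# every structural rule is custom code. File names and layout are invented.
import csv
from datetime import datetime

from lxml import etree  # third-party; chosen only for its DTD support

# --- XML: the agreed DTD describes the structure, the library enforces it ---
dtd = etree.DTD(open("feed.dtd"))
doc = etree.parse("feed.xml")
if not dtd.validate(doc):
    raise ValueError(f"feed.xml does not match the agreed DTD: {dtd.error_log}")
# From here on we can trust the structure and just process the data.

# --- CSV: the same guarantees have to be hand-rolled, line by line ---
with open("feed.csv", newline="") as fh:
    for line_no, row in enumerate(csv.reader(fh), start=1):
        if len(row) != 5:                              # field count, checked by hand
            raise ValueError(f"line {line_no}: expected 5 fields")
        try:
            datetime.strptime(row[2], "%Y-%m-%d")      # "third field must be a date", by hand
        except ValueError:
            raise ValueError(f"line {line_no}: field 3 is not a date")
```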

    Maybe in the old days XML had to be manually processed and validated, but we've come a long way, and nowadays most XML traveling the net is never seen directly by any user; it's just processed by highly optimized code that works precisely because, for once, we have a format with almost universal acceptance. I don't see anything wrong with that.

    If you don't like XML, you are free to use whatever format you like, but if you need to interchange information with the rest of the world, don't expect everyone to adapt to your own format. In fact, just about everybody has already adapted to XML, so why not use that to your advantage?

  • "No argument there. My point is that XML languages (I feel like the GNU/Linux guy) don't do anything but markup text well. I can buy a car with dimes. Why would I want to?"

    That's a good point.  But still, a tool is a tool, and as such, you can't expect it to do the work for you, nor will it do the work better if you don't use it properly.  Think of it as two different car dealers.  Each of them has a sign out front.  The first one reads: "Mustang for sale: $100,000".  The second one reads: "Mustang for sale: $100,000.  5% discount if you pay with dimes".  My point is that the "Why would I want to?" part can have a lot of answers.  But it's still up to us to choose one.  In this particular case, I would weigh the $5,000 against my laziness about gathering the dimes.  Tough call...

    Yes, XML does nothing more than format text well.  But if that's what I need, it just might be worth a shot.  Why not?  Everyone else seems to be using it!  (Just kidding...)

  • >> I have not seen any developer using a DTD to dynamically handle different document formats.

    Just because you haven't seen it doesn't mean it isn't good or isn't being used by others. In fact, I have used it in a way that makes maintenance of my applications a lot easier than before. I process data from multiple vendors and they, of course, change their formats frequently. Having code that will dynamically adjust to a change in the XML schema has been great for me. But of course, beauty is in the eye of the beholder, so I'm not claiming everyone should be doing that. In the end, XML is one more tool that happens to work for many of us.

  • "Compare that with the XML solution: once you agree on a particular schema (including a DTD), all the grunt work is already coded in most XML frameworks, you only worry about processing the data."

    If you agree on a data model, that's 95% of the effort. Sam, may I ask, what are you using XML languages for? How complicated are your data models? There's no question that with enough effort you can get XML to work; the question is, can you get it to work well?

    BTW, how in the world do you do dynamic data models in XML? Inquiring minds wanna know!

  • I am underwhelmed by this article, and not just because it's dated. (DTDs?) Most of the problems the author attributes to XML are in fact problems associated with the application domain that he's trying to apply XML to. It's hard to get the pharmaceutical industry to define and adopt a common data model? That has *nothing to do* with XML.

    Similarly, the complaint that XML isn't inherently readable, and that you have to write XSLT to convert it to HTML for presentation, is muddle-headed. What format for representing structured data in a transportable form *would* be human-readable? CSV? JSON?

    The complaint that XML is hierarchical is also bogus. It's perfectly straightforward to represent data relationally in an XML document, if you have to. It's just that for most uses of XML, where an XML document is a concise and transportable bag of information that can be parsed into an object model, representing relationships hierarchically is easier. Since object models represent hierarchies, and they represent them because the hierarchies actually exist in the real world that the objects are modeling, a far more apposite complaint would be that relational databases do a poor job of modeling hierarchies. If you're going to get into this at all, which you shouldn't.

    This article is primarily complaining about the hardness of certain problems and then castigating XML for not magically solving them.

    A *good* article about XML would have addressed the simple question that kagemaru asked:

    > In most cases, they will agree between themselves on what the XML
    > structure will be so that their programs run well. So, what is the
    > difference from a CSV file?

    Well, what *are* the differences between representing data in XML instead of CSV? Here are a few off the top of my head:

    - XML provides a good way of representing hierarchies. CSV doesn't.

    - XML documents can be reliably parsed into an object model that can be programmatically manipulated; this object model is implemented in a standard way on many different platforms. Not so for CSV. (A short sketch after this list illustrates this point and the next.)

    - A powerful (if daunting to learn) platform-independent language for searching through XML-based object models exists. Not so for CSV.

    - Standard, platform-independent tools for defining the format and organization of XML documents and validating that a given document contains what it's supposed to contain exist. Not so for CSV.

    - Standard, platform-independent (are you detecting a theme?) tools for transforming data in XML documents into other formats (in particular, into HTML, for presentation) exist. Not so for CSV.

    - There's a consistent way of programmatically verifying that the data in an XML document hasn't been corrupted or truncated in transit. Not so for CSV.

    - XML documents tell you their character encoding. CSV documents don't.
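
    Here's a short sketch of the second and third points, since those seem to be the ones people doubt the most. It uses nothing but Python's standard library, and the document, tag, and column names are invented.

```python
# Sketch: a standard parser gives you an object model you can search by path;
# a CSV file gives you flat rows you index by position and convention.
# Standard library only; "orders.xml", "orders.csv" and the names are invented.
import csv
import xml.etree.ElementTree as ET

# XML: parse once, then navigate or search the tree with path expressions.
root = ET.parse("orders.xml").getroot()
for item in root.findall("./order/item[@backordered='no']"):
    print(item.get("sku"), item.findtext("quantity"))

# CSV: the "structure" (orders containing items) and the "query" are both
# conventions you have to write and maintain yourself.
with open("orders.csv", newline="") as fh:
    for row in csv.reader(fh):
        if row[4] == "no":        # column 4 means "backordered", by agreement only
            print(row[1], row[3])
```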

    Now XML has plenty of drawbacks. Above all, it's not terse. This drives people crazy: all those redundant closing tags and repeated attribute tags. That's part of the cost that you pay for representing your data in a transportable form that can be programmatically validated and transformed by platform-independent tools. If you don't need those things, you shouldn't be using XML.

    Also, namespaces and CDATA are hard to understand. I'd contend that if you're using XML and you don't understand namespaces, you probably don't really understand what XML is yet and you probably shouldn't be using it.

    Finally, since any clown can write an XML document, lots of clowns do. It's a general-purpose format for representing data, and as such, it lets you make all kinds of stupid mistakes if you don't know what you're doing. Pick up just about any "introducing XML" book and you'll find an inadvertent compendium of worst practices.

    XML is a technology that addresses a lot of hard problems better than any other technology we have. If you don't need those problems solved (as the author of this article clearly doesn't think he does), you shouldn't be using XML.

    Robert Rossney

    rbr@well.com

  • "This article is primarily complaining about the hardness of certain problems and then castigating XML for not magically solving them."

    That was precisely the point of the article. XML is (often) sold as magic, when it is anything but. I go into a few of the problems with XML in the article, but it is mostly a rant against the hype.

    I wrote (in 1999):

    "Data transfer standards are fine, and XML is as good a choice as any. Just don't think that it won't take its pound of flesh, just like every other technology known. My guess is that it will be one of those technologies of the future that always remain so."

    I leave it to you whether I was a prophet.

  • >> If you agree on a data model, that's 95% of the effort.

    Actually, that statement is true when the data model agreed upon is XML, because once that's done, the data processing is very simple. In some cases though, if the data model is not XML, the complexity of processing the data may take a big chunk out of the development time. For an extreme example, think about a Java shop receiving Excel files from a Microsoft shop. Regardless of this, however, my main point is that by adopting XML as a standard, everybody concentrates on the data itself and the processing is left to the already highly optimized (and free) tools that are available on all platforms. It's a win-win situation.

    >>Sam, may I ask, what are you using XML languages for? How complicated are your data models?

    I use XML to process information we receive from several vendors on health care providers and fees. The data models are very simple, nothing fancy here, but we were having a hell of a hard time trying to adapt to all the different formats the vendors were using (CSV, Excel, Access, fixed-width text files, etc.). We convinced most of the vendors (not all, it never works that way) to switch to XML and it's been great for the most part: our code can detect if a new field was added to or deleted from the schema. If deleted, it will be ignored; if added, a new column will be added to the appropriate table with the appropriate data type, and then the file will be imported. Of course, we know that the changes to the schema are never too large, usually a couple of fields added or deleted; I'm not claiming this is something that can handle ANY schema whatsoever (it would result in a rather ugly-looking database table). Most of the effort goes into changing existing stored procs and reports to add the new column where appropriate (or delete it if necessary). But the main point is that we seldom have to deal with the data directly. Before XML, changes to a file's layout usually required recoding of our applications AND our stored procs/reports.

    >>BTW, how in the world do you do dynamic data models in XML? Inquiring minds wanna know!

    I don't know exactly what you mean by a dynamic data model, but we are not doing anything fancy. The files we get come with their own schemas, but we first validate against the old schema. If the validation succeeds, we move to the next stage and process the data; if not, we determine the differences by comparing the new and old schemas and proceed as explained above. If this sounds too simple, it's because it really is, and that's exactly my point: with XML, this type of task has become rather mundane.
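
    In case it helps to see the shape of it, here is a stripped-down sketch of that comparison step. It's not our production code, just an illustration: I'm assuming Python with lxml, W3C XML Schema files rather than DTDs, placeholder file names, and stub functions standing in for the real import logic.

```python
# Stripped-down sketch of the workflow described above: validate against the
# old schema first; if that fails, diff the declared element names between the
# old and new schemas to see what was added or dropped. lxml, the file names,
# and the two stub functions are placeholders, not the real system.
from lxml import etree

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

def declared_fields(xsd_path):
    """Names of the xs:element declarations in a schema document."""
    doc = etree.parse(xsd_path)
    return {el.get("name") for el in doc.findall(".//xs:element", XS) if el.get("name")}

def process(doc):
    """Stub: the real import for an unchanged layout goes here."""

def import_with_adjustments(doc, added, dropped):
    """Stub: the real import that first adds/ignores columns goes here."""

old_schema = etree.XMLSchema(etree.parse("vendor_old.xsd"))
feed = etree.parse("vendor_feed.xml")

if old_schema.validate(feed):
    process(feed)                       # layout unchanged: straight to processing
else:
    old_fields = declared_fields("vendor_old.xsd")
    new_fields = declared_fields("vendor_new.xsd")
    added   = new_fields - old_fields   # each becomes a new column in the target table
    dropped = old_fields - new_fields   # simply ignored on import
    import_with_adjustments(feed, added, dropped)
```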

    Not to say that everything XML is simple or that everything non-XML is more complicated than it should be, but there is definitely a reason for all the XML hype. 

  • "Actually, that statement is true when the data model agreed upon is XML, because once that's done, the data processing is very simple."

    Data models are independent of their representation. When you create an XML language (remember, XML itself is nothing), then you have access to parsing tools and format conversion tools. That's not nothing, but nothing to write home about, IMHO.

    "In some cases though, if the data model is not XML, the complexitiy of processing the data may take a big chunk out of the developing time. For an extreme example, think about a Java shop receiving Excel files from a Microsoft shop. Regardless of this, however, my main point is that by adopting XML as a standard, everybody concentrates on the data itself and the processing is left to the already highly optimized (and free) tools that are available in all platforms. It's a win win situation."

    Let me give you an XML<=>XML nightmare. Let's say you need to convert Tbook to Docbook. You can write XSL to do that, but it would be an onerous task. The fact that both are XML isn't much help.

    Unless, of course, someone has already built it for you.
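
    (To be fair, when someone has already built the stylesheet, applying it is the mechanical part. A rough sketch, assuming Python with lxml and made-up file names:)

```python
# Applying an existing XSLT stylesheet is mechanical; writing a nontrivial one
# (e.g. Tbook -> Docbook) is the onerous part. File names are made up.
from lxml import etree

transform = etree.XSLT(etree.parse("tbook2docbook.xsl"))
result = transform(etree.parse("article.tbook.xml"))

with open("article.docbook.xml", "w", encoding="utf-8") as out:
    out.write(str(result))
```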

    "I use XML to process information we receive from several vendors on health care providers and fees. The data models are very simple, nothing fancy here, but we were having a hell of a hard time trying to adapt to all the different formats the vendors were using (csv, excel, access, fixed width text files, etc). We convinced most of the vendors (not all, it never works that way) to switch to XML and it's been great for the most part: Our code can detect if a new field was added or deleted from the schema. If deleted, it will be ignored, if added, a new column willl be added to the appropriate table with the appropriate data type, and then the file will be imported. Of course, we know that the changes to the schema are never too large, it's usually a couple of fields added or deleted, I'm not claiming this is something that can be used to handle ANY schema regardless (it would result in a rather ugly looking database table). Most of the effort goes into changing existing stored procs and reports to add the new column where appropriate (or delete it if necessary). But the main point is that we seldom have to deal with the data directly. Before XML, changes to a file's layout usually required recoding of our applications AND our stored procs/reports."

    From what I gather from your response, you do have to do a lot of recoding when the schema changes. You've standardized on a data transfer format, which is good, but you still have lots of work to do. Using XML saved you some, I guess.

    "I don't know what you mean exactly by a dynamic data model, but we are not doing anything fancy. The files we get come with their own schemas, but we first validate agains the old schema. If the validation succeeds, we move to the next stage to process the data, if not we determine the differences by comparing the new and old schemas and proceed as explained above. If this sound too simple, it's because it really is, and that's exactly my point, by using XML this type of tasks have become rather mundane."

    I did something similar with SAS Version 5 XPT format datasets. The absolute worst data format in the world. However, there were many parsing and conversion tools already written, which also made the conversion/parsing tasks trivial.

    In other words, you don't need XML to accomplish that. And XML costs a lot.

  • Consuming XML isn't hard; it's digesting it that's the problem, and parsing it makes my eyes water.

    I've used it for what it is supposed to be used for: separating content from presentation.

    When it is used for what it was originally intended, it works.

    I'm just waiting for a gently lobbed D.C.Peterson hand grenade.

  • "Let me give you an XML<=>XML nightmare. Let's say you need to convert Tbook to Docbook. You can write XSL to do that, but it would be an onerous task. The fact that both are XML isn't much help.Unless, of course, someone has already built it for you."

    What's your point? Name ANY technology, past, present, or even future, and I will show you a situation where using that technology would be an absolute nightmare. Of course there are situations where XML is not appropriate. But there are many where it's just what the doctor ordered.

    "From what I gather from your response, you do have to do a lot of recoding when the schema changes. You've standardized on a data transfer format, which is good, but you still have lots of work to do. Using XML saved you some, I guess."

    And that's exactly my point. The amount of recoding was reduced substantially by moving to XML.

    "I did something similar with SAS Version 5 XPT format datasets. The absolute worst data format in the world. However, there were many parsing and conversion tools already written, which also made the conversion/parsing tasks trivial. In other words, you don't need XML to accomplish that."

    Nobody is claiming that you need XML for EVERYTHING. But you must admit that for a large number of tasks XML makes life easier for everybody. The same way you used existing tools to make the conversion/parsing trivial, a lot of people are using XML tools in a way that makes their programs a lot easier to write than before, sometimes trivial. You are ignoring the great advantage that comes from adopting a format that's shared by many. Could it be better? Maybe, but that's not the point. The fact is: XML has been adopted by many and it works, so why not use it?

    Do you have an alternative? What do you suggest we use to exchange information? Say you came up with a clever data model and an amazingly powerful compression algorithm to send data over the net. Who would consume your data? You would have to provide APIs for different platforms to enable others to manipulate your new format programmatically. And of course, you would have to convince everybody that your way is "better"; existing programs would have to be modified; there would be a need for specialized tools to handle special cases; and so on. That was the situation when XML came out, and it won hands down over the alternatives. Why do you think that happened? Because the majority of programmers saw the utility of the new format and adopted it. Until someone comes up with a better way (and someone will, for sure), XML is here to stay.

    "And XML costs alot."

    I don't know what you mean. Does it cost a lot in terms of development time? Bandwidth? Learning curve? For each of these, the answer is, as usual: it depends on what you want to achieve. It's the same with hardware: a $500 graphics card by itself is neither cheap nor expensive; it depends on what you want it for. If it's for your grandma's computer so she can use AOL, you are paying too much, but if it's for your latest and greatest gaming rig, it may be the best part of your system.

  • Just remember: XML is not a format. It is a format for defining formats, akin to a cookie cutter. Cookie cutters usually don't taste very good.

    I'd say that the costs of XML languages show up in two places: one, lots of people, myself included, find it incredibly difficult to debug, particularly when the data model contains several layers of hierarchy. Two, it is a space hog. In an app I designed, I found a 43x increase in data transmission size over an equivalent CSV file. That killed it right there.
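
    If you want to see where the bloat comes from, a toy comparison is easy to run. The record layout below is invented, and the ratio you get depends entirely on your data and markup style, so don't read my 43x into it; it just shows that the field names get spelled out, twice, on every record.

```python
# Toy illustration of the size overhead: the same records serialized as CSV
# and as XML. The layout is invented; the ratio depends entirely on the data.
import csv
import io
import xml.etree.ElementTree as ET

rows = [("1001", "2004-03-15", "19.95"), ("1002", "2004-03-16", "4.50")]

# CSV: values and delimiters only.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_size = len(buf.getvalue().encode("utf-8"))

# XML: every field name appears as an opening and a closing tag on every record.
root = ET.Element("claims")
for claim_id, date, fee in rows:
    rec = ET.SubElement(root, "claim", id=claim_id)
    ET.SubElement(rec, "date").text = date
    ET.SubElement(rec, "fee").text = fee
xml_size = len(ET.tostring(root, encoding="utf-8"))

print(f"CSV: {csv_size} bytes, XML: {xml_size} bytes ({xml_size / csv_size:.1f}x)")
```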

    That said, if you've got it to work well, then Mazel Tov! Again, the article was mostly a rant against hype. The kind of hype that gets the management types to insist on requiring XML without a clue as to what they are going to do with it once they've got it.

  • Sam C,

    Sounds like XML would be helpful to me--I also process data on health care providers and fees from several sources. What are the tools or frameworks I should become familiar with? (I use SQL Server and Powerbuilder, but could use scripting languages, too).

