Is XML the Answer?

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/dpeterson/isxmltheanswer.asp

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Nice to see someone playing devil's advocate.

    I'm not sure that I totally agree although I have very limited XML exposure.

    My understanding is that the rules about what an XML file can and cannot contain can be restricted via a schema. Provided you don't change the rules within a schema then you are OK. The nearest programming equivalent of "Don't change your schema" is "Don't change your class interfaces".

    Because an XML document is usually a text file it can be read by almost any OS. Yes, other formats are faster but that limits cross platform compatibility.

    XML is designed to be used over the web. My understanding is also that XML is an attempt to separate web content from web presentation. If the document is dictated by the Schema then I could take your XML document and apply an XSLT to it to display your document in a number of completely different layouts.

    I have to say that initially I was underwhelmed by XML. I thought it looked like the web equivalent of a COBOL data-division!

    Hierarchical data is a pain to manage and administer but its main advantage is speed. I've read that there are people out there who have databases so large that RDBMS technology struggles to perform all the overnight processing.

    Granted that for most people the RDBMS is the answer to a lot of questions but there are always people who need a better mouse trap!

    I'm not saying that XML and XML databases are the answer but XML does have its uses.

  • I think you have missed one of the main benefits of XML. It's not an ideal storage medium, it's not a lean storage medium either...

    Where it does excel is in transportability. We have a very major data store for the travel industry which we offer to customers via an XML interface. The reason we use an XML interface is that it is the only way we could easily distribute the content in a standard format across many channels.

    The data is a complex relational set and could not be distributed (In real time) any other way. We couldn't pass a series of CSV files for the related data, XML was the only way to go.

    XML is fantastic at what it does and has opened up a whole new area of data management and distribution for many companies. Systems can be developed that were just not feasible before.

    We do however store and manage our data in a fully relational SQL database.

    Dan

  • I think David has nailed the key point here.

    >Because an XML document is usually a text file it can be read by almost any OS.

    >>Yes, other formats are faster but that limits cross platform compatibility.

    XML has been offered up as a standard format for presenting data. This being the case, the most important decider of its success is that the standard is widely -- dare I hope universally -- adopted. In the case of XML, the major players have got behind this particular standard and, as far as I am aware, co-operated in a really positive way. In fact, it's about the only thing I see Microsoft, IBM and Sun all in agreement about. Microsoft gave support to the standards effort, releasing good beta parsers and documentation for free before the standard was ratified -- and then immediately made their release parsers compliant with the final published W3C standard -- removing some proprietary gizmos they had put in there before the standard was fixed.

    I use XML a lot and I like it. It's excellent for messaging, I am continually surprised at how powerful and versatile XSL-T is, and I consider it a very useful tool for a wide range of development problems. Sometimes there is an elegance you can't get in procedural programming.

    Human readability of data documents *is* useful when you are a developer. Even if the end-consumer (a computer) doesn't care, someone still has to make and debug the thing. The repeated tags in an XML document lend themselves very well to current compression techniques, and through schemas we have a *standard* way of validating data documents/messages that we don't have for flat files.
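    A rough way to check the compression point is to serialise the same records as CSV and as XML and run both through a general-purpose compressor. The sketch below is only an illustration: the product records are made up, and zlib stands in for whatever compression a real transport would use.

```python
import zlib

# Hypothetical sample data: 1,000 made-up product records.
records = [(i, f"Widget {i}", f"{(i * 37) % 500 + 0.99:.2f}") for i in range(1, 1001)]

# The same records as a delimited file...
csv_bytes = "".join(
    f"{pid},{name},{price}\n" for pid, name, price in records
).encode("utf-8")

# ...and as an XML document with the tags repeated for every record.
xml_bytes = (
    "<products>\n"
    + "".join(
        f"<product><id>{pid}</id><name>{name}</name><price>{price}</price></product>\n"
        for pid, name, price in records
    )
    + "</products>\n"
).encode("utf-8")


def report(label, raw):
    """Print raw and zlib-compressed sizes and return both."""
    packed = zlib.compress(raw, 9)
    print(f"{label}: {len(raw)} bytes raw, {len(packed)} bytes compressed")
    return len(raw), len(packed)


csv_raw, csv_packed = report("CSV", csv_bytes)
xml_raw, xml_packed = report("XML", xml_bytes)
```

    The raw XML is several times larger than the CSV, but most of that overhead is repeated markup, which is exactly what a dictionary-based compressor handles well. The exact figures depend on the data, so treat the printed numbers as indicative only.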

    Sure XML is just not useful for high volume/long term storage and retrieval -- but we have databases for that. It's a bit like writing off SQL Server because the games aren't as good as on the playstation.

  • My role in the company I work for is that of both software engineer and DBA. That being the case, I had mixed feelings about the article - applauding some of the points made, wondering whether the wider issue was missed with others.

    One of the biggest struggles I have with the developers I work with is to get them to think relationally first, object oriented second. Similarly, because of the hierarchic nature of XML, I have the same struggle! I personally do not feel you can get the richness in meaning available to RDBMSs in XML. Anybody who has tried mapping a many-to-many relationship in XML knows just what I mean. That, and I have a firm belief that good software flows naturally out of a good database design.
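    To make the many-to-many problem concrete, here is a minimal sketch using Python's standard ElementTree module. The catalog, category and product names are invented for illustration. Pure nesting forces a product that belongs to two categories to be duplicated; the alternative is to store it once and link by id, at which point the application is doing by hand what a join table and foreign keys would do in an RDBMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: categories and products are made up for illustration.
# Option A: pure nesting. "Hammer" is in two categories, so it appears twice.
nested = ET.fromstring("""
<catalog>
  <category name="Tools">
    <product id="p1">Hammer</product>
    <product id="p2">Pliers</product>
  </category>
  <category name="Gifts">
    <product id="p1">Hammer</product>
  </category>
</catalog>
""")
copies = [p for p in nested.iter("product") if p.get("id") == "p1"]
print("copies of p1 under nesting:", len(copies))

# Option B: store each product once and reference it by id.
# Nothing in plain XML enforces that a ref points at a real product;
# the lookup below is the application doing the join itself.
linked = ET.fromstring("""
<catalog>
  <product id="p1">Hammer</product>
  <product id="p2">Pliers</product>
  <category name="Tools"><member ref="p1"/><member ref="p2"/></category>
  <category name="Gifts"><member ref="p1"/></category>
</catalog>
""")
products = {p.get("id"): p.text for p in linked.findall("product")}
membership = {
    c.get("name"): [products[m.get("ref")] for m in c.findall("member")]
    for c in linked.findall("category")
}
print(membership)
```

    A schema language can add id/reference checking on top of option B, but the hierarchy itself still only expresses one-to-many containment.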

    So using XML to hack the relational model is b-a-d. However, using XML to map to a relational model is not necessarily evil.

    The argument that if tags change that everything breaks is true, however, why would the tags change? The same is true if you rename a field within a table or change its datatype. You just don't do it, or at least not without significant impact analysis.

    But there are many advantages to XML too. First and foremost, it's a standard. It has its faults (as pointed out in the article) but nonetheless, it is a standard. Developers have no need to reinvent the wheel on each project - the same toolset can be used across all projects to carry out the most standard of tasks. Standardised and generally bug-free too. Also, XML data feeds facilitate more meaningful code. You're getting the product's colour in your code rather than the first item in a CSV list. Anything that allows people to write precise and concise software that *works* is a gift.

    There is the argument that data processing should be distinct from data transport. That's generally because firstly we humans think better in terms of distinct layers, plus data transport rarely has problems, in comparison to data processing at least. But the fact is that no database exists in isolation. If one part of the system fails, the system is seen to fail as a whole, and the fault you refer to should be shared by all - not just the database guys, the programmers or their respective methodologies. It's the business failure that hurts, not the technical failure.

    I wonder why people do not worry about ADO or (especially) ADO.NET as they do XML? Both require the transport of metadata as well as the data itself, yet this is usually fine? Maybe if there were something more mystic about XML than just plain text people would have more belief in its capabilities.

    I had to smile at the "ignorance" claim, simply because I think we all know just what the author means! Why the hell do we need an XML data service on a biscuit tin anyway?

    I think the additions of "stupidity" and "greed" don't give a balanced representation of the vendors' position though. There are genuine benefits to vendors in terms of widening hardware and software capabilities whilst reducing implementation overheads. Take, for example, set-top boxes. These are capable of using the same ROM-based XML APIs for multiple purposes, using one communications protocol (HTTP) without the need to implement ODBC connectivity. This allows manufacturers to fit more data-rich services in the same storage space, minimising the quantity of service-specific data manipulation routines, and without storing drivers for every conceivable database out there.

    I think the conclusion I draw is that XML data services using an RDBMS as a backend have their places and uses. They do not fulfil every need there's ever been, so no, don't believe the marketing hype. But don't dismiss it either - you could find yourself talking yourself out of a valid and efficient wider system, given the right context.

  • Hats off and a standing ovation to Don! You are so right in everything you write in the article. Some comments to the other comments you have received so far:

    quote:


    Hierarchical data is a pain to manage and administer but its main advantage is speed. I've read that there are people out there who have databases so large that RDBMS technology struggles to perform all the overnight processing.


    First of all, the primary function of a database is not speed, but integrity. Second, the relational model cannot be blamed for inefficient implementations of it. Performance is determined by the physical implementation of a database; the relational model is purely logical.

    quote:


    Because an XML document is usually a text file it can be read by almost any OS. Yes, other formats are faster but that limits cross platform compatibility.


    Why would this limit cross platform compatibility? Any platform can read a text file, no matter how its contents are formatted. XML in itself has absolutely no advantage over, say, a CSV file; it is the fact that it has become a de facto standard that makes people choose XML over any other formatting. But, as Don points out in his article, it is definitely not an entirely wise choice; it has several major drawbacks. But this is just speaking about XML as a format for transferring data. Regarding data management and using XML as a data storage, there is absolutely no advantage at all over a relational database.

    quote:


    The data is a complex relational set and could not be distributed (In real time) any other way. We couldn't pass a series of CSV files for the related data, XML was the only way to go.


    How can a hierarchical xml file store complex relational sets?

    quote:


    I think the conclusion I draw is that XML data services using an RDBMS as a backend have their places and uses.


    Yes, you are right, I wouldn't argue that this is entirely wrong, if you stay with using xml as a data transport between applications. But I think that even though Don (and I agree with him fully) does not believe xml to be the definitive answer, what he is really saying in this article (especially since it is actually published at a SQL Server specific site) is that xml has no place inside SQL Server. And that is a very important message! We as developers and DBAs can do our part by not using these features and by continuing to request more important features and functionality, especially a better implementation of the relational model.

    I'll finish off by seconding the book recommendations in the article. I guarantee you, when you've read these books you will not continue arguing for xml.

    --

    Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

    http://www.sql.nu

  • quote:


    How can a hierarchical xml file store complex relational sets?


    Because it is doing what XML is best at... storing a single record or a selection of records and not an entire database. A hierarchical xml file lends itself very well to a correctly structured relational database.

    On one level, one of our outputs relates data across 5-6 tables with various join types. The output of this is fantastic via XML: customers query our data via a web service (which can also take the query via XML or traditional query string items) and we return the matching record(s) via a very easy to use XML document.

    There would be no easy way of doing this output without XML.
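    The pattern described here, a relational store behind an XML interface, can be sketched roughly as follows. Everything in the example is hypothetical: the hotel and room tables, the column names and the sample rows are invented, and SQLite stands in for the real database. It only shows the shape of the idea, which is to run a join and fold the flat result set into a nested document for the caller.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema and rows, loosely themed on travel data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel (hotel_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE room  (room_id INTEGER PRIMARY KEY,
                        hotel_id INTEGER REFERENCES hotel(hotel_id),
                        kind TEXT, rate REAL);
    INSERT INTO hotel VALUES (1, 'Harbour View', 'Lisbon'), (2, 'Old Mill', 'York');
    INSERT INTO room  VALUES (10, 1, 'double', 120.0), (11, 1, 'suite', 210.0),
                             (20, 2, 'single', 65.0);
""")


def hotels_as_xml(city):
    """Join hotel to room and nest each hotel's rooms under it."""
    rows = conn.execute(
        """SELECT h.hotel_id, h.name, r.kind, r.rate
           FROM hotel AS h LEFT JOIN room AS r ON r.hotel_id = h.hotel_id
           WHERE h.city = ?
           ORDER BY h.hotel_id, r.room_id""",
        (city,),
    )
    root = ET.Element("hotels", city=city)
    current_id, hotel_el = None, None
    for hotel_id, name, kind, rate in rows:
        if hotel_id != current_id:
            # First row for this hotel: open a new parent element.
            hotel_el = ET.SubElement(root, "hotel", id=str(hotel_id), name=name)
            current_id = hotel_id
        if kind is not None:
            # LEFT JOIN yields NULLs for hotels with no rooms; skip those.
            ET.SubElement(hotel_el, "room", kind=kind, rate=f"{rate:.2f}")
    return root


doc = hotels_as_xml("Lisbon")
print(ET.tostring(doc, encoding="unicode"))
```

    The data stays relational; the XML is just a per-request view of a few joined rows, which is the distinction the post is drawing.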

  • quote:


    quote:


    Hierarchical data is a pain to manage and administer but its main advantage is speed. I've read that there are people out there who have databases so large that RDBMS technology struggles to perform all the overnight processing.


    First of all, the primary function of a database is not speed, but integrity. Second, the relational model cannot be blamed for inefficient implementations of it. Performance is determined by the physical implementation of a database; the relational model is purely logical.


    I do agree with that.

    The best RDBMS system struggles with poorly implemented db design and application design. And that's not a lack of the RDBMS itself.

    The relational model is NOT designed to handle hierarchies after all. In fact, it is the successor of the hierarchical model, designed to get rid of the limitations of the hierarchical database model. It might be true that the network model or the hierarchical model is faster than the relational, but as Chris has said, that is not the main focus of a relational db.

    As for XML, a classical case of 'It depends...and time will tell'

    IMHO, I don't use it, because I see no need to.

    Frank

    http://www.insidesql.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/

  • >>Any platform can read a text file, no matter how the contents of it is formatted.

    true. but wouldn't it be great if there was a standard cross platform API for validating and querying such data from applications?

    >>XML in itself have absolutely no advantage to say a csv-file,

    see above.

    >>it is the fact that it has become a de-facto standard that makes

    >>people choose xml over any other formatting.

    You say that like it's not important --- it is the absolute clincher. HTML may not be perfect, but simply by gaining a critical mass of support it has completely revolutionised one type of client-server application (publishing for human readers). At its simplest level XML can be seen as an attempt to standardise (edit) an interface onto (/edit) CSV files. This is an essential foundation on which we can build higher level standards of inter-computer messaging.

    Add to that the fact that, with XSL-T, a new generation of developers are considering the benefits of functional programming and I think it's worth standing up for 🙂

    >>what he is really saying in this article (especially since it is actually published at a SQL

    >> Server specific site) is that xml has no place inside SQL Server.

    Well, half the article is a (very flawed, in my opinion) attack on XML in the large. Only in the second half does it home in on SQL Server. I don't see anyone in this thread supporting the idea of storing data as native XML within an RDBMS -- and to be honest, I've never heard of support for this made anywhere else.

    However, outputting query results direct to XML seems very sensible and it's something I use often. I think all the major vendors now support it -- to a certain extent this may be just bandwagoning, but as Frank says, time will tell.

    Edited by - planet115 on 10/07/2003 05:43:53 AM

  • It is very nice to see a viewpoint that is different from what all the pundits are touting. This is a very good article but it does miss a couple points.

    XML is more transportable because everyone doesn't have to agree on a single way to structure the data. If I get three XML feeds from three different sources and they contain the same data in a different format I can use XSLT to restructure them into the format my application expects. It's easier and faster (read as: more cost effective) to write a transformation when we get a new data feed than it is to write a bunch of procedural code and recompile my application.
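    As a rough illustration of the idea, the sketch below normalises two differently shaped feeds into one common structure. Note the substitution: Python's standard library has no XSLT processor, so a per-feed mapping table built on ElementTree stands in for the stylesheets. The feed layouts, field names and the normalise helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds: the same data arriving in two different shapes.
FEED_A = "<items><item><sku>A1</sku><colour>red</colour></item></items>"
FEED_B = '<catalog><product code="A1"><attrs color="red"/></product></catalog>'

# One mapping per source: which element is a record and how to read each field.
# Supporting a new feed means adding an entry here, not changing normalise().
MAPPINGS = {
    "feed_a": {
        "record": "item",
        "sku": lambda el: el.findtext("sku"),
        "colour": lambda el: el.findtext("colour"),
    },
    "feed_b": {
        "record": "product",
        "sku": lambda el: el.get("code"),
        "colour": lambda el: el.find("attrs").get("color"),
    },
}


def normalise(source, xml_text):
    """Turn one feed into the list of dicts the application expects."""
    mapping = MAPPINGS[source]
    root = ET.fromstring(xml_text)
    return [
        {"sku": mapping["sku"](el), "colour": mapping["colour"](el)}
        for el in root.iter(mapping["record"])
    ]


print(normalise("feed_a", FEED_A))
print(normalise("feed_b", FEED_B))
```

    With a real XSLT processor the mapping would live in a stylesheet per feed rather than in a Python dict, but the design choice is the same: the per-source differences are data, and the application code stays untouched.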

    In regard to the "flawed" OO paradigm, the real flaw is the people that don't know how to convert an object model to a data model. The people designing databases that can't fit an OO application are the same ones that can't design a DB for a procedural app. They are also the ones that blame the database server when something doesn't work.

    OO and relational work perfectly well together. Use OO code to get the data in a structured format and then put it in a relational database in an appropriate format. You then have good data that can be used in the OO world and is great for set-based reporting.

    BTW, OO is certainly not the end-all be-all of software development. Procedural programming is far from dead and appropriate in many areas (Event-driven GUIs for example).

    The Information Technology arena has no room for extremists. Use the right paradigm/methodology/tool for the job.

    Bryant E. Byrd, MCDBA

    Sr. SQL Server DBA/Systems Analyst

    Intellithought, Inc.

    bbyrd@intellithought.com

    Bryant E. Byrd, BSSE MCDBA MCAD
    Business Intelligence Administrator
    MSBI Administration Blog

  • First, many putting down XML have apparently never dealt with ANSI interface standards. If you think ANSI record layouts cannot be helped by XML organizing the data then you are sadly mistaken. But again, this is a cross platform transportation model.

    I also find it somewhat amusing that people still believe relational databases are inherently better than hierarchical. There's no one solution fits all. Have the OO arguments taught these people nothing?

    Self describing data is a joke. Even in relational database (column name) it's a joke. The end result is whoever is using the data needs to understand the data. Let's not make this more complex than it needs to be.

    XML does offer a "universal" way to pass (complex) related data. Delimited files do not offer this.

    This article smacks of typical bad research. He took a point of view, then found examples to support that point of view.

    Lastly, the idea of a record being 4 times bigger, well it's all relative. If a record is 4 times bigger, but the small record is only 100 characters and the big record would then be 400 characters it doesn't matter on networks. They will both fit in a single ethernet packet and no additional network traffic will be had (I'm talking in a web oriented use here).
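    The arithmetic behind the packet argument can be sketched as below. It assumes a standard 1500-byte Ethernet MTU and plain 20-byte IPv4 and TCP headers with no options, and it ignores HTTP headers and any other framing, so it is a back-of-the-envelope check rather than a traffic model.

```python
import math

ETHERNET_MTU = 1500   # payload bytes in a standard Ethernet frame
IPV4_HEADER = 20      # assumption: no IP options
TCP_HEADER = 20       # assumption: no TCP options
MSS = ETHERNET_MTU - IPV4_HEADER - TCP_HEADER   # 1460 bytes of data per segment


def segments_needed(payload_bytes):
    """Number of TCP segments needed to carry a payload of the given size."""
    return max(1, math.ceil(payload_bytes / MSS))


for size in (100, 400, 4000, 16000):
    print(f"{size:>6} bytes -> {segments_needed(size)} segment(s)")
```

    Under these assumptions the 100-character and 400-character records both travel in one segment. The factor of four only starts to cost extra packets once records grow past roughly 1.4 KB.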

  • quote:


    It is very nice to see a viewpoint that is different from what all the pundits are touting. This is a very good article but it does miss a couple points.

    XML is more transportable because everyone doesn't have to agree on a single way to structure the data. If I get three XML feeds from three different sources and they contain the same data in a different format I can use XSLT to restructure them into the format my application expects. It's easier and faster (read as: more cost effective) to write a transformation when we get a new data feed than it is to write a bunch of procedural code and recompile my application.

    In regard to the "flawed" OO paradigm, the real flaw is the people that don't know how to convert an object model to a data model. The people designing databases that can't fit an OO application are the same ones that can't design a DB for a procedural app. They are also the ones that blame the database server when something doesn't work.

    OO and relational work perfectly well together. Use OO code to get the data in a structured format and then put it in a relational database in an appropriate format. You then have good data that can be used in the OO world and is great for set-based reporting.

    BTW, OO is certainly not the end-all be-all of software development. Procedural programming is far from dead and appropriate in many areas (Event-driven GUIs for example).

    The Information Technology arena has no room for extremists. Use the right paradigm/methodology/tool for the job.

    Bryant E. Byrd, MCDBA
    Sr. SQL Server DBA/Systems Analyst
    Intellithought, Inc.
    bbyrd@intellithought.com


    Thank you for your reply. In indicating that OO is "flawed" I did not intend to mean that it should be abandoned. However, the point still stands that OO is not based on sound, scientific principles. The relational model of data IS. Yet it is the OO pundits who seem to cry the loudest that there must be something wrong with the relational model and they are constantly touting something "new and improved" to replace it. I will not deny that the relational model MAY someday be replaced by something better, but that day hasn't arrived yet.

    By the way, I am well aware of the "benefit" of being a standard and address the issue in the article. XML is far from being a true standard and even if it becomes so, it will be a poor standard. I fully understand the history and impetus behind XML. I have put quite a bit of research into the subject over the past two years. What I learned is that XML was devised by those who have no understanding of data management fundamentals and their writings continue to prove it. XML as a data transport may be useful (if bloated and inefficient) but that is not the main problem I see with it. As I said in the article, the main problem is ignorant programmers who seem dead-set to take a 30 year leap backwards to use XML for data management. This is just plain stupid and there is really no excuse for it.

    You might call me an extremist. I suppose I have to plead guilty as charged. I believe that we, as data management professionals (not professional knob turners of any specific platform) have an absolute responsibility to ensure the integrity of our company's data. I do not believe that there is room for compromise here. If at any point in the life of that data, integrity is compromised, you can't just magically impose it later. Further, I firmly believe that unless we keep our eye on the fundamentals of data management we will continue to waste vast amounts of money on that vendor merry-go-round. I, for one, am tired of being promised the world and delivered a stinking swamp!

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • quote:


    I also find it somewhat amusing that people still believe relational databases are inherently better than hierarchical. There's no one solution fits all. Have the OO arguments taught these people nothing?


    Quite right, there are different solutions for different things. The relational model is the solution for database management. It was invented because the hierarchical model is flawed, not as a complement to it. The relational model is set-theory and logic applied to database management, what is xml?

    quote:


    Self describing data is a joke. Even in relational database (column name) it's a joke. The end result is whoever is using the data needs to understand the data. Let's not make this more complex than it needs to be.


    A relational database management system is not concerned with the names of columns, beyond the fact that they are part of the constraints that describe the database. The complete set of constraints for all tables in a database is what describes the database and is the only thing the DBMS can use to 'understand' the data. How does xml handle this?

    --

    Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

    http://www.sql.nu

  • This author has made a compelling argument and one I am excited to see written. I think it's time more people challenged the conventional wisdom that XML should be applied wherever possible, whether it adds value or not.

    I think like most technologies one must separate the good from the hype. With XML the "hype" slice of the pie seems to be proportionally bigger than with most technologies and I think a lot of people implementing it are doing it for the wrong reasons.

    Case in point - I have customers who call me to ask why I am not storing relational data in XML format in a SQL Server database for my SQLAudit product. Rather than try to reinvent the theory of a relational database I attempt to glean the source of the question and turn it around - "Why *would* I store relational data in XML format in SQL Server?" and I just get silence. In one case the answer was that Yukon was going to offer an XML datatype and it would "improve performance"??? The source of these queries can usually be traced back to software vendors trying to jump on the XML hype bandwagon and leverage the XML "buzzword" purely for marketing purposes. They interact with customers and put the XML bug in their brain and the next thing you know you are defending proper database design against designs that violate almost every basic rule of relational database modeling. Is it XML's fault that it's being used improperly? No, but it's up to people like this author to clarify its proper role in the universe and point out that it isn't a panacea and often is the wrong choice.

    good job!

    Brian Lockwood

    LockwoodTech Software - value added SQL tools

    http://www.lockwoodtech.com

    Brian Lockwood
    President
    ApexSQL - SQL Developer Essentials

    http://www.apexsql.com/blog

    Stand up for an Independent SQL Community - be Informed

  • >The Information Technology arena has no room for extremists. Use the right paradigm/methodology/tool for the job.

    Finally somebody with the right idea.

    Just another tool for the old tool box.

    Like using pliers to remove a torx screw when I have a torx screwdriver, and I just didn’t know what it was for. Darn if I don’t always forget the hammer.

    Ignorance != bliss
