For SQL Server, XML Is One Answer

  • quote:


    Now, one argument or beef with the vendors is, developers want to store the XML within the DBMS. Why would ANYONE want to do this?


    Seems wrong headed, doesn't it?

    I think I can guess the reason that storing XML inside SQL tables is distasteful to many database designers: XML clearly contains data fields, and the developer who stores data as XML is not allowing SQL Server to manage that data.

    So, think about it... what other times do you store data in the database where the data has valuable content, but the database does not manage it? How about scanned images of documents... or any other image where the image contains content that the database cannot "see" (signatures, pictures of catalog items, encrypted password hashes, PKI certificates, you get the drift).

    The fact that SQL cannot manage this data, we become resigned to the idea of storing it in SQL as a BLOB.

    But XML is readable, and that makes it abhorrent to store it in the DB as a BLOB. In other words, if XML wasn't so readable, most folks wouldn't be so miffed about storing it in a database!

    But there's two advantages of SQL Server that act as a disadvantage in some specific situations:

    1) SQL has a fixed set of heirarchical levels. If your data does not, and you only need to store it (and not manipulate it), then SQL Server is overkill.

    In the past, we'd put the data in some alien format, store it in a file, and put the file name in SQL. Now, we store the content in SQL Server as well. Same difference.

    2) SQL makes it easy not only to find a field, but to change it. What if we don't want that? What if a set of records that represents a single very complicated document is only valid if it never changes?

    Storing it in a tool where the tool provides easy access to each field adds the burden of proving that the fields haven't changed. This is ESPECIALLY true if the data is normalized, since a change in an independent table will be reflected in all referring transactions.

    In this case, it is sometimes helpful to put the XML document in a field as an audit item.

    quote:


    That is what the relational model is all about: identifying and constraining the elements and relations of the sets of data. XML can not even begin to come close to addressing these storage, constraint, quality, and retrieval requirements.


    If you accept the constraints of relational models, that being a predetermined and static set of relationships, RDBMS is appropriate. However, over 90% of all business information is not stored in relational database systems. While much of this could be managed relationally, some if it cannot. For these folks, a pure RDBMS implementation is either insufficient or inefficient.

    One exciting development: Using Microsoft SQLXML, you can load an XML document into SQL, and it will break out the data from the XML structure and store it in relational tables and fields... No more excuses for those people who intend to use the data in a relational manner.

    quote:


    The very idea of an XML data type is an oxymoron. Data types are one piece of the DOMAIN construct who's sole purpose is to constrain and add quality to the data storage/management. How is XML constrained? A schema? But that's a DBMS.


    Sometimes the developer wants no constraints on this data item. Sometimes, he or she wants to mix Relational data with structured heirarchical data. There is a place for this data type.

    quote:


    How thick can some of you people be?

    Time to go to class, grow up, and get a real education. These do-it-yourself in 21-day books you've bought are doing us all a disservice.


    Not directed only at you, but... I'm dismayed by how freely fellow developers are attacking each other in these forums.

    Can we keep the conversation civil? Neither of us have any idea how qualified, or unqualified, the other members of this board may be.

    Ideas are open for criticism, and rightfully so. However, we will all learn much more from each other if we refrain from attacking the people who post them.

  • I concur that we need to remain civil. Overall both this article and the original have evoked a good deal of discussion - good discussion! I'd hope that each might learn something from the opposing view.

    Maybe Im old, or not old enough, but I rarely see black and white anymore, lots of gray in the world. Very few things are absolute.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Oh...the confounding of the masses!

    I believe most of the replies given to my criticisms were in agreement, mostly, with the notion of NOT storing multi-valued attributes within a single field specification when that tool will only support single-valued elements.

    I agree that if you stored the XML content with the expectation that it was static, for instance the XML schema definition itself as opposed to the XML content, and were prepared to treat it as such, then, a BLOB/CLOB data type would seem appropriate.

    The beef I have, however, is manifold. History has shown us that hierarchical structures, although present, are rigid, inflexible, and administratively burdensome. The relational model is much more highly flexible, even to the point of creating models of data models not just the content itself, thus, producing the dynamical flexibility that many developers tout as one of the strengths of XML.

    The point is that MS SQL Server, Oracle, DB2, Sybase, et. al. are SQL and attempt to be Relational DBMS’s. To subvert the relational model by providing for multi-valued content without the proper database management functions to constrain, quantify, and validate that information breaks the sole purpose for providing the DBMS to begin with.

    Let us not forget; there are hierarchical and multi-valued DBMS platforms available out there. Does anybody remember dBASE, FoxPro, Paradox, Progress, Informix, et. al.? Those are antiquated technologies. Although many data administrators and developers may still support such monstrosities, can anyone argue the superior benefits of these systems over the relational systems?

    Predicate Logic and Set Theory, the foundation for Relational Systems, says little to the elements of the sets other than the constraining factors of their respective domains, but merely the constructs for the relations between sets and the operations that may be performed on them. The model can be extended far beyond the content of data in raw form to include data structure as content and higher abstract models as data content too. This is much more highly flexible than the rigid structure that XML imposes upon us.

    As to my incivility, with regard to peers, my apologies, however, I am inundated with so-called data and database "developers" who do not have the first clue as to what data management is or how to employ the tools available to optimize such management. Moreover, it is these "nitwits" that are usually the first to jump in line with the other lemmings following the piper vendors on every new fad purposed without thoughtful examination. Moreover, these are the ones who misuse the tools most often because they have no basis to evaluate the correctness of their solutions, no historical knowledge or experience with the antiquated concepts in order to defend their creations.

    As you yourself expounded, the developer needs this, the developer requires that. True, the database developer may or may not need such solutions. However, how truely qualified are many of the application developers who also see themselves as database development "experts"?

    In my opinion, an application developer and database developer for over two decades, one who remembers the days of ISAM, VSAM, dBASE, Foxpro, etc., and now, a fully qualified database administrator as well, the evidence of the use and misuse of such technologies as XML, I stand by my comments: put down the "Teach Yourself in 21 Days" books and do a little "REAL" research as to the design and purpose of database systems.

    If you want hierarchical storage, bleed at your own risk, but keep that garbage out of the Enterprise's Relational system. As database system administrators and developers, it is your responsibility to fight against such insanity and to educate the lower beings on the proper usage of such tools. A wise person would recognize their inadequacies and defer to the wisdom of the more-learned colleagues. Quit following the market-speak, listen to your elders, question the wisdom of your actions, you arrogant pups. Some of us have been around the block a few more times than you. You could learn a thing or two that might really do someone, in addition to yourself, some real good.

  • >>Quit following the market-speak, listen to your elders,

    >>question the wisdom of your actions, you arrogant pups.

    Who are you talking to here?

    How about addressing some of the points made in Nick's article, for or against?

    In my opinion, Nick has made some excellent points and made them well. He's even taken the time to write an article which attempts to put the discussion on a more constructive footing, and then gone on to respond to comments constructively -- even when these comments appear to have been written without any reference to what has actually been said. To me that's a reason to listen to and respect someone, whether you agree or not. The fact that you are a "qualified database administrator" amounts to little -- I'm sure the people who signed off Enron's books were qualified accountants.

    Trolling in these forums is something I've never seen, but in this case I'm reserving judgment 🙂

  • quote:


    I concur that we need to remain civil. Overall both this article and the original have evoked a good deal of discussion - good discussion! I'd hope that each might learn something from the opposing view.

    Maybe Im old, or not old enough, but I rarely see black and white anymore, lots of gray in the world. Very few things are absolute.


    Andy and others...

    I agree that the discussion has been desperately needed (hence my original article.) I have also been tremendously gratified by the response. I frankly didn't expect the article to get published at all. That it was is a testament to the editorial teams' fairness and open-mindedness. When it was published I expected a largely negative reaction given that I as "attacking" one of the hottest trends in the business. That the responses were, for the most part, fairly well reasoned is a testament to the quality of the SQL Server Central audience. Many thanks to the SQL Server Central team and to the community in general.

    That being said, Chris Date introduced the concept of what he called the "incoherence principle" which states "That which is incoherent is difficult to treat coherently." Some replies could be poster children for the incoherence principle, and in my replies the diffuculty of dealing with them in a rational manner may have shown through in the form of frustration. For this I appologize, especially since I know that the worst way to convince people is to insult them.

    However, as a DBA I can't afford to allow poor practices to jepordize my data for the sake of getting along. In fact you could say that many of us DBAs have made careers out of frustrating developers For as long as I have been in the business I have had to tell developers "NO." XML is just another iteration of that cycle.

    If the developers could have their way, we would have no constraints on our data at all! I wish I had a dollar for every time I've heard one complain that I am making their job difficult because I won't (more properly the database won't) let them do something. In almost every case, they are absolutely correct and the thing they are wanting to do SHOULDN"T BE DONE!

    Now, don't take this as a slap at developers in general. "Some of my best friends are develpers!" To be certain, there are many who understand the importance of data management, but in general their focus is much different so it becomes my job to force the issue.

    Many of the arguments for XML are essentially: "But XML makes using these poor practices soooo easy, and the vendors all say it's O.K..." But the real problem is that many are so ignorant (not stupid) that they don't even realize that what they are wanting IS a "worst practice." And since it is somewhat disguised by industry hype, many who should know better are fooled too.

    Andy, you can't be as old as you intimate if you don't recognize the pitfalls of hierarchical data management.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • In response to planet's reply:

    In a discussion thread, a response may seem vague and ill-directed unless the thread is followed in its entirety. Moreover, nickmalik's article was written in direct counter-point to dcpeterson's. The thread there too must be read. In addition, certainly some degree of civility should be maintained; I concur. However, within the discussion threads in response to both articles, there seems to be more than a fair share of hostility directed towards the opposing view, with the viewpoint holders included, i.e., that XML is definitely bad for any want-to-be RDBMS, and potential bad for communication transfer.

    However, for my thin-skinned proponents of XML's virtues, you are correct that my profane remarks have directed attention away from the meat of my comments; thus, doing myself and our respective readers a serious disservice.

    So, allow me to readdress the article.

    First, nickmalik states, “To make it work, XML had to be stateless, scalable, and yet easy to describe and understand. Therein lies its advantage.” I say, therein lies XML’s disadvantage.

    "Stateless:

    A stateless server is one which treats each request as an independent transaction, unrelated to any previous request. This simplifies the server design because it does not need to allocate storage to deal with conversations in progress or worry about freeing it if a client dies in mid-transaction. A disadvantage is that it may be necessary to include more information in each request and this extra information will need to be interpreted by the server each time."

    The concern here, as others have stated, is that XML is bloated. It is also highly dependent upon the hierarchical structure imposed upon it. It may answer one question quite nicely. However, how often does a business unit ask only a single question? This “stateless” and fixed structure nature impose quite a high resource demand and is in direct conflict with the author’s second purported “advantage.”

    "Scalability:

    It is the ability of a computer application or product (hardware or software) to continue to function well when it (or its context) is changed in size or volume in order to meet a user need."

    I claim XML does neither: XML does NOT scale-well with the increase of data transmission nor does it scale-well when we wish to transform it to ask new questions (context). Given the necessity to reproduce the structure imposed upon the document, whenever there is sparse data, not only do we have the encumbrance of all of those tags, we must reproduce them at every iteration, even when the structure contains very little information. Moreover, given the rigidity of the hierarchy, serving multiple users all with varying questions, a particular document quickly becomes useless to all but a very few requestors.

    Perhaps as a transfer mechanism, XML could play some part—but I doubt it given its size in-scalability. However, as a means to structure, organize, and store related information, XML can neither protect, enhance, nor address the myriad unforeseen requests by the users. It fails badly for any attempt at data management. Therefore, its use as a storage mechanism is whole ill-thought.

    Shall I continue? Let’s.

    Our author recommends XML as a superior descriptor of related tables, far and above CSV or EDI. Hogwash. Whether a single CSV file with all of the information contained in each record, or as multiple files, one for each relation, CSV is much more efficient. Ah, but the proponents would say that it is not descriptive. But, neither is XML. You must still transmit that information between transmitter and receiver prior to understanding the intent of the information. How is XML superior in this respect?

    Our author states, “Parsers for XML are free, widely available, and FAST.” So are a myriad of other “parsers” for many other formats, CSV included. Now, this is not an advocation of CSV in particular, only that XML holds no special relevance or superiority in this regard. Moreover, we must contend with the resource over-utilization given XML bloat.

    Point 3: “Any system can be augmented to read and write XML.” Can we be any more general? Please; if agreed upon, any technology can be “augmented to read and write” that technology. The question is, “Why should we choose XML over other formats?”

    Point 4:

    “So many software vendors and application architects have seen the value of XML that an entire industry has developed around providing tools for reading, parsing, translating, routing, and managing XML documents. With so many tools available, a developer faced with the need to transfer data from one application to another would be foolish to consider spending time and effort to write components that produced or consumed data in any other format. It's just not productive to write code that someone has already written and tested, which both the developer and customer already have, and which is likely to be supported into the foreseeable future.”

    Let’s all get in line. Big brother is watching. So what that there are existing tools. There are many tools out there; we have a choice. It is our responsibility to examine and DEFEND those choices.

    Finally, I’ve made my arguments for not storing XML in the DBMS. Re-read them. Or, at least, refute them. Some of you have said the author is only expressing to database external recommendations for the use of XML. That XML “probably” is a bad idea for the DBMS. Oh? What did the author write?

    “OK, so XML has its place. But what about inside SQL Server? Why did Microsoft add XML capabilities to their flagship RDBMS system? It was the right thing to do.”

    Sounds like the DBMS has it too. Hmmm, no one advocating this position? Problem is, no one is DEFENDING this position. Can I get a witness?!

    Edited by - iapetus on 10/27/2003 12:22:32 PM

  • >>Moreover, nickmalik's article was written in direct counter-point to

    >>dcpeterson's. The thread there too must be read

    I read the original article -- I even posted a few times 🙂

    First I should establish that I am actually in agreement with what I understand to be the main thrust of the anti-XML argument. That is that XML is inferior to RDBMS for central, long-term data storage and retrieveal. Yep, with you there. I wrote a dissertation in 2000 in which I evaluated the soon to be ratified XML standard. In the opening paragraphs of my summary I state that XML is not in anyway a replacement for the relational database. In that paper I also find in your favour that XML is not scalable. For me, using the DOM is the only way to go when using XML and this means parsing the entire document and creating the whole tree in memory -- I think that if you feel the need to use the event driven linear parsing (SAX I think it's called) except for specific and occasional functionality, you've probably gone wrong in the design.

    However, I'm a bit alarmed by what I perceive as an attempt to rubbish the whole technology, simply because it is (apparantly) being set to uses for which it is clearly not fit. It's very useful and robust tool for developers in the myriad of situations where a low volume data needs to be packed up, shipped from place to place and queried. Messaging and configuration files come most readily to mind. It seems pretty obvious to me that a full featured RDBMS would be well over the top for such purposes.

    OK, stateless. Yes XML is stateless and that's appropriate for the kinds of purposes that most people put it to. These days, what with the interweb et al, it's not efficient, practical or desirable to make your entire RDBMS available to anyone who might have need of it. You have to send your data out in discrete self reliant packages. XML is good for that.

    I've said it before, but I'll repeat that for me the "bandwagonning" is one of XMLs strengths. Even if, unlike me, you think XML is badly devised, you could consider acknowledging the benefits of a cross-vendor, and simple standard for knocking up these ad-hoc data structures. Developers face enough problems with cross platform integration, anything that moves away from that is a blessed relief. Consider SMTP -- it looks a bit lacking in terms of authentication now, and few people would argue it's perfect for today's world. But a simple, *standard* protocol, fit for purpose, changed the way we communicate today because everyone adopted it. Ditto http/html. Whatever you think about sharky vendors trying to make a buck by adding XML to the end of their product name (and I probably share your views), please consider that the XML standard was developed and agreed upon by all the major vendors -- they agreed that there was a need for a common *non proprietary* data interchange format and that it would be good for all of them to cooperate. They found limitations with what was available and worked together on XML. Even Microsoft resisted the temptation to tinker with the crucial XSL-T specification by adding 'extended' features.

    The widely criticised bloat is a price worth paying, and is in any case vastly exaggerated (ever heard of compression?), and less important as long as you are using the format for relatively small documents.

    And, [desperately trying to get back on topic], because I see XML as a powerful format for developers, it makes sense that a database should support functionality to send and recieve (NOT store!!!!!!) stateless recordsets in an XML format. Converting an XML document to and from SQL statements is something that will be done in any case (I worked on software that did exactly this with SQL Server before any native XML support), so it makes sense for a database vendor to centralise the development work, provide a common interface and let everybody go home early. I agree that if the FOR XML syntax chokes the DB engine then it won't be used ... I think in the case of SQL Server, a lot more work can be done here, but I can't think of any reason why spitting out XML text streams should be particularly less efficient than generating ADO recordsets -- please le t me know if I've missed something. I've very little idea of what is planned for Yukon though -- if you know of specific examples where XML is to be prised into the nervous system of the next generation SQL Server at the expense of core functionality, let's hear them, I'll probably be right with you 🙂

    And finally, there's XSL. Whatever you think of functional programming, there's no doubting that SQL -> (XML+XSL) is a much more pleasant way of writing data-focussed web applications than SQL -> ADO+ASP. There's lots of people writing such applications, many are microsoft customers and they are, after all, a business!

  • In response to iapetus and dcpeterson, and all of the other thoughful responders...

    Thank you both for your well thought out responses. While I don't intellectually agree with every point you make, I commend both of you for taking the time to discuss them.

    Special thanks to iapetus for thoroughly addressing the statements I made in my article. I do feel that, in the article and subsequent discussion, I've done my best to answer his or her questions.

    I would be honored to work with either of you someday, if for no other good reason than to have someone around to tell me that I'm wrong 🙂

    Alas, I will not convince you of my position. Hopefully, this discussion was helpful to the readers of this site.

    Oh, and just for the record, I've been coding professionally for 23 years. I have a degree in computer science, a background in mathematics, and graduate-level education in database theory, set mathematics, human-computer interaction and two areas of artifical intelligence. I've coauthored two technology books, one of them on SQL Server (although it wasn't as successful as "SQL Server in 21 days," 😉

    You see... not all XML proponents are nitwits.

    With greatest respect,

    nickmalik

  • nickmalik:

    Thanks for all of the kind words. And, though, your response reads like a salutation, I would be very interested in any specific comments, arguments, thoughts you might have on the new features being presented in the upcoming Yukon release. SQL Mag just covered quite a few interesting "features" in the recently released November, 2003 issue. In there, right at number 3 as the most import "feature" being included with Yukon, is the XML data type.

    In addition to any general comments you might have, specifically, how the inclusion of this new data type assist the DBMS with its core purpose: the insurance of consistent, persistent, flexible, quality data?

    I too am glad to see that many of us are giving serious consideration and thought to the technological choices we make, not mearly going with the trend. My thanks to you and the time you have spent to ponder these questions.

  • XML is one answer, but not the ONE answer.  Be smart use it when you need it. 

    It is like most other MS quickie tools, great to get started with, and a pain to get out of when you grow out of it and need something more suited for your specific 'big' project.

    Wait until you hit the 4K limits around Windows and with SQL.  Then you will want to leave the data in an XML file and only store a unc or url to the xml for later use.

    In multi server systems with or without DTC, XML will eventually be to bloated to use when the transactions peak, (the same is true with MS  transactional replication).  The bloat hits the network as well as the server memory resources and is difficult to recover, even if you have a top of the line system.

     

    [font="Arial"]Clifton G. Collins III[/font]

  • Not just XML, but there is a whole world of interoperability opening up with Web Services, and XML (SOAP) is becoming quite popular. As a DB developer in a web dev group I've had first had experience with the power that comes with web services.  This is definitely THE "killer app" for XML.

    Like it or not, XML is here to stay, and if you don't use (for religious reason's or other) you're going to miss out on a lot.

    cl

    Signature is NULL

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply