• Silly hierarchies?

    I know a lot, if not most people creating XML are quite bad at the modeling and encoding part of their data. For example, nearly all XML documents I get shipped to me use child elements where they are absolutely not needed and even unwanted. People don’t seem to know what cardinality implies or what it means for modeling when data cannot be subdivided any further. As such there seems to be a mythical belief out there that everything must be modeled as a child element, even where an attribute would be so much more appropriate.

    Attributes are faster, they can be uniquely indexed and are local to the element without causing xpath navigation overhead when traversing a document. Also, in desterilized form they are a LOT faster to work with because of these attributes. Not to mention the benefits when partially parsing a document like with sax.

    Then there is encoding, I get swamped at work by parties that deliver XML full of encoding errors, cause by people building XML without a DOM or any knowledge of the really simple structure of XML. People do not even seem to understand the difference between the elements and the data that is contained within them. I am not kidding you!

    None of this however is in any way a structural problem of XML and the related co-languages (xpath, xslt, xlm schema, etc.). The problem lies with people that should not even be driving a slow car, trying to be behind the wheels of a formula 1 car!

    JSON is no different for those people really, they just can skip the whole modeling part and send garbage in just another format and that does not solve any of the mentioned problems. It is worse readable by humans (no clear formatting for example), and it manages to be worse than that for machines to process (is there a simple sax parser equivalent for large streams?). The only thing it really shines at is serialization of relatively small chunks of barely structured data. It absolutely does not deserves any place in a relational database as a datatype!

    Once a ‘structure’ is in JSON format, it pretty much is useless until you unpack it to memory and navigate it procedurally.

    Don’t even get me started about type safety …

    The only reason JSON gets used so much is because it originated from a more and more used platform called the browser. Something which in itself is causing more than a few issues in how applications are developed. You would have thought it would been figured out by now, but the staggering explosion of new ‘technologies’ around it to mitigate/contain the damage that is done and is part structural, just proves my point.

    But obviously http://www.json.org instead writes:

    JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

    The first being an outright lie IMO, the second one simply neglects to mention that parsing comes in many forms. Serialisation/deserialisation in my view is just a minimal form of parsing akin to CSV/TSV and fixed position flat files. And it is in my opinion easier to generate and parse XML then it is to do it for JSON, especially the escaping for xml is less cumbersome. On top of that XML knows things such as CDATA, where whole blocks of text can be in unencoded form instead of layering encoding over encoding in strings that are part of an array or structure such as the case in JSON.

    From nearly every point of view, Microsoft made the right choice and I hope they stick to it.

    Unstructured and/or hard to process data like JSON should just be stored as blobs and we have the tools for that if we choose to do so. The next version of SQL Server has much more flexible file system integration and in the meantime anything can be stored in varchar(max) as well.

    For the diehards, let them write their own .NET datatype and extend SQL Server if they think it really solves anything! In my eyes, there are far more important issues that the Microsoft SQL team need addressing!