Putting JSON on an equal footing to XML

  • Comments posted to this topic are about the item Putting JSON on an equal footing to XML

    Best wishes,
    Phil Factor

  • Couldn't agree more. Been waiting for a native, built-in JSON parser in MSSQL for a while now, with sub-indexing of the members and the ability to apply JavaScript predicates to documents. Doing that, Microsoft would position themselves in the document-oriented DB world with a standard product. As I said, I've been wanting this as I run an OSS project, http://sisodb.com, that would look really different if that support were built in. Then SisoDb would just be an add-on to native ADO.NET.

    //Daniel

  • I can no longer find a cite, but Fabian Pascal once made the obvious point: XML is not in any meaningful way "self-describing", for if it were, both the sender and receiver of data streams would need no explicit coding. Of course, that's bunk. The structure of the data must be known by both, and since that knowledge exists, the smallest overhead in the data stream wins. By that strict criterion, CSV (or fixed column) wins. If data need be somewhat interpretable, then JSON is the least confusing. JSON also mediates silly hierarchies, which is a good thing.
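
    To put rough numbers on that, a tiny TypeScript-style sketch with the same made-up record rendered three ways (the record, field names, and values are purely illustrative):

        // One made-up record in three wire formats; ASCII only, so length == bytes.
        const csv  = "1042,Smith,2012-02-19,19.95";
        const json = '{"id":1042,"name":"Smith","date":"2012-02-19","total":19.95}';
        const xml  = '<order id="1042"><name>Smith</name><date>2012-02-19</date><total>19.95</total></order>';

        for (const [label, text] of [["csv", csv], ["json", json], ["xml", xml]] as const) {
          console.log(label, text.length, "bytes"); // CSV is the smallest, XML the largest
        }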

  • I agree 100%. Look at fastjson. It's better than anything Microsoft has for object serialization. I cringe every time someone mentions XML and SQL Server in the same breath. Microsoft needs to get their act together or SQL Server is not going to be a very good choice for service-oriented solutions. Supporting JSON would make ADO.NET a solid data access choice. Object-relational mapping would be much better with native JSON support.

    Customers wanted XML support, so that is what they got. The rest of the world is using JSON efficiently and leaving SQL Server in the dust. Phil, I think we both know JSON support will be in SQL 2016, which is slated for release 4 years from now. Maybe Red Gate can sell me something in the meantime 😉.

  • The book you mentioned cites "Create SOAP endpoints for use in web services applications" as if that were a good idea. XML-based SOAP endpoints cause so much pain for enterprise developers; they are the high blood pressure of enterprise development. Google the topic and prepare yourself for the endless tales of sorrow. I really couldn't let this go without pointing out that SOAP is bad.

  • emmchild (2/19/2012)


    Customers wanted XML support, so that is what they got. The rest of the world is using JSON efficiently and leaving SQL Server in the dust. Phil, I think we both know JSON support will be in SQL 2016, which is slated for release 4 years from now. Maybe Red Gate can sell me something in the meantime 😉.

    MS wasn't the only DB vendor to get on the XML train when more useful functionality could have been added. IBM did about the same thing with DB2 9.

  • RobertYoung (2/19/2012)


    I can no longer find a cite, but Fabian Pascal once made the obvious point: XML is not in any meaningful way "self-describing", for if it were, both the sender and receiver of data streams would need no explicit coding. Of course, that's bunk. The structure of the data must be known by both, and since that knowledge exists, the smallest overhead in the data stream wins. By that strict criterion, CSV (or fixed column) wins. If data need be somewhat interpretable, then JSON is the least confusing. JSON also mediates silly hierarchies, which is a good thing.

    Heh... Robert Young for President! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • MS wasn't the only DB vendor to get on the XML train when more useful functionality could have been added. IBM did about the same thing with DB2 9.

    Vendors sometimes jump on "standards" and miss the point of supporting better tools, techniques, and utilities. As a developer, I wonder who makes these decisions that can lead to local optima. Currently I see interest in SQL Server source control that supports SVN and doesn't take into consideration more advanced tools like Mercurial or Git.

  • Silly hierarchies?

    I know that a lot of, if not most, people creating XML are quite bad at the modeling and encoding part of their data. For example, nearly all XML documents that get shipped to me use child elements where they are absolutely not needed and even unwanted. People don’t seem to know what cardinality implies or what it means for modeling when data cannot be subdivided any further. As such, there seems to be a mythical belief out there that everything must be modeled as a child element, even where an attribute would be so much more appropriate.

    Attributes are faster, they can be uniquely indexed, and they are local to the element, without causing XPath navigation overhead when traversing a document. Also, in deserialized form documents are a LOT faster to work with because of these properties. Not to mention the benefits when partially parsing a document, as with SAX.
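
    To illustrate the modeling difference (a made-up record, not from any real schema), here is the same data with everything forced into child elements versus with attributes used for the atomic, single-valued properties:

        // Element-heavy modeling: every atomic value gets its own child element.
        const elementHeavy =
          "<customer><id>7</id><name>Ann</name><active>true</active></customer>";

        // Attribute-based modeling: atomic, non-repeating values become attributes.
        const attributeBased = '<customer id="7" name="Ann" active="true"/>';

        console.log(elementHeavy.length, attributeBased.length); // the attribute form is noticeably smaller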

    Then there is encoding. I get swamped at work by parties that deliver XML full of encoding errors, caused by people building XML without a DOM or any knowledge of the really simple structure of XML. People do not even seem to understand the difference between the elements and the data that is contained within them. I am not kidding you!

    None of this, however, is in any way a structural problem of XML and the related co-languages (XPath, XSLT, XML Schema, etc.). The problem lies with people who should not even be driving a slow car trying to get behind the wheel of a Formula 1 car!

    JSON is no different for those people, really; they just get to skip the whole modeling part and send garbage in another format, and that does not solve any of the mentioned problems. It is less readable by humans (no clear formatting, for example), and it manages to be even worse for machines to process (is there a simple SAX-parser equivalent for large streams?). The only thing it really shines at is serialization of relatively small chunks of barely structured data. It absolutely does not deserve any place in a relational database as a datatype!

    Once a ‘structure’ is in JSON format, it is pretty much useless until you unpack it into memory and navigate it procedurally.

    Don’t even get me started about type safety …

    The only reason JSON gets used so much is that it originated from a more and more widely used platform called the browser, something which in itself is causing more than a few issues in how applications are developed. You would have thought that would have been figured out by now, but the staggering explosion of new ‘technologies’ around it to mitigate/contain the damage that is done, and is in part structural, just proves my point.

    But obviously http://www.json.org instead writes:

    JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

    The first claim is an outright lie IMO; the second simply neglects to mention that parsing comes in many forms. Serialisation/deserialisation in my view is just a minimal form of parsing, akin to CSV/TSV and fixed-position flat files. And it is, in my opinion, easier to generate and parse XML than it is to do the same for JSON; in particular, the escaping for XML is less cumbersome. On top of that, XML has things such as CDATA, where whole blocks of text can be left in unencoded form, instead of layering encoding over encoding in strings that are part of an array or structure, as is the case in JSON.

    From nearly every point of view, Microsoft made the right choice and I hope they stick to it.

    Unstructured and/or hard to process data like JSON should just be stored as blobs and we have the tools for that if we choose to do so. The next version of SQL Server has much more flexible file system integration and in the meantime anything can be stored in varchar(max) as well.

    For the diehards, let them write their own .NET datatype and extend SQL Server if they think it really solves anything! In my eyes, there are far more important issues that the Microsoft SQL team needs to address!

  • @peter-757102

    I agree with you that XML is a much more mature product, and JSON has many glaring holes in it (DateTime?). However, I hope that I didn't imply that I thought JSON was intrinsically superior. Few of us as database developers actually get the choice about the markup that gets thrown at us. I'd much rather get the attempt at a serialized object from an application than submit to the horror of application developers attempting the translation into a relational form themselves. If the application's dialect is JSON, I'd just like to be able to cope with it, and I don't currently have all the tools. Sure, I can tell the application developers to 'persist' their stuff in XML instead of JSON, but they're not going to obey me! It's hardly surprising that they insist, since JSON is a much easier way to serialize a JavaScript object than XML: you just use json2.js (JSON in JavaScript). To convert an object to JSON, they can use the JSON.stringify function, and creating a JavaScript object from a JSON string is as simple as going var my_object = JSON.parse(my_json_string); Thankfully, there isn't so much margin for error here for the rookie programmer!
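
    A minimal sketch of that round trip, with a made-up object (nothing here is specific to any particular application):

        // Serialize an object to a JSON string and build an object back from it.
        const order = { id: 1042, customer: "Smith", items: ["widget", "sprocket"] };

        const asJson = JSON.stringify(order);   // '{"id":1042,"customer":"Smith","items":["widget","sprocket"]}'
        const copy = JSON.parse(asJson);        // a plain object once more

        console.log(copy.customer);             // "Smith"
        // Note the DateTime hole mentioned above: dates only survive the trip as strings.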

    Best wishes,
    Phil Factor

  • <Rant>

    I know I'll probably be a voice of 1 but I have a serious hatred for most markup languages. HTML is a fun little thing, kind of like "Reveal Codes" was fun in WordPerfect, and it makes for some simple formatting. XML seems to be used primarily for data transfer and I hate it for multiple reasons. Even XML's so-called "attributes" easily more than double or triple the size of the file (I've seen files with ratios in the hundreds). For the sake of argument, I'll just say that it doubles the size of most tabular data. That doubles the size for storage on disk, usually more than doubles the size in memory because of the cruddy "edge" tables that get built behind the scenes to parse XML, doubles the transmission time because there are at least double the number of bytes just from the tags, never mind always being character-based instead of 1-, 2-, or 4-byte-integer-based for most numbers, doubles the size of the cooling systems I need because everything has to work at least twice as hard to parse the data, etc, etc, etc. Heh... and who in the world actually thinks it's a good thing to depend on nested non-unique tags to build a hierarchy with? And then people clog up my databases with this crap because they "need to use the XML all at once" from the front end. Yeah... the real reason they need it that way is because a lot of them don't know how to parse it and normalize it before dropping it into a database. Don't even get me started on EDI! 😛

    Phil wrote an article a while back where he also (IMHO) appeared to be a bit disgusted with the byte-bloat caused by markup languages for the transmission of data. He suggested that someone come up with a better way and I was going to jump at the chance except for one thing... one of the best ways to do all that was invented scores of years ago. Look up ASCII characters 28 through 31 and see just how simple and fast data transmission used to be. If we reverted to such a thing for the transmission of "unformatted" (i.e. not for display purposes) data, with a little common sense of transmitting byte values for numbers instead of characters, most of us would never have a "performance issue" with our networks, storage systems, etc. Instead of having bottlenecks, our data would be like a BB shot from a rail-cannon through the Holland Tunnel instead of like trying to drive a pig truck through a Coke bottle.
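
    A minimal sketch of what that looks like, using just the Record and Unit separator characters from that ASCII 28-31 range (the records themselves are made up):

        // ASCII control characters: 30 is the Record Separator, 31 the Unit Separator.
        const RS = "\x1E"; // record separator
        const US = "\x1F"; // unit separator

        // Two records, three fields each -- no tags, no quoting, no escaping needed,
        // as long as the data itself never contains these control characters.
        const payload = ["1042", "Smith", "19.95"].join(US) + RS +
                        ["1043", "Jones", "7.50"].join(US);

        // Receiving side: split on the separators to get the fields back.
        const records = payload.split(RS).map(r => r.split(US));
        console.log(records); // [["1042","Smith","19.95"], ["1043","Jones","7.50"]]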

    We just don't need the overhead of languages like XML to transmit even some of the most complicated data. I won't get into what a travesty I think it is to transmit tagged hierarchical data to an RDBMS even when it seems like a good idea. And, no, we don't need formatting tags (even when CSS is available) on every bloody element of data, even for the front end. Assign a "default" and only include formatting for exceptions. I know that's the way that a lot of good people try to do it but, apparently, none of the people sending me XML data are "good" people. 😀

    And before you folks blocking the back door of this club go postal on me about human readability, isn't THAT what the presentation layer is for??? :Whistling:

    </Rant>

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually, while XML in readable form is pretty bloated due to it being so verbose (hard to argue with you there), for storage it can be very compact when done right. This is due to its high compressibility, and I do not mean tools like zip and rar, but more specialised compressors. Where a normal compressor needs to build up a model of the data, one with XML/schema knowledge gets a model handed to it, improving the compression.

    As for its use in databases as a type: here we don't use the xml type at all and store it all as text instead. The interesting or frequently accessed bits we work with are extracted by the application and stored in separate fields next to the plain-text XML (a rough application-side sketch follows the list below). Storing the XML along with the attributes we use does have advantages for us in three ways:

    1. We (and the customer) can quickly see what the source was of the data we work with and check this out when an error is suspected.

    2. We can extend the working set of attributes and populate the new ones by extracting them from the XML (with a SQL statement if need be).

    3. We have the context of the data we use and thus can do some analysis on historical data. Good to have as an option for marketing purposes.
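
    The extraction step mentioned above, sketched application-side in the simplest possible way (the element names are made up, and the regular-expression approach is only for illustration; a real XML parser is the safer choice):

        // Pull a couple of frequently used values out of the raw XML text and keep
        // them alongside it, so queries can hit plain columns instead of parsing XML.
        function extractRow(rawXml: string) {
          const pick = (tag: string) =>
            rawXml.match(new RegExp(`<${tag}>([^<]*)</${tag}>`))?.[1] ?? null;

          return {
            rawXml,                       // the untouched source document
            orderId: pick("OrderId"),     // promoted to its own field
            orderDate: pick("OrderDate"), // promoted to its own field
          };
        }

        const row = extractRow("<Order><OrderId>1042</OrderId><OrderDate>2012-02-19</OrderDate></Order>");
        console.log(row.orderId, row.orderDate); // "1042" "2012-02-19"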

    JSON, I think, would not be suited to any of these points.

    As for readability, I love to drag an XML fragment into my browser (IE) and see a nicely readable source, which makes it easy to spot problems, be they programming errors on our side or simply wrong data delivery.

    And I agree quite a bit with you, Jeff, on the sorry state of current markup languages, especially HTML. In many ways, it feels like old BASIC with line numbers and no labels or functions all over again. The source of many modern interactive pages is often incomprehensible to anyone who did not work on the page just a minute ago. That is why developers (me too) are bending themselves into all kinds of application modeling exercises, as we need to get this "mess" under control.

  • I'm dragging myself into figuring out JSON, because of course we need to store it... and query it.

    What I don't get is why, to me, it looks so much like XML; even the types of queries I've seen tossed my way as possibilities for how the devs would want to query seem to be XPath-type queries.

    Why hasn't anybody written something not just to convert JSON to XML (for storage in SQL, of which I've found a lot), but to translate JSON queries into XPath-style queries?

    I figure it would have to be CLR, but when it's that or running Mongo side by side (and having to maintain data in both the SQL side and the Mongo side) it seems that it should exist.
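
    A rough sketch of the conversion half of that idea (the element-naming rules here are arbitrary, and it ignores attributes, XML escaping, and plenty of edge cases):

        // Naive JSON-to-XML conversion so the result could be stored and queried
        // with the existing XML/XPath machinery. Arrays repeat the element name;
        // nothing here handles escaping or attribute mapping -- illustration only.
        function jsonToXml(value: unknown, name: string): string {
          if (Array.isArray(value)) {
            return value.map(v => jsonToXml(v, name)).join("");
          }
          if (value !== null && typeof value === "object") {
            const children = Object.entries(value as Record<string, unknown>)
              .map(([k, v]) => jsonToXml(v, k))
              .join("");
            return `<${name}>${children}</${name}>`;
          }
          return `<${name}>${String(value)}</${name}>`;
        }

        console.log(jsonToXml({ id: 7, tags: ["a", "b"] }, "doc"));
        // <doc><id>7</id><tags>a</tags><tags>b</tags></doc>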



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link
    For tips on how to post your problems

  • I have developed a JSON type for SQL Server available for purchase at http://www.json4sql.com. It is a port of the binary JSON type just released for PostgreSQL.

    It also offers:

    * Table-valued functions for returning elements of JSON objects / columns as a result set

    * A fluent API for creating, modifying, and accessing object properties

    * JSONPath expressions to set, remove, and get data within JSON objects (similar to XPath expressions for XML)

    * Aggregations for converting result sets into objects or arrays

    * Binary storage for fast data access without reparsing
