RE: Representing XML data in SQL Server

August 7, 2018 at 8:51 pm

#2000729

Jeff Rivera - Tuesday, August 7, 2018 6:10 PM
I feel sheepish disagreeing with a SQL legend but here goes: Regarding the point "XML is one of the worst formats for readability whether a person has technical knowledge or not" I think you need to keep context in mind. XML is not a good replacement for tables - which I agree are superior for displaying data. Rather XML is the superior and more readable option when outputting data into files. When you have to open a file for examination the XML file format is easier to read than the CSV equivalent. That is the context that XML is usually used in, containerizing information typically for consumption.
For what it's worth I agree with your bloated point, this may partially explain JSON's popularity.
Regarding the article I say "well done." I certainly learned something.

Thank you for the kind words but I'm just like everyone else... not a legend. I just try to be helpful. Or to be more true than that, I aim to please... I sometimes miss but I'm always aiming. 😀

The following isn't an argument... It's just my humble opinion based on my own humble experiences.

I believe that what you're referring to as ease of readability in XML files applies when you have to look at data halfway or more through a 10,000 "record" file (for example) and the file has a lot of columns. Trying to manually discern which column is which in a CSV or even a TSV file is admittedly a bugger, and I do agree that properly rendered/formatted XML (which bloats it even more in a lot of cases, unless you have a tool that automatically does the proper indentation, etc.) can make life easier in such cases.

However, if you're in the business of working with such files on a regular basis and they are CSV or TSV or however delimited, then you should also have the right tools to examine the files. That's IF you need to examine the files. How many times does someone need to actually examine a file at the "element" level? In a previous position, I was responsible for the successful download, import, validation, and final merge of hundreds of files per day. Each file consisted of about 30,000 rows, was between 50 and 800 columns wide, and varied from hour to hour (DoubleClick.net files), even for the same "client" source. Imagine the sheer volume of data that was. The files were (fortunately) all TSV files. I think I may have looked at a total of 5 of them over an 18-month period and I certainly didn't use Notepad, TextPad, Notepad++, or any similar tool to do so.

I developed a system that would read the "two row" headers in each file to figure out what they were (each file had the same first 9 columns as a reference) and I was doing the full monty (download, import, validate, merge) at a rate of 8 files every 2 minutes. Remember that the number and names of columns would change incessantly.
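For anyone wondering what reading a "two row" header might look like, here is a minimal sketch in Python. It is not the system described above. The layout is an assumption made purely for illustration: the top header row carries a group name that is blank until the group changes, the second row carries the field name, and only 2 fixed leading columns are used instead of 9 to keep the sample small. The function name, the sample data, and the "Group.Field" naming are all hypothetical.

```python
import csv
import io

# Assumption: the real files had 9 fixed leading columns; 2 keeps the sample small.
KEY_COLUMNS = 2

def read_two_row_header(tsv_text, key_columns=KEY_COLUMNS):
    """Return (column_names, data_rows) for a TSV whose first two rows are headers."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    top = next(reader)      # assumed: group names, blank where the group repeats
    bottom = next(reader)   # assumed: field names
    names = []
    group = ""
    for i, (upper, lower) in enumerate(zip(top, bottom)):
        if i < key_columns:
            # Fixed reference columns keep their plain name.
            names.append(lower or upper)
            continue
        # A blank cell in the top row inherits the group name to its left.
        group = upper or group
        names.append(f"{group}.{lower}")
    return names, list(reader)

# Hypothetical sample: two key columns followed by two (Clicks, Cost) pairs.
SAMPLE = (
    "\t\tCampaignA\t\tCampaignB\t\n"
    "Date\tSite\tClicks\tCost\tClicks\tCost\n"
    "2018-08-01\texample.com\t10\t1.50\t7\t0.90\n"
)

names, rows = read_two_row_header(SAMPLE)
print(names)
print(rows)
```

Because the column names are rebuilt from the header on every file, nothing has to be hard-coded when the number or names of the columns change.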

Now, imagine the size of those files if they were XML. Then also try to imagine automatically flattening those files so that they could be unpivoted based on pairs and quads of columns and aligned with the first 9 columns using XML and also imagine how long that may take.
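To make "unpivoted based on pairs and quads of columns" concrete, here is a rough Python sketch of that kind of reshaping on delimited data. The column names, the "Group.Field" naming convention, and the use of 2 key columns instead of 9 are assumptions for illustration only, not the layout of the actual files.

```python
def unpivot(names, rows, key_columns, group_size):
    """Emit one output row per group of `group_size` value columns.

    Each output row is: key columns + group name + the group's values.
    Assumes value columns are named "Group.Field" (a hypothetical convention).
    """
    out = []
    for row in rows:
        keys = row[:key_columns]
        for start in range(key_columns, len(row), group_size):
            # The group name is the part of the column name before the dot.
            group = names[start].partition(".")[0]
            out.append(keys + [group] + row[start:start + group_size])
    return out

# Hypothetical sample: 2 key columns followed by two (Clicks, Cost) pairs.
names = ["Date", "Site",
         "CampaignA.Clicks", "CampaignA.Cost",
         "CampaignB.Clicks", "CampaignB.Cost"]
rows = [["2018-08-01", "example.com", "10", "1.50", "7", "0.90"]]

long_rows = unpivot(names, rows, key_columns=2, group_size=2)
for r in long_rows:
    print(r)
```

Passing group_size=4 would handle "quads" the same way. The point is that on flat, delimited rows this is a simple slice per group, which is much of why the reshaping stays cheap.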

I know, I know... I'm one of those technical wire-heads that don't actually need to look directly at data in the files themselves and, when I do, I have the tools to do so. I'm not one of those non-technical people that might want or need to look at the data in the file directly.

So why would I be required to use XML just because someone else doesn't know what the heck they're doing? Why would anyone send such volumes of tag-bloated, slow-to-transmit (8 to 16 times slower in this case), slow-to-parse data just on the off chance that someone non-technical might want to look at the data directly in the file?
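If you want to see the tag bloat for yourself, a quick Python sketch like the following writes the same rows as TSV and as element-per-column XML and compares the byte counts. The column names and data are made up, and the ratio you get depends entirely on tag-name length versus value length, so this is not a reproduction of the 8 to 16 times figure above, just a way to measure your own data.

```python
import csv
import io
import xml.etree.ElementTree as ET

def to_tsv(names, rows):
    """Serialize rows as tab-separated text with a single header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return buf.getvalue()

def to_xml(names, rows):
    """Serialize rows as <rows><row><Col>value</Col>...</row>...</rows>."""
    root = ET.Element("rows")
    for row in rows:
        rec = ET.SubElement(root, "row")
        for name, value in zip(names, row):
            ET.SubElement(rec, name).text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical data: 1,000 rows of 4 short columns.
names = ["Date", "Site", "Clicks", "Cost"]
rows = [["2018-08-01", "example.com", str(n), "1.50"] for n in range(1000)]

tsv_bytes = len(to_tsv(names, rows).encode("utf-8"))
xml_bytes = len(to_xml(names, rows).encode("utf-8"))
print(f"TSV: {tsv_bytes} bytes, XML: {xml_bytes} bytes, "
      f"ratio: {xml_bytes / tsv_bytes:.1f}x")
```

Note that this XML has no indentation at all. Pretty-printing it for human readers adds whitespace on top of the tags.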

Heh... and I've seen non-technical people look for something in XML files... the term "Deer in Headlights" figures prominently when I watch them try to figure it out. Most of the ones that I've worked with can't even figure out where one "record" ends and another starts.

Do I think XML is helpful when looking at execution plans? In some cases, sure, but that is hierarchical data to begin with, not data that belongs in one or more two-dimensional tables.

So, to summarize... I'm not totally against XML. It DOES have its uses, but the word "uses" should never be taken to mean "everywhere". And, IMHO, it should never be used to transmit data for import into databases. I don't care who thinks they need to read the files with the ol' Mark I Mod 1 Eyeball. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.
