Representing XML data in SQL Server
diogosouzac
Forum Newbie

Group: General Forum Members
Points: 6 Visits: 1
Comments posted to this topic are about the item Representing XML data in SQL Server
Jeff Moden
SSC Guru

Group: General Forum Members
Points: 844509 Visits: 46627
This is a nicely written article. Just as a suggestion for future articles, it would have been great if you identified where you got the Sales and SaleItems data from or included it as readily consumable data for people to experiment and confirm with. If you come up with a link to get the data or want to append some code to create and populate simple test tables, send a note to Steve and I'm sure he'll be glad to oblige.

Shifting gears a bit, from the article...
However, there are situations where it is necessary to visualize and export the results in a more structured format, that can be read by individuals with little technical knowledge

In direct contrast to what that states, I've found that XML is one of the worst formats for readability, whether a person has technical knowledge or not. There is very little that's actually easier to understand than a typical row/column/tabular format, possibly accompanied by a representative chart or two.

Then, there's this, also from the article...

In general, when exporting data to other applications, you should follow an XML formation scheme.


While I agree that presenting data in another format is sometimes necessary, I can't agree that exporting data with XML does anything that's actually good. It's bloated with repetitive tags (doubled for "Elements") and is so verbose that the bulk of the time to display the data is spent on decoding the XML just to extract the data. It's also frequently 8 to 16 times (my personal observation) larger byte-wise, so it makes data transmissions take longer, takes more memory to process and display, and takes much more disk space to store.
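A rough way to check that on a specific dataset is to render the same rows both ways and compare byte counts. A minimal sketch, assuming a hypothetical dbo.Sales table and SQL Server 2017+ for STRING_AGG:

-- Element-centric XML vs. a simple delimited string for the same rows.
-- dbo.Sales and its columns are assumed names, for illustration only.
-- Both results are NVARCHAR(MAX), so the byte counts are directly comparable.
DECLARE @xml NVARCHAR(MAX) =
(
    SELECT SaleID, SaleDate, CustomerName
    FROM   dbo.Sales
    FOR XML PATH('Sale'), ROOT('Sales')
);

DECLARE @nl NCHAR(1) = NCHAR(10);  -- row separator (STRING_AGG needs a literal or variable)

DECLARE @csv NVARCHAR(MAX) =
(
    SELECT STRING_AGG(CONVERT(NVARCHAR(MAX), CONCAT(SaleID, ',', SaleDate, ',', CustomerName)), @nl)
    FROM   dbo.Sales
);

SELECT DATALENGTH(@xml) AS XmlBytes,
       DATALENGTH(@csv) AS CsvBytes;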

Just because millions of people are doing it, doesn't make it right. XML is a prime example of that.

p.s. Ah... there IS one good thing about using XML... because millions of people have kowtowed to using XML, it has driven the industry to make much faster equipment and hard disks much larger, faster, and cheaper. They had to because XML is so very inefficient in all ways. BigGrin

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
aaron.reese
SSChampion

Group: General Forum Members
Points: 10994 Visits: 1056
Personally, I find XML (with an XML parser to properly do the indentation) much easier to read than JSON, and it is a shame that JSON seems to be the default format for DTO activities such as API results. Glad that MS finally saw the light and included JSON support in 2016/vNext.
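For anyone who hasn't played with it yet, the two render the same rows like this. A minimal sketch, assuming a hypothetical dbo.Customers table and SQL Server 2016+ for FOR JSON:

-- Same result set rendered as XML and as JSON.
-- dbo.Customers and its columns are assumed names.
SELECT CustomerID, CustomerName
FROM   dbo.Customers
FOR XML PATH('Customer'), ROOT('Customers');

SELECT CustomerID, CustomerName
FROM   dbo.Customers
FOR JSON PATH, ROOT('Customers');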

A couple of things to be aware of with XML. All of the nested SELECT statements effectively run independently, so they are a hidden form of RBAR. If you have a structure like

<Customers>
  <Customer>
    ...
    <Invoices>
      <Invoice>
        ...
        <Lines>
          <Line>
            ...
            <SerialNumbers>
              <SerialNumber>
                ...


Then that is a lot of processing overhead.
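For reference, that shape usually comes from nested, correlated FOR XML subqueries along these lines (table names are assumptions); each inner SELECT runs once per parent row, which is where the hidden RBAR creeps in:

-- Each nested subquery is correlated to its parent and runs once per parent row.
-- dbo.Customers, dbo.Invoices, and dbo.InvoiceLines are assumed names.
SELECT  c.CustomerID,
        c.CustomerName,
        (
            SELECT  i.InvoiceID,
                    i.InvoiceDate,
                    (
                        SELECT  l.LineID, l.Qty, l.Price
                        FROM    dbo.InvoiceLines AS l
                        WHERE   l.InvoiceID = i.InvoiceID
                        FOR XML PATH('Line'), ROOT('Lines'), TYPE
                    )
            FROM    dbo.Invoices AS i
            WHERE   i.CustomerID = c.CustomerID
            FOR XML PATH('Invoice'), ROOT('Invoices'), TYPE
        )
FROM    dbo.Customers AS c
FOR XML PATH('Customer'), ROOT('Customers');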

Secondly, if you have a really large dataset that you output as a single XML result field, you might have issues with saving the results. I once created a file for 25,000 rent letters with up to 5 levels of nested data; the resulting XML was 1.4 million elements and wouldn't fit into memory. I ended up having to save the letter fragments to a temp table and then BCP the table contents to a file.
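A sketch of that kind of workaround, with placeholder object, server, and database names:

-- Stage the per-letter fragments in a table that a separate bcp session can see
-- (a permanent or global temp table), then export the column to a file with bcp.
SELECT  r.LetterID,
        (
            SELECT  r2.TenantName, r2.Address, r2.Balance
            FROM    dbo.RentLetters AS r2
            WHERE   r2.LetterID = r.LetterID
            FOR XML PATH('Letter'), TYPE
        ) AS LetterXml
INTO    dbo.LetterFragments
FROM    dbo.RentLetters AS r;

-- Then, from a command prompt (server and database names are placeholders):
-- bcp "SELECT LetterXml FROM MyDb.dbo.LetterFragments ORDER BY LetterID" queryout rent_letters.xml -S MyServer -T -w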
lapras_dewgong
SSC-Enthusiastic

Group: General Forum Members
Points: 119 Visits: 46
A very useful article for understanding the XML conversion options available in SQL Server. Concise examples are the key.

I agree with Jeff in his views on XML. It's bloated and not that easy for a human being to interpret, but it accomplished its objective of becoming a common standard. The equivalent of the IRS here uses it to represent our invoices and record keeping.
Rod
One Orange Chip

Group: General Forum Members
Points: 26828 Visits: 2601
I appreciate this article and thank you for writing it. In my current job they store XML data in several databases. The curious thing, to me, is that the XML data is never stored as XML data. It is always stored as either a large NVARCHAR or maybe NVARCHAR(MAX). I do not know why. In some cases I imagine it was done before SQL Server supported the XML data type, but I know that's not true in other cases, where they're creating new databases with XML data and still storing it as NVARCHAR. Anyway, thank you for this article, as it helps me understand how XML probably should be handled in SQL Server.
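For what it's worth, XML stored that way can still be queried by casting it on the fly, or the column can be converted outright. A minimal sketch, with invented table, column, and path names:

-- Querying XML that lives in an NVARCHAR(MAX) column by casting it on the fly.
SELECT  d.DocID,
        x.AsXml.value('(/Order/CustomerID)[1]', 'int') AS CustomerID
FROM    dbo.Documents AS d
CROSS APPLY (SELECT CAST(d.Payload AS XML) AS AsXml) AS x;

-- Converting the column to the xml data type outright; this also validates that
-- every existing row is well-formed XML (the ALTER fails if any row is not).
ALTER TABLE dbo.Documents ALTER COLUMN Payload XML;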

Kindest Regards, Rod
Connect with me on LinkedIn.
Scott Coleman
SSC-Insane

Group: General Forum Members
Points: 23277 Visits: 2021
There is no question that XML causes the data files to bloat. On the other hand, it is much less ambiguous than swapping CSV files with knuckleheads who don't know how to deal with qualifying fields, embedded delimiters, columns with multi-line text values, and so on. Have you ever dealt with someone who thought it was your fault that your code couldn't handle \" instead of "" in a quote-qualified field?

One application I dealt with that benefited from XML data files was downloading reports (as XML) from a web service, where the report columns were subject to change without notice. An XML query could return all the field names from the first record, which were used to create a dynamic SQL query to read the columns in that report. The data columns came from a limited domain of possibilities, so a data dictionary table was used to provide data types for each column. The biggest argument against CSV in this case was text columns with multi-line values that were guaranteed to include commas, quotes, tabs, line feeds, and other obnoxious characters.
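The first step of that approach can be as simple as asking the first record for its element names. A small sketch; the document shape below is invented for illustration:

-- Pull the column (element) names from the first record of an XML report.
DECLARE @report XML = N'
<Report>
  <Row><OrderID>1</OrderID><Region>West</Region><Amount>10.50</Amount></Row>
  <Row><OrderID>2</OrderID><Region>East</Region><Amount>12.00</Amount></Row>
</Report>';

SELECT  c.n.value('local-name(.)', 'nvarchar(128)') AS ColumnName
FROM    @report.nodes('/Report/Row[1]/*') AS c(n);
-- These names would then feed the dynamic SQL that reads the full report.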

In another case I had to replicate tables from an old Progress system to an SQL 2005 server. The ODBC connection between the servers had issues: querying a large table created so many locks in the Progress database that it would blow up. But Progress code dumping the table to a text file ran efficiently and without killing the server. (The reason for not using CSV escapes me at the moment.) The largest table was too big for either server to digest at once as an XML DOM object, but I could write a Progress program to export the data to a file as a text stream in XML format, and in SSIS I wrote a script task that could read the file token-by-token with the XmlStreamReader class. I'm not sure I want to write generic input code to read any possible XML file, but I controlled the source file format and thus could simplify the parsing. There's no rule that putting the entire XML file into memory at once is the only way to deal with it.



Jeff Rivera
SSC Rookie

Group: General Forum Members
Points: 31 Visits: 441
Jeff Moden - Tuesday, August 7, 2018 6:37 AM

I feel sheepish disagreeing with a SQL legend, but here goes. Regarding the point "XML is one of the worst formats for readability whether a person has technical knowledge or not", I think you need to keep context in mind. XML is not a good replacement for tables, which I agree are superior for displaying data. Rather, XML is the superior and more readable option when outputting data into files. When you have to open a file for examination, the XML file format is easier to read than the CSV equivalent. That is the context XML is usually used in: containerizing information, typically for consumption.

For what it's worth, I agree with your point about bloat; this may partially explain JSON's popularity.

Regarding the article I say "well done." I certainly learned something.
Jeff Moden
SSC Guru

Group: General Forum Members
Points: 844509 Visits: 46627
Jeff Rivera - Tuesday, August 7, 2018 6:10 PM

Thank you for the kind words but I'm just like everyone else... not a legend. I just try to be helpful. Or to be more true than that, I aim to please... I sometimes miss but I'm always aiming. BigGrin

The following isn't an argument... It's just my humble opinion based on my own humble experiences.

I believe that what you're referring to as ease of readability in XML files is when you have to look at data halfway or more through a 10,000-"record" file (for example) and the file has a lot of columns. Trying to manually discern which column is which in a CSV or even a TSV file is admittedly a bugger, and I do agree that properly rendered/formatted XML (which bloats it even more in a lot of cases, unless you have a tool that automatically does the proper indentation, etc.) can make life easier in such cases.

However, if you're in the business of working with such files on a regular basis and they are CSV or TSV or however delimited, then you should also have the right tools to examine the files. That's IF you need to examine the files. How many times does someone need to actually examine a file at the "element" level? In a previous position, I was responsible for the successful download, import, validation, and final merge of hundreds of files per day. Each file consisted of about 30,000 rows and was between 50 and 800 columns wide and varied from hour to hour (DoubleClick.net files) even for the same "client" source. Imagine the sheer volume of data that was. The files were (fortunately) all TSV files. I think I may have looked at a total of 5 of them over an 18 month period and I certainly didn't use NotePad, TextPad, or NotePad++ or any similar tool to do so.

I developed a system that would read the "two row" headers in each file to figure out what they were (each file had the same first 9 columns as a reference) and I was doing the full monty (download, import, validate, merge) at a rate of 8 files every 2 minutes. Remember that the number and names of columns would change incessantly.

Now, imagine the size of those files if they were XML. Then try to imagine automatically flattening those files using XML so that they could be unpivoted based on pairs and quads of columns and aligned with the first 9 columns, and also imagine how long that might take.

I know, I know... I'm one of those technical wire-heads that don't actually need to look directly at data in the files themselves and, when I do, I have the tools to do so. I'm not one of those non-technical people that might want or need to look at the data in the file directly.

So why would I be required to use XML just because someone else doesn't know what the heck they're doing? Why would anyone send such volumes of tag-bloated, slow-to-transmit (8 to 16 times slower in this case), slow-to-parse data just on the off chance that someone non-technical might want to look at the data directly in the file?

Heh... and I've seen non-technical people look for something in XML files... the term "Deer in Headlights" figures prominently when I watch them try to figure it out. Most of the ones that I've worked with can't even figure out where one "record" ends and another starts.

Do I think XML is helpful when looking at execution plans? In some cases, sure, but it is hierarchical data to begin with. Not data that belongs in one or more two dimensional tables.

So, to summarize... I'm not totally against XML. It DOES have its uses, but the word "uses" should never be taken to mean "everywhere". And, IMHO, it should never be used to transmit data for import into databases; I don't care who thinks they need to read the files with the ol' Mark I Mod 1 Eyeball. ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Eirikur Eiriksson
SSC Guru

Group: General Forum Members
Points: 153714 Visits: 22685
Thank you for this article, Diogo; nice write-up.
Cool

Quick thoughts: I think it would benefit from slightly more technical detail on SQL Server's implementation of XML, such as schema collections and the difference between typed and untyped XML. On the storage side, the method used to store XML in SQL Server is a binary representation of the XML, not an identical copy. This saves roughly 20% compared to storing the XML in text files.
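For readers who haven't seen typed XML, it is just a column (or variable) bound to a schema collection. A minimal sketch; the schema and table names here are invented for illustration:

-- The schema collection validates every value stored in the typed column.
CREATE XML SCHEMA COLLECTION dbo.SaleSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Sale">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="SaleID" type="xs:int"/>
        <xs:element name="Total"  type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

CREATE TABLE dbo.SalesXml
(
    SaleID  INT IDENTITY PRIMARY KEY,
    SaleDoc XML(dbo.SaleSchema)  -- typed; plain XML would be untyped
);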
Jeff Moden
SSC Guru

Group: General Forum Members
Points: 844509 Visits: 46627
Eirikur Eiriksson - Tuesday, August 7, 2018 11:49 PM

Uh huh... what about during data transmission? And, I could certainly be incorrect, but considering that the XML text files are 8 to 16 times larger than delimited text to begin with, I'm thinking that's not much of a savings.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Eirikur Eiriksson
SSC Guru

Group: General Forum Members
Points: 153714 Visits: 22685
Jeff Moden - Wednesday, August 8, 2018 12:12 AM

Of course it does depend on the structure of the XML, whether it's element or attribute bound, etc., but because of the binary storage, there is little difference between short and long element and attribute names.
Cool

When it comes to the storage size comparison, let's include the final target, the SQL Server database. Wink
Here is a recent example (SQL Server 2017 RTM):

XML files on disk: 42.7 MB
XML content in an XML column: 36.8 MB
The content parsed into a normalized and optimized schema: 41.3 MB
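A sketch of how such a comparison can be measured, with placeholder object names:

-- Stored size of the XML column (its internal binary form) vs. the reserved
-- pages of the normalized tables. All object names are placeholders.
SELECT SUM(DATALENGTH(SaleDoc)) / 1048576.0 AS XmlColumnMB
FROM   dbo.SalesXml;

SELECT SUM(ps.reserved_page_count) * 8 / 1024.0 AS NormalizedTablesMB
FROM   sys.dm_db_partition_stats AS ps
WHERE  ps.object_id IN (OBJECT_ID('dbo.Sales'), OBJECT_ID('dbo.SaleItems'));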
