Looking for my "why use xml" aha moment

Question

Looking for my "why use xml" aha moment

markel

SSC Veteran

Points: 267
More actions
June 6, 2022 at 4:21 pm

#4045139
I'm learning xml. I'm getting the hang of the nitty gritty, tags, elements, etc. I haven't come across the WHY except that it's "good for sharing across platforms." So I put something into xml. Then share it across platforms. Then what? If the data was gathered to be used by someone who doesn't know xml https://omegle.onl/, how does the person access it? And in what circumstances is xml better than putting end-user documentation, for example, into a PDF? I know I'm missing a piece of the big picture.
- This topic was modified 3 years, 6 months ago by markel.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 1

markel wrote:

I'm learning xml. I'm getting the hang of the nitty gritty, tags, elements, etc. I haven't come across the WHY except that it's "good for sharing across platforms." So I put something into xml. Then share it across platforms. Then what? If the data was gathered to be used by someone who doesn't know xml, how does the person access it? And in what circumstances is xml better than putting end-user documentation, for example, into a PDF? I know I'm missing a piece of the big picture.

I haven't found XML to be good for file transfers at all. Between the tag bloat and the resources necessary to shred it, I'm of the opinion that it's one of the worst ways to transmit data. And, to make matters worse, most of the data that I've found using XML are using it to transmit "single entity" types of data where a nice, little, super fast TSV would be much better.

I also find that, as with EDI, sending mixed entity data in a hierarchical fashion is nothing but trouble for most people.

People also have no clue as to what it does to their clustered indexes even when it's stored as a LOB. Like any oversized character based data, it produces "Trapped short rows" that causes serious reduction in performance because of the super low page densities. I also find that a great many people also do inserts of the primary data into a table and then update the XML columns causing massive logical and physical fragmentation of the Clustered Index.

I'd also rather use EAVs than XML if that type of functionality is required. You can't even get conditional DRI out of an XML column.

I feel pretty much the same about JSON.

Both are massive forms of denormalization on steroids. Because of the tag bloat and data bloat (because everything is character based *2), they shouldn't even be used for audit/history nor as supposedly clever collections of configuration data.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2242 More actions · Answer 2

One area where it seems XML is required is for the conversion/encoding of strings to varbinary base 64. Altho if there's newer way I'd be interested to hear about it

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 3

Steve Collins wrote:

One area where it seems XML is required is for the conversion/encoding of strings to varbinary base 64. Altho if there's newer way I'd be interested to hear about it

XML isn't required for base 64 (or 3/4 Byte). It's just an encoding.

Also, when was the last time you had to work with Base 64 in SQL Server?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2242 More actions · Answer 4

This concatenates (a bunch of guid's and an expiration date) together and encodes to binary base 64 and converts the result to nvarchar. The encoded token could be used in an HTTP header as part of an API request

declare
  @raw_tokenvarchar(max),
  @token                        nvarchar(max)

/* build token */ 
select @raw_token=concat(cast(@user_securitystamp as char(36)),                                             /* user */
                         cast(@application as char(36)),                                                    /* application */
                         @pshp_securitystamp,                                                               /* client (Partnership) */
                         cast(@check as char(36)),                                                          /* check */ 
                         (select convert(char(19), dateadd(dd, @expire_offset_days, getutcdate()), 127)));  /* expiration */

select @token=(select cast(@raw_token as varbinary(max)) for xml path(''), binary base64);

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 5

Have you actually had to do such a thing where you work?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2242 More actions · Answer 6

Something similar iiirc. As I recall HASHBYTES were applied too. It was to make offers which expired

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Grant Fritchey SSC Guru Points: 399352 More actions · Answer 7

The only reasons I would use XML are because I've got a metaphorical and/or technical gun held to my head. It's honestly crappy and the XPath queries within SQL Server are a pain. If you must, and again, metaphorical gun, JSON is better, for certain values of "better".

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Steve Collins SSCrazy Points: 2242 More actions · Answer 8

Yes it's just an encoding. How could one perform such an encoding without XML? That's the hijacked topic 🙂 As an aside: I associated the encoding with a conversion because the base64 result typically (in my myopia) gets added to some JSON document. Ha, JSON is the Great Savior. Without JSON we could really be stuck with XML for really real. Without the JSON features in SQL Server data access might require ORM or XML. Unless someone could suggest another alternative I'd be interested to hear about it. Why not just send and receive JSON to and from the database without converting it to any other object type? As of a few months ago it can be done generically in .NET without using System.Reflection. That might be a good way, no? Other approaches could seem comparatively wasteful

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 9

Jeff Moden wrote:

markel wrote:
I'm learning xml. I'm getting the hang of the nitty gritty, tags, elements, etc. I haven't come across the WHY except that it's "good for sharing across platforms." So I put something into xml. Then share it across platforms. Then what? If the data was gathered to be used by someone who doesn't know xml, how does the person access it? And in what circumstances is xml better than putting end-user documentation, for example, into a PDF? I know I'm missing a piece of the big picture.
I haven't found XML to be good for file transfers at all. Between the tag bloat and the resources necessary to shred it, I'm of the opinion that it's one of the worst ways to transmit data. And, to make matters worse, most of the data that I've found using XML are using it to transmit "single entity" types of data where a nice, little, super fast TSV would be much better.
I also find that, as with EDI, sending mixed entity data in a hierarchical fashion is nothing but trouble for most people.
People also have no clue as to what it does to their clustered indexes even when it's stored as a LOB. Like any oversized character based data, it produces "Trapped short rows" that causes serious reduction in performance because of the super low page densities. I also find that a great many people also do inserts of the primary data into a table and then update the XML columns causing massive logical and physical fragmentation of the Clustered Index.
I'd also rather use EAVs than XML if that type of functionality is required. You can't even get conditional DRI out of an XML column.
I feel pretty much the same about JSON.
Both are massive forms of denormalization on steroids. Because of the tag bloat and data bloat (because everything is character based *2), they shouldn't even be used for audit/history nor as supposedly clever collections of configuration data.

It really depends on what you're using XML/JSON for. XML was was created for web data transfer and JSON was an improvement on XML. And for that application they do have a number of advantages over delimited or fixed width data sets. They're more resilient to structure changes, both in terms of the order of data elements and potentially optional data elements or repeating elements. That's very useful when dealing with an interface that might be used by thousands of different people who don't all have the same input/output requirements. And having the relationships defined explicitly by the structure of the document allows for single document to be sent with a fairly complex structure without having to recreate any of the relationships at processing time like a relational dataset would require.

But they are definitely not ideal for bulk data transfer which is something people do far too often. They have some web API and think that it's fine to just stick thousands of requests into a file and send it.

Jeff Moden SSC Guru Points: 1004685 More actions · Answer 10

Steve Collins wrote:

Yes it's just an encoding. How could one perform such an encoding without XML? That's the hijacked topic 🙂 As an aside: I associated the encoding with a conversion because the base64 result typically (in my myopia) gets added to some JSON document. Ha, JSON is the Great Savior. Without JSON we could really be stuck with XML for really real. Without the JSON features in SQL Server data access might require ORM or XML. Unless someone could suggest another alternative I'd be interested to hear about it. Why not just send and receive JSON to and from the database without converting it to any other object type? As of a few months ago it can be done generically in .NET without using System.Reflection. That might be a good way, no? Other approaches could seem comparatively wasteful

I find it odd that they would build such encoding into XML without building in some "old fashioned" coding to handle things like integers and dates without the bloat of having them appear as ASCII characters.

But, whatever. It and JSON are present and they're not going away any time soon. Like ZZartin mentions, though... I wish people would stop using both for bulk transfer of data and I wish people would stop using them for data storage in databases. Any supposed "flexibility" there is normally and seriously outweighed by the tag bloat alone especially when a given set of data has been the same since inception years ago. It would take little imagination to pass a file with the first row being column headers, the second row with meta-data for each column, and then the data itself with things like 4 byte integers and 3 byte dates (for example). You could also avoid type-able characters for delimiters by using the appropriate control characters. I find that human are much to interested in passing human-readable forms instead of machine readable forms.

Like someone once said, can we please stop passing the printed output from spreadsheets wrapped in double-repeating column names?

The GOOD part about tagged and markup data is that they have created the need for super fast transmission hardware and that has gone crazy in the last decade. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 11

I find it odd that they would build such encoding into XML without building in some "old fashioned" coding to handle things like integers and dates without the bloat of having them appear as ASCII characters.

From a largely web based approach that's supposed to be platform independent it's not really an issue, virtually every system can handle text to whatever conversions. And given the overheard of an API call and shredding the data structure in the first place doing a few conversions is not a lot of the processing time. It's also pretty standard, I very rarely ever see native data outside of maybe database backups.

I wish people would stop using them for data storage in databases.

Lol wait until you see escaped JSON imbedded in JSON because a vendor couldn't be bothered to properly expand a field.