Extract the titles from XML

  • Comments posted to this topic are about the item Extract the titles from XML

  • Good Question, thanx.

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

  • Er, all which three books of the four that are in the data .... ?

  • This was removed by the editor as SPAM

  • The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".

    As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.

    People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And here I thought I'd be the first one to comment that I don't like XML. πŸ˜€

  • Jeff Moden (7/13/2015)


    The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".

    As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.

    People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

    +100. Or maybe I mean + a googol.

    The only reason to use XML is that one has some data whose structure can't reasonably and efficiently be represented as a table in an RDBMS but can be represented efficiently in XML. When the XML representation of the data is a tree with data only at the leaf nodes and every leaf node has the same set of attributes it is clearly trivial, not just reasonable and efficient, to represent the data in the leaf nodes as rows in a relational table whose column names match teh attribute names for thos leaf nodes. It is also extremely clear that any XML representation of that data is grossly inefficient both in storage space and in processing cost. So the data used in this QotD is a perfect exanple of data for which XML representation shouldn't even be considered, let alone used.

    Nevertheless, it is a nice easy question. Very pleasant to get two whole points for remembering "start at the ground and work down" (or "start at the ground and work up").

    edit: I see that more than half of the first seven comments are from people who have not fallen for the XML hype. I wonder if that'll stay above 50% as more comments come in.

    Tom

  • Ed Wagner (7/13/2015)


    And here I thought I'd be the first one to comment that I don't like XML. πŸ˜€

    Nope, lots of people hate XML. (raises hand)

    I messed it up; didn't pay close enough attention to the XML string.

  • TomThomson (7/13/2015)


    Jeff Moden (7/13/2015)


    The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".

    As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.

    People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

    +100. Or maybe I mean + a googol.

    The only reason to use XML is that one has some data whose structure can't reasonably and efficiently be represented as a table in an RDBMS but can be represented efficiently in XML. When the XML representation of the data is a tree with data only at the leaf nodes and every leaf node has the same set of attributes it is clearly trivial, not just reasonable and efficient, to represent the data in the leaf nodes as rows in a relational table whose column names match teh attribute names for thos leaf nodes. It is also extremely clear that any XML representation of that data is grossly inefficient both in storage space and in processing cost. So the data used in this QotD is a perfect exanple of data for which XML representation shouldn't even be considered, let alone used.

    Nevertheless, it is a nice easy question. Very pleasant to get two whole points for remembering "start at the ground and work down" (or "start at the ground and work up").

    edit: I see that more than half of the first seven comments are from people who have not fallen for the XML hype. I wonder if that'll stay above 50% as more comments come in.

    Not a fan here either.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • TomThomson (7/13/2015)


    Jeff Moden (7/13/2015)


    The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".

    As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.

    People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

    +100. Or maybe I mean + a googol.

    [font="Arial Black"]The only reason to use XML is that one has some data whose structure can't reasonably and efficiently be represented as a table in an RDBMS but can be represented efficiently in XML.[/font] When the XML representation of the data is a tree with data only at the leaf nodes and every leaf node has the same set of attributes it is clearly trivial, not just reasonable and efficient, to represent the data in the leaf nodes as rows in a relational table whose column names match teh attribute names for thos leaf nodes. It is also extremely clear that any XML representation of that data is grossly inefficient both in storage space and in processing cost. So the data used in this QotD is a perfect exanple of data for which XML representation shouldn't even be considered, let alone used.

    Nevertheless, it is a nice easy question. Very pleasant to get two whole points for remembering "start at the ground and work down" (or "start at the ground and work up").

    edit: I see that more than half of the first seven comments are from people who have not fallen for the XML hype. I wonder if that'll stay above 50% as more comments come in.

    I absolutely agree with all the above. The only thing is (maybe due to my current state of caffeine deprecation and the fact that I've been up all night), I can't actually envision any data that will ultimately be targeted for an RDBMS that can't easily be represented is some other form, any and all of which would likely have at least the efficiency gains that I posted in my first email.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • try './/Book'

    --------------------------------------
    ;-)β€œEverything has beauty, but not everyone sees it.” ― Confucius

  • chgn01 (7/13/2015)


    try './/Book'

    Will work but it's far more expensive to traverse the structure than to do a direct reference.

    😎

  • Jeff Moden (7/13/2015)


    The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".

    As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.

    People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

    Shall we set up some tests to see how these two actually compare?

    😎

  • Nice Question. I scored the maximum. πŸ™‚

    Thanks.

  • I guess this is the preferred form.

    SELECT

    col.value('(Title/text())[1]', 'varchar(50)') AS 'Book'

    FROM @x.nodes('/root/Books/Book') a(col)

    Using text() in the values clause removes one call to a table valued function and the UDX operator that is there to take care of mixed content XML.

    This is faster still if you only want the Title node.

    SELECT

    col.value('.', 'varchar(50)') AS 'Book'

    FROM @x.nodes('/root/Books/Book/Title/text()') a(col)

    There is no difference in query plan or performance between using //Book .//Book or /root/Books/Book in the nodes function.

Viewing 15 posts - 1 through 15 (of 47 total)

You must be logged in to reply to this topic. Login to reply