Help - Shredding XML

  • I have tried and tried to post this question in the MSDN forums but keep getting an irritating error whenever I try to post, so i'm hoping I can find some help with this here.

    I have a very small XML file (it's a sample file, i'm just trying to learn how to do this right now) which i'm inserting into a table. Once there, i'm trying to query the file but get an empty result set. I've tried using Xquery and OpenXML but get the same results with each method.

    I really don't understand why this doesn't return any results. All i'm trying to do is get the book titles and the prices. I get the columns to return, but no data.

    Why?

    here's the file:

    <bookstore>

    <book>

    <title lang="eng">Harry Potter</title>

    <price>29.99</price>

    </book>

    <book>

    <title lang="eng">Learning XML</title>

    <price>39.95</price>

    </book>

    </bookstore>

    Here's the code i'm using to insert the file into a table:

    --CREATE TABLE TO STORE XML FILES

    CREATE TABLE XMLTest

    (

    IDINT IDENTITY PRIMARY KEY

    ,XMLFIleXML

    ,LoadDateDATETIME

    )

    GO

    --BULK INSERT THE XML FILE(s)

    INSERT INTO XMLTest (XMLFile, LoadDate)

    SELECT CONVERT(XML, BulkColumn) as BulkColumn, GetDate()

    FROM OPENROWSET(BULK 'C:\XML\Books.xml', SINGLE_BLOB) as X

    GO

    --SELECT THE FILES FROM THE TABLE TO ENSURE INSERT WORKED

    SELECT * FROM XMLTest

    Here's my query in OpenXML:

    DECLARE @XML as XML

    ,@HDoc as INT

    ,@SQL as VARCHAR(MAX)

    SELECT @XML = XMLFile FROM XMLTest

    EXEC sp_xml_preparedocument @HDoc OUTPUT, @XML

    SELECT *

    FROM OPENXML(@HDoc, 'Bookstore/Book/Title')

    WITH

    (

    Title[VARCHAR](50)'Title'

    ,Price[DECIMAL](5,2)'Price'

    )

    EXEC sp_xml_removedocument @HDoc

    GO

    Can someone please explain how I can get values the book title and price? I have a real work project coming up in which I have to shred a larger XML file, however I want to get the hang of doing this on a smaller file first.

    What am I doing wrong?

    Thanks!!

  • Here's one way. Not sure it's the best way and it doesn't use OPENXML though:

    DECLARE @Books XML = '

    <bookstore>

    <book>

    <title lang="eng">Harry Potter</title>

    <price>29.99</price>

    </book>

    <book>

    <title lang="fre">Learning XML</title>

    <price>39.95</price>

    </book>

    </bookstore>'

    SELECT title

    ,[language]=(

    SELECT TOP 1 z.value('@lang[1]', 'VARCHAR(1000)')

    FROM x.nodes('bookstore/book/title') c(z)

    WHERE title = z.value('(.)[1]', 'VARCHAR(1000)')

    )

    ,price=y.value('(price)[1]', 'MONEY')

    FROM (SELECT @Books) a(x)

    CROSS APPLY x.nodes('bookstore/book') b(y)

    CROSS APPLY (SELECT title=y.value('(title)[1]', 'VARCHAR(1000)')) c

    Note the TOP 1 on the correlated sub-query is there in case the same book title appears with multiple languages. In that case, the sub-query won't return the correct result but at least won't fail. To correct a case like that, you'd need to have something unique in the correlation like ISBN.

    Edit: Added the second CROSS APPLY to clean it up a little.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Perhaps this procedure will help.

    Parse XML to Table[/url].

    If not a direct help, the underlying methods may be useful in some other way.

    You should be able to use your sample XML for an easy test and to view the output (which shreds the nodes and attributes to a table}.

  • In XQuery (which is what you should use):

    DECLARE @x xml = '<bookstore>

    <book>

    <title lang="eng">Harry Potter</title>

    <price>29.99</price>

    </book>

    <book>

    <title lang="eng">Learning XML</title>

    <price>39.95</price>

    </book>

    </bookstore>'

    SELECT Title = T.c.value('(./text())[1]', 'varchar(29)'),

    Lang = T.c.value('@lang', 'varchar(23)'),

    Price = B.c.value('(price/text())[1]', 'decimal(10,2)')

    FROM @x.nodes('bookstore/book') AS B(c)

    CROSS APPLY B.c.nodes('title') AS T(c)

    To get an attribute, just use @attr. Element values are more clumsy, because you need this /text() because of performance, and [1] to avoid error messages. But this is the general pattern. Dig one level with APPLY nodes you want to use. (And never use parent-axis, .., because performance sucks in this case.)

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Erland Sommarskog (7/26/2013)


    In XQuery (which is what you should use):

    DECLARE @x xml = '<bookstore>

    <book>

    <title lang="eng">Harry Potter</title>

    <price>29.99</price>

    </book>

    <book>

    <title lang="eng">Learning XML</title>

    <price>39.95</price>

    </book>

    </bookstore>'

    SELECT Title = T.c.value('(./text())[1]', 'varchar(29)'),

    Lang = T.c.value('@lang', 'varchar(23)'),

    Price = B.c.value('(price/text())[1]', 'decimal(10,2)')

    FROM @x.nodes('bookstore/book') AS B(c)

    CROSS APPLY B.c.nodes('title') AS T(c)

    To get an attribute, just use @attr. Element values are more clumsy, because you need this /text() because of performance, and [1] to avoid error messages. But this is the general pattern. Dig one level with APPLY nodes you want to use. (And never use parent-axis, .., because performance sucks in this case.)

    I was having trouble digging out the lang attribute and that's why I resorted to that silly subquery. Next time I'll have to remember to drop the [1].


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • DECLARE@Sample TABLE

    (

    Data XML NOT NULL

    );

    INSERT@Sample

    (

    Data

    )

    VALUES(N'<bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>');

    -- SwePeso

    SELECTb.n.value('(title)[1]', 'NVARCHAR(MAX)') AS Title,

    b.n.value('(title/@lang)[1]', 'NVARCHAR(MAX)') AS Lang,

    b.n.value('(price)[1]', 'MONEY') AS Price

    FROM@Sample AS s

    CROSS APPLYs.Data.nodes('(bookstore/book)') AS b(n);


    N 56°04'39.16"
    E 12°55'05.25"

  • Thanks for all of the help. I have it working now and using the actual client file which is all parsed out the way I want it. Now I just have to figure out how to automate all of this in SSIS.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply