Importing Strange XML File

  • Mark Derryberry (3/1/2011)


    Over 50% from my initial attempts! I was pushing about 6 hours and the methods you all proposed and some additional things I had to do given the scope. It's down to about 2 1/2 on a server. I say server because my desktop dev machine almost choked to death. 🙂

    Excellent.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Mark Derryberry (3/1/2011)


    Over 50% from my initial attempts! I was pushing about 6 hours and the methods you all proposed and some additional things I had to do given the scope. It's down to about 2 1/2 on a server. I say server because my desktop dev machine almost choked to death. 🙂

    I didn't know about the size of the file and I didn't ask for it either...

    Given this scenario I would try a slightly different approach:

    Step 1: load the xml file into a staging table with an xml column using BULK IMPORT

    Step 2: add an xml index

    Step 3: shred the xml data into another staging table (based on the "cte" part of the code provided so far)

    Step 4: index the shredded staging table

    Step 5: split the shredded data based on the "split" part of the code, most probably into yet another staging table including adding an index

    Step 6: final select using the CrossTab method instead of Pivot (since it has been shown that Pivot is slower than the CrossTab - but I don't have the reference available right now)

    An alternative route instead of using steps 1, 2 and 3 could be using OPENXML:

    DECLARE @idoc int

    DECLARE @doc varchar(max)

    SET @doc =YourLargeXMLFile

    --Create an internal representation of the XML document.

    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

    -- Execute a SELECT statement that uses the OPENXML rowset provider.

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1) ) AS row,*

    FROM OPENXML (@idoc, '/wcsxml/relation/row',1)

    WITH (col varchar(200) '../@columns',

    val varchar(200) '@values')

    I would recommend to compare the performance of each method individually (not testing the whole process at once) to see which one would perform better.

    Since both will return the identical result you could decide which one to use.

    Based on the given scenario I'd guess the OPENXML approach might be faster...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks for the great suggestions.

    I hate it when I get so wrapped around the axel I overlook some other avenues. I'm already doing some of that when I begin importing the data so I'll just expand on the idea.

Viewing 3 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply