SHREDDING XML - Can it be done quickly?

  • I have the following combination of code, for which the declared xml variable isn't something I can influence.   However, looking into the stored procedure it's a parameter to, I'm finding a SELECT that when you take out all the other joins and selected columns, what's left is the SELECT in the code below.   Running this separately in SSMS took 21 seconds, for one lousy row.   Execution plan uses a table valued function XML reader for each column.   There has GOT to be a better (and hopefully, significantly faster) way to do this.   As I'm not real sure even what to go Googling for, I could use a little help in the shredding xml quickly department.   Here's the code:
    DECLARE @p2 AS xml =
        CONVERT(xml,
            N'<ROOT><R ID="24591988" PrimID="1003381" AuthID="1157388" ProgID="25242" ServiceCodeID="647" WorkerID="31522" Date="2017-12-01T00:00:00" StartTime="" EndTime="" RequestID="0" GPermID="0" PayRateOverride="" AutoScheduled="false"/></ROOT>'
            );

    SELECT t.col.value('@ID','BIGINT') AS ID,
        t.col.value('@PrimID','INT') AS PrimID,
        t.col.value('@AuthID','INT') AS AuthID,
        t.col.value('@ProgID','INT') AS ProgID,
        t.col.value('@ServiceCodeID','INT') AS ServiceCodeID,
        t.col.value('@WorkerID','INT') AS WorkerID,
        t.col.value('@Date','DATETIME') AS [Date],
        t.col.value('@StartTime','DATETIME') AS StartTime,
        t.col.value('@EndTime','DATETIME') AS EndTime,
        t.col.value('@RequestID','INT') AS RequestID,
        t.col.value('@GPermID','INT') AS GPermID,
        t.col.value('@PayRateOverride','VARCHAR(30)') AS PayRateOverride,
        t.col.value('@AutoScheduled','BIT') AS AutoScheduled
    FROM @p2.nodes('/ROOT/R') AS t(col)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I'm not sure what's the problem as I get those results immediately in my laptop. A different option that we use here is using OPENXML which requires sp_xml_preparedocument and sp_xml_removedocument.


    DECLARE @iDoc int;
    --Create an internal representation of the XML document.
    EXEC sp_xml_preparedocument @iDoc OUTPUT, @p2;

    SELECT *
    FROM OPENXML (@iDoc, '/ROOT/R', 1)
      WITH (
          ID     BIGINT,
          PrimID    INT,
          AuthID    INT,
          ProgID    INT,
          ServiceCodeID INT,
          WorkerID   INT,
          Date    DATETIME,
          StartTime   DATETIME,
          EndTime   DATETIME,
          RequestID   INT,
          GPermID   INT,
          PayRateOverride VARCHAR(30),
          AutoScheduled BIT);

    --Release XML document
    EXEC sp_xml_removedocument @iDoc;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • This is running much slower for you than it should. Your query is instantaneous on my laptop as well. I even created 10K rows of sample data and tested it. This query takes a couple seconds on my laptop:

    declare @table table (someXML xml);
    insert @table
    select top (10000) replace(cast(@p2 as varchar(1000)),'1003381', user_type_id)
    from sys.all_columns;

    SELECT t.col.value('@ID','BIGINT') AS ID,
      t.col.value('@PrimID','INT') AS PrimID,
      t.col.value('@AuthID','INT') AS AuthID,
      t.col.value('@ProgID','INT') AS ProgID,
      t.col.value('@ServiceCodeID','INT') AS ServiceCodeID,
      t.col.value('@WorkerID','INT') AS WorkerID,
      t.col.value('@Date','DATETIME') AS [Date],
      t.col.value('@StartTime','DATETIME') AS StartTime,
      t.col.value('@EndTime','DATETIME') AS EndTime,
      t.col.value('@RequestID','INT') AS RequestID,
      t.col.value('@GPermID','INT') AS GPermID,
      t.col.value('@PayRateOverride','VARCHAR(30)') AS PayRateOverride,
      t.col.value('@AutoScheduled','BIT') AS AutoScheduled
    FROM @table x
    cross apply x.someXML.nodes('/ROOT/R') AS t(col);

    T-SQL XML queries against XML are going to be slower than ones against tables no matter what but not 21 seconds a row. As Luis mentioned sp_xml_preparedocument, I too, have had some success with that. 

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks Luis.   As it turns out, it had to be either the server, or my copy of SSMS.   One or the other was mucked up.   I later ended up crashing SSMS, and it's been working fine in about a quarter of a second (EDIT: for 1,000 executions) ever since.   I tried using Jeff's DelimitedSplit8K at first, but the performance just wasn't there, and probably because I used the same split twice after splitting once to begin with.   I hate testing on shared servers when performance testing is involved.   You never know what you're up against...

     I ended up just using that original boiled down XML shred going into a temp table.  Turns out the doc prep execs cost an extra 0.175 seconds on 1,000 executions, so I'll have to not use that on this occasion.   Thanks again for creating a sounding board.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply