Speeding up Cross Apply on XML

  • I am using Cross Apply to flatten an XML column that I have in a table. E.G.

    ITEM
    ---------------------
    ItemCode
    XMLColors

    The XMLColors column stores xml data about color variations. E.G.

    <ColorOptions>
     <Color>
         <Hue>Red</Hue>
         <Price>$5.00</Price>     
      </Color>
     <Color>
         <Hue>Blue</Hue>
         <Price>$5.00</Price>     
      </Color>
    </ColorOptions>

    My SQL statment is:

    SELECT
    I.ItemCode,
    tbl.col.value('(Hue)[1]', 'varchar(50)') As Hue,
    tbl.col.value('(Price)[1]', 'decimal(18,2)') As Price
    FROM ITEM I
    CROSS APPLY XMLColors.nodes('/ColorOptions/Color') As tbl(col)

    This gives me:

    ItemCode, Hue, Price
    -----------------------------------
    ABC123, Red, 5.00
    ABC123, Blue, 5.00

    There is only one record for ABC123, and the XML col is used to store the variations. I have simplified the example above but the production table gets around 12 columns from the XML, has ~ 85k ItemCodes which ends up with close to 1 million color combinations. The query takes around 15 seconds to run. Is there a way to speed this up?

  • The best thing you can do is, if possible, lose the XML column, get this data into 3NF then index accordingly. If this is not an option then consider adding primary and secondary XML indexes to speed things up. 

    Note the sample code below  and compare the actual execution plans to better understand why 3NF is the way to go:
    USE tempdb
    GO

    IF object_id('dbo.item') IS NOT NULL DROP TABLE dbo.item;
    IF object_id('dbo.item2') IS NOT NULL DROP TABLE dbo.item2;
    GO
    CREATE TABLE dbo.item (itemCode varchar(10) primary key clustered, xmlColors xml not null);
    CREATE TABLE dbo.item2
    (
    itemId int identity not null,
    itemCode varchar(10) not null,
    hue varchar(20) not null,
        price money not null,
        constraint pk_item2 unique clustered(itemCode, itemId)
    );

    -- Populate dbo.item
    INSERT dbo.item
    VALUES ('abc123',
    '<ColorOptions>
    <Color>
      <Hue>Red</Hue>
      <Price>$5.00</Price> 
    </Color>
    <Color>
      <Hue>Blue</Hue>
      <Price>$5.00</Price> 
    </Color>
    </ColorOptions>');
    GO

    -- Populate dbo.item2
    INSERT dbo.item2
    SELECT
    I.ItemCode,
    tbl.col.value('(Hue)[1]', 'varchar(50)') As Hue,
    tbl.col.value('(Price)[1]', 'money') As Price -- did not work with decimal data type
    FROM dbo.ITEM I
    CROSS APPLY XMLColors.nodes('/ColorOptions/Color') As tbl(col);
    GO

    -- Compare Execution plans
    SELECT
    I.ItemCode,
    tbl.col.value('(Hue)[1]', 'varchar(50)') As Hue,
    tbl.col.value('(Price)[1]', 'money') As Price -- did not work with decimal data type
    FROM dbo.ITEM I
    CROSS APPLY XMLColors.nodes('/ColorOptions/Color') As tbl(col);

    SELECT itemCode, hue, price FROM dbo.item2;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply