Extract based on a similar element name and store the value in the same column

  • I need to parse xml data using xquery. I have tried my best to replicate the situation. I need to look for similar looking element name and store the value in the same column.The similar elements could go from 1 to n.

    Below is the xml and the expected output.

    <ProductCategory>1

    <Dairy1>Milk</Dairy1>

    <Dairy1>yougurt</Dairy1>

    <Dairy2>Cheese</Dairy2

    <Dairy3>Butter </Dairy3>

    <Dairy>CreamCheese </Dairy>

    <Fruit>Apple </Fruit>

    <Fruit>Banana </Fruit>

    <Fruit1>Orange </Fruit1>

    <Fruit1>GrapeFruit</Fruit1>

    <Fruit1>Garpes</Fruit1>

    <Fruit2>Strawberry</Fruit2>

    <Vegetable>Onion </Vegetable>

    <others>Walnut</others>

    </ProductCategory>

    ProductCategory Dairy Fruit Vegetable Others

    1 Milk apple onion walnut

    1 yogurtBanana NULL NULL

    1 chesse orange NULL NULL

    1butter grapefruit NULL NULL

    1creamcheese grapes NULL NULL

    1 NULL Strawberry NULL NULL

  • Looking at the XML and the expected results, it is apparent that something is missing in the description;-)

    The code below is slightly verbose in order to be self explanatory, note that if column names are dynamic then the whole query has to be dynamic.

    This should at least get you started.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @TXML XML = '<ProductCategory>1

    <Dairy1>Milk</Dairy1>

    <Dairy1>yougurt</Dairy1>

    <Dairy2>Cheese</Dairy2>

    <Dairy3>Butter </Dairy3>

    <Dairy>CreamCheese </Dairy>

    <Fruit>Apple </Fruit>

    <Fruit>Banana </Fruit>

    <Fruit1>Orange </Fruit1>

    <Fruit1>GrapeFruit</Fruit1>

    <Fruit1>Garpes</Fruit1>

    <Fruit2>Strawberry</Fruit2>

    <Vegetable>Onion </Vegetable>

    <others>Walnut</others>

    </ProductCategory>'

    ;

    ;WITH BASE_DATA AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY (SELECT NULL)

    ORDER BY (SELECT NULL)

    ) AS ELEMENT_RID

    ,PRODUCTCATEGORY.DATA.value('text()[1]', 'VARCHAR(10)') AS PRODCAT_VAL

    ,ELEMENT.DATA.value('local-name(.)', 'VARCHAR(50)') AS ELEMENT_NAME

    ,ELEMENT.DATA.value('.[1]', 'VARCHAR(100)') AS ELEMENT_VALUE

    FROM @TXML.nodes('ProductCategory') AS PRODUCTCATEGORY(DATA)

    OUTER APPLY PRODUCTCATEGORY.DATA.nodes('*') AS ELEMENT(DATA)

    )

    ,MULTI_GROUP AS

    (

    SELECT

    BD.ELEMENT_RID

    ,BD.ELEMENT_NAME

    ,BE.ELEMENT_NAME AS GROUP_NAME

    FROM BASE_DATA BD

    CROSS APPLY BASE_DATA BE

    WHERE BD.ELEMENT_NAME <> BE.ELEMENT_NAME

    AND BD.ELEMENT_NAME LIKE BE.ELEMENT_NAME + '%'

    )

    ,ALL_GROUPS AS

    (

    SELECT DISTINCT

    BD.ELEMENT_NAME AS GROUP_NAME

    FROM BASE_DATA BD

    WHERE BD.ELEMENT_NAME NOT IN

    (

    SELECT

    MG.ELEMENT_NAME

    FROM MULTI_GROUP MG

    )

    )

    ,NUMBERED_GROUPS AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    PARTITION BY (SELECT NULL)

    ORDER BY (SELECT NULL)

    ) AS AG_RID

    ,AG.GROUP_NAME

    FROM ALL_GROUPS AG

    )

    ,COL_ROW_ENUM_SET AS

    (

    SELECT

    BD.ELEMENT_RID

    ,BD.PRODCAT_VAL

    ,NG.GROUP_NAME

    ,NG.AG_RID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY NG.GROUP_NAME

    ORDER BY BD.ELEMENT_RID

    ) AS EL_GR_RID

    ,BD.ELEMENT_VALUE

    FROM BASE_DATA BD

    OUTER APPLY NUMBERED_GROUPS NG

    WHERE CHARINDEX(NG.GROUP_NAME, BD.ELEMENT_NAME,1) = 1

    )

    ,ROWS_AND_COLS AS

    (

    SELECT DISTINCT

    NG.AG_RID AS COL_NO

    ,CRES.EL_GR_RID AS ROW_NO

    FROM NUMBERED_GROUPS NG

    OUTER APPLY COL_ROW_ENUM_SET CRES

    )

    SELECT

    MAX(CRES.PRODCAT_VAL) AS ProductCategory

    ,MAX(CASE WHEN RAC.COL_NO = 1 THEN CRES.ELEMENT_VALUE END) AS COL_01

    ,MAX(CASE WHEN RAC.COL_NO = 2 THEN CRES.ELEMENT_VALUE END) AS COL_02

    ,MAX(CASE WHEN RAC.COL_NO = 3 THEN CRES.ELEMENT_VALUE END) AS COL_03

    ,MAX(CASE WHEN RAC.COL_NO = 4 THEN CRES.ELEMENT_VALUE END) AS COL_04

    ,MAX(CASE WHEN RAC.COL_NO = 5 THEN CRES.ELEMENT_VALUE END) AS COL_05

    ,MAX(CASE WHEN RAC.COL_NO = 6 THEN CRES.ELEMENT_VALUE END) AS COL_06

    FROM ROWS_AND_COLS RAC

    LEFT OUTER JOIN COL_ROW_ENUM_SET CRES

    ON RAC.COL_NO = CRES.AG_RID

    AND RAC.ROW_NO = CRES.EL_GR_RID

    GROUP BY RAC.ROW_NO;

    Results

    ProductCategory COL_01 COL_02 COL_03 COL_04 COL_05 COL_06

    --------------- ------------ ----------- -------- -------- -------- -------

    1 Milk Apple Walnut Onion NULL NULL

    1 yougurt Banana NULL NULL NULL NULL

    1 Cheese Orange NULL NULL NULL NULL

    1 Butter GrapeFruit NULL NULL NULL NULL

    1 CreamCheese Garpes NULL NULL NULL NULL

    1 NULL Strawberry NULL NULL NULL NULL

  • I know that this thread is a week old and Eirikur has already provided a potential solution but I'd thought I'd add some comments + another solution as well. Firstly the XML you are working with is pretty poor in structure and if that is indeed what you have to work with, I'd look to resolve the schema of that first otherwise you are always going to be fighting the xml with some creative querying. Anyway here is another way of approaching the problem by shredding the xml and then using a pivot. This is too based on some assumptions on the xml schema design.

    DECLARE @xml XML

    SET @xml = '<ProductCategory>1

    <Dairy1>Milk</Dairy1>

    <Dairy1>yougurt</Dairy1>

    <Dairy2>Cheese</Dairy2>

    <Dairy3>Butter </Dairy3>

    <Dairy>CreamCheese </Dairy>

    <Fruit>Apple </Fruit>

    <Fruit>Banana </Fruit>

    <Fruit1>Orange </Fruit1>

    <Fruit1>GrapeFruit</Fruit1>

    <Fruit1>Garpes</Fruit1>

    <Fruit2>Strawberry</Fruit2>

    <Vegetable>Onion </Vegetable>

    <others>Walnut</others>

    </ProductCategory>';

    WITH x AS

    (

    SELECT PC.c.value('text()[1]', 'varchar(10)') AS 'ProductCategory'

    , d.c.value('upper-case(substring(local-name(.),1,1))', 'char(1)') AS 'ProductType'

    , d.c.value('text()[1]', 'varchar(100)') AS 'ProductValue'

    FROM @xml.nodes('/ProductCategory') PC(c)

    CROSS APPLY PC.c.nodes('*') d(c)

    )

    SELECT ProductCategory, [D] AS 'Dairy', [F] AS 'Fruit', [V] AS 'Vegtables', [O] AS 'Others'

    FROM

    (

    SELECT ROW_NUMBER() OVER (PARTITION BY ProductCategory, ProductType ORDER BY (SELECT NULL)) AS RowId, *

    FROM x

    ) AS Src

    PIVOT

    (

    MAX(ProductValue)

    FOR ProductType IN ([D], [F], [O], [V])

    ) AS Pvt

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply