XML Shred to tabular data

  • Hi, I'm looking for T-SQL help shredding the following XML so that I can see (in this example) all skills per employee.

    '<employees>
      <employee>
        <emp_id>1</emp_id>
        <emp_name>Bob</emp_name>
        <skills>
          <skills_id>1</skills_id>
          <skills_name>tsql</skills_name>
          <skills_id>2</skills_id>
          <skills_name>SSRS</skills_name>
          <skills_id>3</skills_id>
          <skills_name>SSAS</skills_name>
          <skills_id>4</skills_id>
          <skills_name>SSIS</skills_name>
          <skills_id>5</skills_id>
          <skills_name>Replication</skills_name>
        </skills>
      </employee>
      <employee>
        <emp_id>2</emp_id>
        <emp_name>Frank</emp_name>
        <skills>
          <skills_id>1</skills_id>
          <skills_name>tsql</skills_name>
          <skills_id>2</skills_id>
          <skills_name>SSRS</skills_name>
          <skills_id>3</skills_id>
          <skills_name>SSAS</skills_name>
        </skills>
      </employee>
    </employees>'

    Thanks

    *Edited to show skills_id AND skills_name (not just skills_id)

  • Here is a quick example to shred that xml to a flat table:

    DECLARE @xml XML;

    SET @xml = '<employees>
      <employee>
        <emp_id>1</emp_id>
        <emp_name>Bob</emp_name>
        <skills>
          <skills_id>1</skills_id>
          <skills_id>tsql</skills_id>
          <skills_id>2</skills_id>
          <skills_id>SSRS</skills_id>
          <skills_id>3</skills_id>
          <skills_id>SSAS</skills_id>
          <skills_id>4</skills_id>
          <skills_id>SSIS</skills_id>
          <skills_id>5</skills_id>
          <skills_id>Replication</skills_id>
        </skills>
      </employee>
      <employee>
        <emp_id>2</emp_id>
        <emp_name>Frank</emp_name>
        <skills>
          <skills_id>1</skills_id>
          <skills_id>tsql</skills_id>
          <skills_id>2</skills_id>
          <skills_id>SSRS</skills_id>
          <skills_id>3</skills_id>
          <skills_id>SSAS</skills_id>
        </skills>
      </employee>
    </employees>';

    -- One row per <employee> (alias e), then CROSS APPLY one row
    -- per <skills_id> element under that employee (alias s)
    SELECT e.c.value('(emp_id)[1]', 'int') AS emp_id
         , e.c.value('(emp_name/text())[1]', 'varchar(50)') AS emp_name
         , s.c.value('(.)[1]', 'varchar(50)') AS skills_id
    FROM @xml.nodes('/employees/employee') AS e(c)
    CROSS APPLY e.c.nodes('skills/skills_id') AS s(c);

    Not sure if that is what you are after, though, looking at the data. Did you mean to put the skill ID and the skill name in XML nodes with the same name? From the data, I'm guessing you would like the skill ID in one column and the skill name in another, related to each other, rather than both in a single column.
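    The query above returns one row per `skills_id` element, so IDs and names land in the same column. A minimal Python sketch of the same flattening, using the standard-library `xml.etree.ElementTree` on a shortened copy of the pre-edit XML (the helper name `shred_flat` is hypothetical), makes the problem visible:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the pre-edit XML: IDs and names both use <skills_id>.
XML = """<employees>
  <employee>
    <emp_id>1</emp_id>
    <emp_name>Bob</emp_name>
    <skills>
      <skills_id>1</skills_id>
      <skills_id>tsql</skills_id>
      <skills_id>2</skills_id>
      <skills_id>SSRS</skills_id>
    </skills>
  </employee>
</employees>"""


def shred_flat(xml_text):
    """One row per <skills_id>, like nodes('skills/skills_id') + CROSS APPLY."""
    rows = []
    for emp in ET.fromstring(xml_text).findall("employee"):
        emp_id = int(emp.findtext("emp_id"))
        emp_name = emp.findtext("emp_name")
        for node in emp.findall("skills/skills_id"):
            rows.append((emp_id, emp_name, node.text))
    return rows


for row in shred_flat(XML):
    print(row)
# (1, 'Bob', '1')
# (1, 'Bob', 'tsql')
# (1, 'Bob', '2')
# (1, 'Bob', 'SSRS')
```

    Every value becomes its own row, so nothing in the output says that "tsql" belongs to ID 1.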

  • Hi, thanks for taking time to look at that. Ideally I'd like to see:

    Emp_id  Skill_ID  Skill_Name
    1       1         tSQL
    1       2         SSRS
    1       3         SSAS
    1       4         SSIS
    1       5         Replication
    2       1         tSQL
    2       2         SSRS
    etc.
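    The layout above amounts to pairing each `skills_id` with the `skills_name` element that immediately follows it in the edited XML. Below is a minimal Python sketch of that pairing. It assumes the closing tags read `</skills_name>`, so the document is well-formed, and that IDs and names strictly alternate. The helper `shred_pairs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the edited XML, with well-formed </skills_name> tags.
XML = """<employees>
  <employee>
    <emp_id>1</emp_id>
    <emp_name>Bob</emp_name>
    <skills>
      <skills_id>1</skills_id>
      <skills_name>tsql</skills_name>
      <skills_id>2</skills_id>
      <skills_name>SSRS</skills_name>
    </skills>
  </employee>
  <employee>
    <emp_id>2</emp_id>
    <emp_name>Frank</emp_name>
    <skills>
      <skills_id>1</skills_id>
      <skills_name>tsql</skills_name>
    </skills>
  </employee>
</employees>"""


def shred_pairs(xml_text):
    """Return (emp_id, skill_id, skill_name), pairing each <skills_id> with
    the <skills_name> sibling right after it. Raises if that order is broken."""
    rows = []
    for emp in ET.fromstring(xml_text).findall("employee"):
        emp_id = int(emp.findtext("emp_id"))
        children = list(emp.find("skills"))
        for i, child in enumerate(children):
            if child.tag != "skills_id":
                continue
            nxt = children[i + 1] if i + 1 < len(children) else None
            if nxt is None or nxt.tag != "skills_name":
                raise ValueError(f"skills_id {child.text} is not followed by a skills_name")
            rows.append((emp_id, int(child.text), nxt.text))
    return rows


for row in shred_pairs(XML):
    print(row)
# (1, 1, 'tsql')
# (1, 2, 'SSRS')
# (2, 1, 'tsql')
```

    Raising an error instead of skipping is a deliberate choice. With purely positional data, failing loudly is safer than returning rows that are quietly misaligned.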

  • NickBalaam (3/1/2013)


    Hi, thanks for taking time to look at that. Ideally I'd like to see....


    Just a heads up on that. The XML you were provided is "poorly formed" in that, except by position in the file, there's no logical way to associate a particular skill ID with the correct name. You can't rely on position in the file for this sort of thing, because nothing in the structure guarantees that an ID and the value after it belong together; a single missing or reordered element silently misaligns every pair that follows it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
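    Jeff's point is easy to demonstrate. The Python sketch below uses a hypothetical helper, `pair_by_position`, built on the standard-library `xml.etree.ElementTree`. It pairs values purely by position: element 0 with 1, element 2 with 3, and so on. If the sender drops a single name, nothing fails; the pairs are simply wrong.

```python
import xml.etree.ElementTree as ET


def pair_by_position(skills_xml):
    """Naive pairing by position: element 0 with 1, 2 with 3, and so on."""
    values = [e.text for e in ET.fromstring(skills_xml)]
    return list(zip(values[0::2], values[1::2]))


good = ("<skills><skills_id>1</skills_id><skills_id>tsql</skills_id>"
        "<skills_id>2</skills_id><skills_id>SSRS</skills_id></skills>")

# Same data, but (hypothetically) the sender omitted the name for skill 1.
bad = ("<skills><skills_id>1</skills_id>"
       "<skills_id>2</skills_id><skills_id>SSRS</skills_id></skills>")

print(pair_by_position(good))  # [('1', 'tsql'), ('2', 'SSRS')]
print(pair_by_position(bad))   # [('1', '2')]  wrong pair, and SSRS is lost
```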

  • Thanks for the heads up. Unfortunately this is not our own XML; it comes to us from a 3rd party and we have no control over the structure. Ideally it would have had something like <skill id="1">tSQL</skill>, but I can't really do much about it 🙁
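    For comparison, here is a minimal Python sketch of shredding the layout Nick describes, where each skill is a single element that carries its ID as an attribute. The `XML` sample and the `shred` helper are hypothetical. Because the ID and the name live on the same node, they cannot drift apart, and document order stops mattering. In T-SQL, the same idea would read `@id` and the element text from each `skill` node.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-designed layout: one <skill> per skill, ID as an attribute.
# The skills are deliberately listed out of order.
XML = """<employees>
  <employee>
    <emp_id>1</emp_id>
    <emp_name>Bob</emp_name>
    <skills>
      <skill id="2">SSRS</skill>
      <skill id="1">tSQL</skill>
    </skills>
  </employee>
</employees>"""


def shred(xml_text):
    """Return sorted (emp_id, skill_id, skill_name) rows."""
    rows = []
    for emp in ET.fromstring(xml_text).findall("employee"):
        emp_id = int(emp.findtext("emp_id"))
        for skill in emp.findall("skills/skill"):
            # ID and name come from the same element, so order is irrelevant.
            rows.append((emp_id, int(skill.get("id")), skill.text))
    return sorted(rows)


print(shred(XML))  # [(1, 1, 'tSQL'), (1, 2, 'SSRS')]
```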

  • As Jeff points out, the XML isn't ideal, but I have come up with something that makes a lot of assumptions, mainly around the ordering of the XML. If that ordering changes, this will break, so I give no assurances for this code 😀

    DECLARE @xml XML;

    SET @xml = '<employees>
      <employee>
        <emp_id>1</emp_id>
        <emp_name>Bob</emp_name>
        <skills>
          <skills_id>1</skills_id>
          <skills_id>tsql</skills_id>
          <skills_id>2</skills_id>
          <skills_id>SSRS</skills_id>
          <skills_id>3</skills_id>
          <skills_id>SSAS</skills_id>
          <skills_id>4</skills_id>
          <skills_id>SSIS</skills_id>
          <skills_id>5</skills_id>
          <skills_id>Replication</skills_id>
        </skills>
      </employee>
      <employee>
        <emp_id>2</emp_id>
        <emp_name>Frank</emp_name>
        <skills>
          <skills_id>1</skills_id>
          <skills_id>tsql</skills_id>
          <skills_id>2</skills_id>
          <skills_id>SSRS</skills_id>
          <skills_id>3</skills_id>
          <skills_id>SSAS</skills_id>
        </skills>
      </employee>
    </employees>';

    -- Rebuild the XML: every skills_id whose text is a positive number becomes
    -- a <skill> element, and its value attribute is taken from the very next
    -- skills_id in document order. This relies entirely on element ordering.
    SELECT @xml = @xml.query('
    <employees>
    {
      for $x in //employee
      return
        <employee emp_name="{$x/emp_name/text()}" emp_id="{$x/emp_id/text()}">
        {
          for $y in $x/skills/skills_id[number(text()[1]) > 0]
          return
            <skill id="{data($y)}" value="{data($x/skills/skills_id[. >> $y][1])}"/>
        }
        </employee>
    }
    </employees>
    ');

    -- Shred the rebuilt XML: one row per employee/skill pair
    SELECT e.c.value('@emp_id', 'int') AS emp_id
         , e.c.value('@emp_name', 'varchar(50)') AS emp_name
         , s.c.value('@id', 'int') AS skills_id
         , s.c.value('@value', 'varchar(50)') AS skill
    FROM @xml.nodes('/employees/employee') AS e(c)
    CROSS APPLY e.c.nodes('skill') AS s(c);

    Returns:

    emp_id  emp_name  skills_id  skill
    1       Bob       1          tsql
    1       Bob       2          SSRS
    1       Bob       3          SSAS
    1       Bob       4          SSIS
    1       Bob       5          Replication
    2       Frank     1          tsql
    2       Frank     2          SSRS
    2       Frank     3          SSAS
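    The XQuery rewrite leans on two assumptions. First, a `skills_id` whose text is a positive number is an ID (`[number(text()[1]) > 0]`). Second, its name is the first `skills_id` after it in document order (`[. >> $y][1]`). The rough Python mirror below shows how those assumptions can bite: a skill whose name is itself a number is taken for an ID. It is offered only as an illustration. The helper `shred_like_xquery` and the "Ann"/"2012" sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def shred_like_xquery(xml_text):
    """Rough mirror of the XQuery rewrite: returns (emp_id, emp_name, id, skill)."""
    rows = []
    for emp in ET.fromstring(xml_text).iter("employee"):  # for $x in //employee
        emp_id = int(emp.findtext("emp_id"))
        emp_name = emp.findtext("emp_name")
        nodes = emp.findall("skills/skills_id")
        for i, node in enumerate(nodes):
            # [number(text()[1]) > 0]: keep only values that parse as positive numbers
            try:
                number = float(node.text)
            except (TypeError, ValueError):
                continue
            if not number > 0:
                continue
            # [. >> $y][1]: the first skills_id after this one in document order
            skill = nodes[i + 1].text if i + 1 < len(nodes) else ""
            rows.append((emp_id, emp_name, int(number), skill))
    return rows


XML = """<employees><employee>
  <emp_id>2</emp_id><emp_name>Frank</emp_name>
  <skills>
    <skills_id>1</skills_id><skills_id>tsql</skills_id>
    <skills_id>2</skills_id><skills_id>SSRS</skills_id>
  </skills>
</employee></employees>"""
print(shred_like_xquery(XML))
# [(2, 'Frank', 1, 'tsql'), (2, 'Frank', 2, 'SSRS')]

# Hypothetical skill literally named "2012": it is mistaken for an ID.
ODD = """<employees><employee>
  <emp_id>3</emp_id><emp_name>Ann</emp_name>
  <skills><skills_id>6</skills_id><skills_id>2012</skills_id></skills>
</employee></employees>"""
print(shred_like_xquery(ODD))
# [(3, 'Ann', 6, '2012'), (3, 'Ann', 2012, '')]
```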

  • NickBalaam (3/1/2013)


    Thanks for the heads up. Unfortunately this is not our own XML; it comes to us from a 3rd party and we have no control over the structure. Ideally it would have had something like <skill id="1">tSQL</skill>, but I can't really do much about it 🙁

    Heh... actually, you can. Invite them to dinner. A nice pork chop dinner. To make sure they understand, tie them to the chair and feed them the pork chops... at point-blank range with a Wrist Rocket. 😀 You'll get your point across.

    If these people are providing a "service", they need to provide it correctly. If they're a customer, they need to help you help them. Since the answer is always "No" unless you ask, contact the 3rd party, tell them they're doing it wrong, and tell them you want it fixed!

    --Jeff Moden



  • Yes, I like that idea. Haha

    Thanks both for your time. I think I can work with that.

    Regards
