How to uniquely number parent and child nodes while reading an xml document

  • I found a neat trick that I have not seen applied anywhere, nor did I find any documentation for it. Yet it works on SQL server 2008, SQL Server 2008 R2 and SQL server 2005. I currently don't have the time to write up an article on this, but I still want to share it with all of you, so I'll just post it now so I can hear your ideas on it.

    The problem I often face is that I have to read an xml document having some parent-child structure in it and I need to insert both the parent and the child nodes into tables, plus the relationship between the two into another table. All solutions I've encountered and could come up with so far involved either reading the xml multiple times (once to find the parents, once to find the children per parent plus once more to find the relationships) or reading the parents into a temp table, including for each parent an xml fragment containing the child nodes for that parent. Each of these methods is very labor intensive (for me and for sql server). I've now found that on the window functions (row_number() etc) you can also partition and order by on the name of the nodes() output.

    Like this:

    declare @xml xml = N'

    <parent name="p1">

    <child name="c1" />

    <child name="c2" />

    <child name="c3" />

    </parent>

    <parent name="p2">

    <child name="c4" />

    <child name="c5" />

    </parent>';

    select dense_rank() over (order by p.p) as parentID,

    p.p.value('@name','varchar(10)') as parentName,

    row_number() over (partition by p.p order by c.c) as childID,

    c.c.value('@name','varchar(10)') as childName

    from @xml.nodes('/parent') p(p)

    cross apply p.p.nodes('child') c(c)

    Which produces:

    parentIDparentNamechildIDchildName

    1p11c1

    1p12c2

    1p13c3

    2p21c4

    2p22c5

    Each parent node is assigned a unique number -based on the node's location in the xml file- and every child node is also assigned a unique number within it's parent node -again based on the child's location within the parent node-.

    As I said: I did not find any documentation describing this feature yet, so use it at your own risk. But I thought it was so cool that I still wanted to share this with you.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • You can also use preceding-sibling for this

    select p.p.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID,

    p.p.value('@name','varchar(10)') as parentName,

    c.c.value('1+count(for $a in . return $a/../*[. << $a])','int') as childID,

    c.c.value('@name','varchar(10)') as childName

    from @xml.nodes('/parent') p(p)

    cross apply p.p.nodes('child') c(c)

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (5/23/2012)


    You can also use preceding-sibling for this

    select p.p.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID,

    p.p.value('@name','varchar(10)') as parentName,

    c.c.value('1+count(for $a in . return $a/../*[. << $a])','int') as childID,

    c.c.value('@name','varchar(10)') as childName

    from @xml.nodes('/parent') p(p)

    cross apply p.p.nodes('child') c(c)

    Yes, you can. But do you want to? I personally am not very familiar with the xpath syntax you've used there, so those xpath queries you've put in there seem like a lot of hocus pocus to me. What I do see is that they have a count() operator in them. Counting usually involves having to read entries multiple times, so I expect your query to perform worse than mine (I even expect the difference to be bigger if the number of child nodes grows). Let's do a quick test. There are much better tests possible, but for a quick first comparison "set statistics io, time on" usually gives a reasonable indication:

    declare @xml xml;

    with cte10 as (

    select 1 as n union all select 1 union all select 1 union all select 1 union all select 1

    union all select 1 union all select 1 union all select 1 union all select 1 union all select 1

    )

    select @xml = (

    select top 1000 convert(varchar(36), n.n) + name as [@name],

    (

    select col.name as [@name]

    from master.sys.columns col

    where col.object_id = tbl.object_id

    for xml path('child'), type

    )

    from cte10 n

    cross join master.sys.tables tbl

    for xml path('parent'), type

    );

    if object_id('tempdb..#tmp') is not null

    drop table #tmp;

    if object_id('tempdb..#tmp2') is not null

    drop table #tmp2;

    print '================= R.P.Rozema ===================='

    set statistics io, time on;

    select dense_rank() over (order by p.p) as parentID,

    p.p.value('@name','nvarchar(128)') as parentName,

    row_number() over (partition by p.p order by c.c) as childID,

    c.c.value('@name','nvarchar(128)') as childName

    into #tmp

    from @xml.nodes('/parent') p(p)

    cross apply p.p.nodes('child') c(c);

    set statistics io, time off;

    print '================= Mark-101232 ===================='

    set statistics io, time on;

    select p.p.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID,

    p.p.value('@name','nvarchar(128)') as parentName,

    c.c.value('1+count(for $a in . return $a/../*[. << $a])','int') as childID,

    c.c.value('@name','nvarchar(128)') as childName

    into #tmp2

    from @xml.nodes('/parent') p(p)

    cross apply p.p.nodes('child') c(c)

    set statistics io, time off;

    The output on my machine:

    ================= R.P.Rozema ====================

    SQL Server Execution Times:

    CPU time = 109 ms, elapsed time = 103 ms.

    (500 row(s) affected)

    ================= Mark-101232 ====================

    SQL Server Execution Times:

    CPU time = 780 ms, elapsed time = 986 ms.

    (500 row(s) affected)

    I think I'll stick with my method for now.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • A blog from Adam Machanic in 2009 having the same idea: http://sqlblog.com/blogs/adam_machanic/archive/2009/08/03/uniquely-identifying-xml-nodes-with-dense-rank.aspx

    It's not really documentation, but at least more people describe the same functionality.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply