How to uniquely number parent and child nodes while reading an xml document

Question

Post reply

How to uniquely number parent and child nodes while reading an xml document

R.P.Rozema

SSChampion

Points: 12317
More actions
May 23, 2012 at 4:11 am

#274420

I found a neat trick that I have not seen applied anywhere, nor did I find any documentation for it. Yet it works on SQL server 2008, SQL Server 2008 R2 and SQL server 2005. I currently don't have the time to write up an article on this, but I still want to share it with all of you, so I'll just post it now so I can hear your ideas on it.
The problem I often face is that I have to read an xml document having some parent-child structure in it and I need to insert both the parent and the child nodes into tables, plus the relationship between the two into another table. All solutions I've encountered and could come up with so far involved either reading the xml multiple times (once to find the parents, once to find the children per parent plus once more to find the relationships) or reading the parents into a temp table, including for each parent an xml fragment containing the child nodes for that parent. Each of these methods is very labor intensive (for me and for sql server). I've now found that on the window functions (row_number() etc) you can also partition and order by on the name of the nodes() output.
Like this:
declare @xml xml = N'
<parent name="p1">
<child name="c1" />
<child name="c2" />
<child name="c3" />
</parent>
<parent name="p2">
<child name="c4" />
<child name="c5" />
</parent>';
select dense_rank() over (order by p.p) as parentID,
p.p.value('@name','varchar(10)') as parentName,
row_number() over (partition by p.p order by c.c) as childID,
c.c.value('@name','varchar(10)') as childName
from @xml.nodes('/parent') p(p)
cross apply p.p.nodes('child') c(c)
Which produces:
parentIDparentNamechildIDchildName
1p11c1
1p12c2
1p13c3
2p21c4
2p22c5
Each parent node is assigned a unique number -based on the node's location in the xml file- and every child node is also assigned a unique number within it's parent node -again based on the child's location within the parent node-.
As I said: I did not find any documentation describing this feature yet, so use it at your own risk. But I thought it was so cool that I still wanted to share this with you.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26979 More actions · Answer 1

You can also use preceding-sibling for this

select p.p.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID,

p.p.value('@name','varchar(10)') as parentName,

c.c.value('1+count(for $a in . return $a/../*[. << $a])','int') as childID,

c.c.value('@name','varchar(10)') as childName

from @xml.nodes('/parent') p(p)

cross apply p.p.nodes('child') c(c)

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

R.P.Rozema SSChampion Points: 12317 More actions · Answer 2

Mark-101232 (5/23/2012)
You can also use preceding-sibling for this
select p.p.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID,
p.p.value('@name','varchar(10)') as parentName,
c.c.value('1+count(for $a in . return $a/../*[. << $a])','int') as childID,
c.c.value('@name','varchar(10)') as childName
from @xml.nodes('/parent') p(p)
cross apply p.p.nodes('child') c(c)

Yes, you can. But do you want to? I personally am not very familiar with the xpath syntax you've used there, so those xpath queries you've put in there seem like a lot of hocus pocus to me. What I do see is that they have a count() operator in them. Counting usually involves having to read entries multiple times, so I expect your query to perform worse than mine (I even expect the difference to be bigger if the number of child nodes grows). Let's do a quick test. There are much better tests possible, but for a quick first comparison "set statistics io, time on" usually gives a reasonable indication:

declare @xml xml;

with cte10 as (

select 1 as n union all select 1 union all select 1 union all select 1 union all select 1

union all select 1 union all select 1 union all select 1 union all select 1 union all select 1

)

select @xml = (

select top 1000 convert(varchar(36), n.n) + name as [@name],

(

select col.name as [@name]

from master.sys.columns col

where col.object_id = tbl.object_id

for xml path('child'), type

)

from cte10 n

cross join master.sys.tables tbl

for xml path('parent'), type

);

if object_id('tempdb..#tmp') is not null

drop table #tmp;

if object_id('tempdb..#tmp2') is not null

drop table #tmp2;

print '================= R.P.Rozema ===================='

set statistics io, time on;

select dense_rank() over (order by p.p) as parentID,

p.p.value('@name','nvarchar(128)') as parentName,

row_number() over (partition by p.p order by c.c) as childID,

c.c.value('@name','nvarchar(128)') as childName

into #tmp

from @xml.nodes('/parent') p(p)

cross apply p.p.nodes('child') c(c);

set statistics io, time off;

print '================= Mark-101232 ===================='

set statistics io, time on;

select p.p.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID,

p.p.value('@name','nvarchar(128)') as parentName,

c.c.value('1+count(for $a in . return $a/../*[. << $a])','int') as childID,

c.c.value('@name','nvarchar(128)') as childName

into #tmp2

from @xml.nodes('/parent') p(p)

cross apply p.p.nodes('child') c(c)

set statistics io, time off;

The output on my machine:

================= R.P.Rozema ====================

SQL Server Execution Times:

CPU time = 109 ms, elapsed time = 103 ms.

(500 row(s) affected)

================= Mark-101232 ====================

SQL Server Execution Times:

CPU time = 780 ms, elapsed time = 986 ms.

(500 row(s) affected)

I think I'll stick with my method for now.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

R.P.Rozema SSChampion Points: 12317 More actions · Answer 3

A blog from Adam Machanic in 2009 having the same idea: http://sqlblog.com/blogs/adam_machanic/archive/2009/08/03/uniquely-identifying-xml-nodes-with-dense-rank.aspx

It's not really documentation, but at least more people describe the same functionality.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

How to uniquely number parent and child nodes while reading an xml document

Cookies on SQLServerCentral