Two Useful Hierarchy Functions

Question

Post reply

Two Useful Hierarchy Functions

John Cawley III

SSC Rookie

Points: 26
More actions
May 20, 2008 at 9:59 pm

#186791

Comments posted to this topic are about the item Two Useful Hierarchy Functions

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

dsilveira SSC Rookie Points: 32 More actions · Answer 1

dsilveira

SSC Rookie

Points: 32

May 21, 2008 at 2:47 am

#818419

very nicely done!

jeb bushell SSC Journeyman Points: 87 More actions · Answer 2

Hard to read due to poor web presentation. I'm using IE7 and the code segments are not readable unless they're cut and pasted elsewhere. I ran out of time and patience.

ddnrg SSC Rookie Points: 32 More actions · Answer 3

Does the new SQL2008 HeirarchyId pretty much replace the need for doing this?

Chris Muller-398403 Grasshopper Points: 15 More actions · Answer 4

Nice example, thanks for simplifying this complex subject. Would it have been better to make the node_id column in the table example an identity column for saving data (after a user selected one item) and reporting? What are the costs/benefits?

Also, I guess the hierarchy could change too, what if there was an org. change in that example, where Alan left and another became the start of the tree? In my application, do we want to save the pick of Alan in your hierarchy (hence that identity column idea above) or just the current starting node for the year, no matter who it is at that point in time? I guess that may depend on your application needs?

Thanks again for the nice article and great subject.

GSquared SSC Guru Points: 260824 More actions · Answer 5

I've used hierarchies with as many as 7,000 nodes and as many as 50 levels. CTEs work well for those.

But one thing that should be brought up in any discussion of hierarchies is the "nested sets" model.

The type of hierarchy defined in this article is called an "adjacency hierarchy". A nested sets hierarchy works a little different.

Here's a sample table for a nested sets hierarchy:

create table NSHierarchy (

SetStart int not null,

SetEnd int not null,

constraint PK_NSHierarchy primary key (SetStart, SetEnd),

Name varchar(100))

The way it works is that the SetStart column and the SetEnd column define a range of values, and lower levels of the hierarchy are given values between those.

For example:

insert into dbo.NSHierarchy (SetStart, SetEnd, Name)

select 1, 10, 'Mandy' union all

select 2, 5, 'Bob' union all

select 6, 9, 'Sue' union all

select 3, 3, 'Doug' union all

select 4, 4, 'Sam' union all

select 7, 7, 'Terry' union all

select 8, 8, 'Sal'

In this example, Mandy is the top level of the hierarchy. Bob and Sue report directly to Mandy, since their numbers are inside her range (anything between 1 and 10 is in her hierarchy). Doug and Sam report to Bob (their numbers are between 2 and 5, which are Bob's numbers), while Sam, Terry and Sal report to Sue (between 6 and 9).

This is queried as:

create proc HierarchySelect

(@TopNode_in int)

as

declare @SEnd int

select @SEnd = SetEnd

from dbo.NSHierarchy

where SetStart = @TopNode_in

select Name

from dbo.NSHierarchy

where SetStart between @TopNode_in and @SEnd

and SetEnd between @TopNode_in and @SEnd

(Variations on this to deal with the name as an input parameter, or to allow a level to have two parents (SetStart in one range, SetEnd in another), etc., are pretty easy to extrapolate.)

Joe Celko has a much more complete and comprehensive description of this on his site and in his books. I won't try to go into that level of detail, because he's already done a better job of it than I will.

The advantage to this is that it's MUCH faster than an adjacency hierarchy.

For example, that 7,000-node, 50-level hierarchy, takes almost a second to run (on my test machine) using a CTE, exactly as described in the article. Not bad, but it also takes 7,000 scans (IO). That's a LOT of IO. With a nested sets hierarchy, it takes less than a millisecond to run, and takes 2 scans.

The disadvantage of a nested sets hierarchy is that it takes more work to update/insert than an adjacency hierarchy. If the data changes a lot, it becomes much more difficult to manage.

I've solved that for my situation by using a hybrid solution (partially suggested by Jeff Moden). I have tables that looks like this:

create table Hierarchies (

HierarchyID int identity primary key,

Name varchar(100))

go

create table HierarchiesNodes (

HierarchyID int not null references dbo.Hierarchies(HierarchyID),

NodeID int identity primary key,

ParentID int null references dbo.Hierarchy(NodeID),

SetStart int null,

SetEnd int null,

SetLevel int null,

CustomerID int not null references dbo.Customers(CustomerID),

ActiveDate datetime default(getdate()),

InactiveDate datetime null)

The update, insert and delete code for the nodes table sets the SetStart and SetEnd columns to null for any affected rows. The lookup code (select), checks to see if any level that's in the same hierarchy (HierarchyID) has a null SetStart or SetEnd. If so, it uses a CTE to pull the hierarchy. If not, it uses a nested sets query to pull it. Every hour, the table is scanned for null SetStart and SetEnd values, and if any are found, those hierarchies have their set data updated. Again, Joe Celko has an article on how to do that efficiently (it's not hard to figure out if you know how to use a CTE to resolve the hierarchy in the first place).

I use the HierarchyID column as the first column in the indexes on this table. It's selective enough, in my case, that it seriously speeds up the selects. It doesn't matter for the set queries, but it does for the CTEs.

I use the SetLevel column for the nested sets queries, because unlike a recursive CTE, I can't just have it use a "Depth + 1" calculation if I want levels. It also allows me to limit the number of levels, if that's appropriate for the query at hand. Just like the HierarchyID doesn't matter to the sets, the SetLevel column doesn't matter for the CTE, but I need both for this hybrid solution.

Thus far, this gives us the best of both models.

I thought it should be mentioned, since this data matters if you deal with hierarchies much.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 6

ddnrg (5/21/2008)
Does the new SQL2008 HeirarchyId pretty much replace the need for doing this?

That one, per what I've read, is fast to query, but slow to update/insert/delete. Adjacency hierarchies (like the one described in the article) are fast to update/insert/delete and somewhat slower to query. So it might, or might not, replace it, depending on specific needs.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 7

Nice and simple, pleasure to read. Well done, John! 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Anipaul SSC-Insane Points: 24681 More actions · Answer 8

Anipaul

SSC-Insane

Points: 24681

May 21, 2008 at 8:43 am

#818613

Very well written..........

Omama Hatari SSC Rookie Points: 26 More actions · Answer 9

Got error "Invalid object name 'salaries'."

Table is not created anywhere

John Cawley III SSC Rookie Points: 26 More actions · Answer 10

I apologize, Omama -- I was just using that as an example if you had a Salaries table to subtotal against. Though not included in the article, we can create a quick simple table that will work:

create table salaries(

emp_id int not null,

salary money,

fica money

)

GO

insert into salaries values( 1, 110000, 11000 )

insert into salaries values( 2, 100000, 10000 )

insert into salaries values( 3, 70000, 7000 )

insert into salaries values( 4, 90000, 9000 )

insert into salaries values( 5, 95000, 9500 )

insert into salaries values( 6, 60000, 6000 )

insert into salaries values( 7, 75000, 75000 )

insert into salaries values( 8, 50000, 50000 )

insert into salaries values( 9, 55000, 55000 )

insert into salaries values( 10, 60000, 6000 )

GO

select

sum( s.salary ) departmental_salary_total,

sum( s.fica ) as departmental_fica_total

from subordinates( 2007, 2 ) base

left join salaries s

on s.emp_id = node_id

Tod Conover Grasshopper Points: 14 More actions · Answer 11

For reporting purposes we would generally want to sort all the nodes within one parent by node name. Below is an extension to the subordinates function that returns a field suitable for this purpose. The added field is called "node_sort" so your query might look like this...

select * from subordinates(2007,1) order by node_sort, node_name

----- create Subordinates function using CTE with added sort field

if object_id( 'subordinates', 'IF' ) is not null drop function subordinates

GO

create function subordinates( @year int, @node_id int ) returns table as return

with subnodes( distance, year, node_id, parent_node, node_name, node_seq, node_sort) as

(

select 0, @year, h.node_id, h.parent_node, h.node_name,

convert( varchar(99), ltrim(str(h.node_id))) as node_seq,

convert( varchar(99), '')

from hierarchy h where h.node_id = @node_id and h.year = @year

union all

select distance+1, @year, h.node_id, h.parent_node, h.node_name,

convert( varchar(99), sn.node_seq + '.' + ltrim(str(h.node_id))),

convert( varchar(99), sn.node_sort + '.' + right('00000' + cast(isnull(h.parent_node,'0') as varchar(5)), 5) )

from hierarchy h inner join subnodes sn on h.year = @year and h.parent_node = sn.node_id

)

select distance, year, node_id, parent_node, node_name, node_seq, node_sort from subnodes

GO

José Luis R. Newbie Points: 5 More actions · Answer 12

EXCELENTE TU SUGERENCIA, TOD!!! REALMENTE EXCELENTE!!!

Enhorabuena Tod!

Jack-408956 SSC Enthusiast Points: 132 More actions · Answer 13

None of this "with" stuff is working for us on MSSQL 2000...

Server: Msg 156, Level 15, State 1, Procedure subordinates, Line 2
Incorrect syntax near the keyword 'with'.

Is it a feature of MSSQL 2005 only?

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 14

Jack-408956 (8/3/2010)
None of this "with" stuff is working for us on MSSQL 2000...
Server: Msg 156, Level 15, State 1, Procedure subordinates, Line 2
Incorrect syntax near the keyword 'with'.
Is it a feature of MSSQL 2005 only?

Yes... "WITH" is a precursor to CTE's.

If you want to do this "stuff" in 2k, do an initial insert similar to the first part (above the UNION) of the CTE. Then, do a loop that does very similar to the second part of the CTE which will have nearly the same speed as the CTE and (haven't tried it with this particular table) will likely use about 1/3rd the number of reads. The second part of the CTE (or the loop) is NOT RBAR, in this case. Each iteration of the loop loads an entire level of the hierarchy no matter how wide a given level becomes.

That's all a recursive CTE does, by the way... it does an "initial load" and then loops the same way (and with the same impact on performance) as a While Loop.

In 2k, this method will give much better utility and performance than using the classic "Expanding Hierarchies" example given in 2k Books Online.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)