Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Two Useful Hierarchy Functions Expand / Collapse
Author
Message
Posted Tuesday, May 20, 2008 9:59 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, February 20, 2014 10:17 AM
Points: 2, Visits: 29
Comments posted to this topic are about the item Two Useful Hierarchy Functions
Post #504189
Posted Wednesday, May 21, 2008 2:47 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 20, 2009 1:36 PM
Points: 18, Visits: 5
very nicely done!
Post #504286
Posted Wednesday, May 21, 2008 4:04 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, April 8, 2010 4:05 AM
Points: 5, Visits: 9
Hard to read due to poor web presentation. I'm using IE7 and the code segments are not readable unless they're cut and pasted elsewhere. I ran out of time and patience.
Post #504320
Posted Wednesday, May 21, 2008 5:43 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, February 11, 2013 1:10 PM
Points: 6, Visits: 22
Does the new SQL2008 HeirarchyId pretty much replace the need for doing this?
Post #504373
Posted Wednesday, May 21, 2008 6:45 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 16, 2010 1:35 PM
Points: 1, Visits: 10
Nice example, thanks for simplifying this complex subject. Would it have been better to make the node_id column in the table example an identity column for saving data (after a user selected one item) and reporting? What are the costs/benefits?

Also, I guess the hierarchy could change too, what if there was an org. change in that example, where Alan left and another became the start of the tree? In my application, do we want to save the pick of Alan in your hierarchy (hence that identity column idea above) or just the current starting node for the year, no matter who it is at that point in time? I guess that may depend on your application needs?

Thanks again for the nice article and great subject.
Post #504412
Posted Wednesday, May 21, 2008 7:43 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Thursday, December 18, 2014 9:58 AM
Points: 13,872, Visits: 9,600
I've used hierarchies with as many as 7,000 nodes and as many as 50 levels. CTEs work well for those.

But one thing that should be brought up in any discussion of hierarchies is the "nested sets" model.

The type of hierarchy defined in this article is called an "adjacency hierarchy". A nested sets hierarchy works a little different.

Here's a sample table for a nested sets hierarchy:

create table NSHierarchy (
SetStart int not null,
SetEnd int not null,
constraint PK_NSHierarchy primary key (SetStart, SetEnd),
Name varchar(100))

The way it works is that the SetStart column and the SetEnd column define a range of values, and lower levels of the hierarchy are given values between those.

For example:

insert into dbo.NSHierarchy (SetStart, SetEnd, Name)
select 1, 10, 'Mandy' union all
select 2, 5, 'Bob' union all
select 6, 9, 'Sue' union all
select 3, 3, 'Doug' union all
select 4, 4, 'Sam' union all
select 7, 7, 'Terry' union all
select 8, 8, 'Sal'

In this example, Mandy is the top level of the hierarchy. Bob and Sue report directly to Mandy, since their numbers are inside her range (anything between 1 and 10 is in her hierarchy). Doug and Sam report to Bob (their numbers are between 2 and 5, which are Bob's numbers), while Sam, Terry and Sal report to Sue (between 6 and 9).

This is queried as:

create proc HierarchySelect
(@TopNode_in int)
as
declare @SEnd int

select @SEnd = SetEnd
from dbo.NSHierarchy
where SetStart = @TopNode_in

select Name
from dbo.NSHierarchy
where SetStart between @TopNode_in and @SEnd
and SetEnd between @TopNode_in and @SEnd

(Variations on this to deal with the name as an input parameter, or to allow a level to have two parents (SetStart in one range, SetEnd in another), etc., are pretty easy to extrapolate.)

Joe Celko has a much more complete and comprehensive description of this on his site and in his books. I won't try to go into that level of detail, because he's already done a better job of it than I will.

The advantage to this is that it's MUCH faster than an adjacency hierarchy.

For example, that 7,000-node, 50-level hierarchy, takes almost a second to run (on my test machine) using a CTE, exactly as described in the article. Not bad, but it also takes 7,000 scans (IO). That's a LOT of IO. With a nested sets hierarchy, it takes less than a millisecond to run, and takes 2 scans.

The disadvantage of a nested sets hierarchy is that it takes more work to update/insert than an adjacency hierarchy. If the data changes a lot, it becomes much more difficult to manage.

I've solved that for my situation by using a hybrid solution (partially suggested by Jeff Moden). I have tables that looks like this:

create table Hierarchies (
HierarchyID int identity primary key,
Name varchar(100))
go
create table HierarchiesNodes (
HierarchyID int not null references dbo.Hierarchies(HierarchyID),
NodeID int identity primary key,
ParentID int null references dbo.Hierarchy(NodeID),
SetStart int null,
SetEnd int null,
SetLevel int null,
CustomerID int not null references dbo.Customers(CustomerID),
ActiveDate datetime default(getdate()),
InactiveDate datetime null)

The update, insert and delete code for the nodes table sets the SetStart and SetEnd columns to null for any affected rows. The lookup code (select), checks to see if any level that's in the same hierarchy (HierarchyID) has a null SetStart or SetEnd. If so, it uses a CTE to pull the hierarchy. If not, it uses a nested sets query to pull it. Every hour, the table is scanned for null SetStart and SetEnd values, and if any are found, those hierarchies have their set data updated. Again, Joe Celko has an article on how to do that efficiently (it's not hard to figure out if you know how to use a CTE to resolve the hierarchy in the first place).

I use the HierarchyID column as the first column in the indexes on this table. It's selective enough, in my case, that it seriously speeds up the selects. It doesn't matter for the set queries, but it does for the CTEs.

I use the SetLevel column for the nested sets queries, because unlike a recursive CTE, I can't just have it use a "Depth + 1" calculation if I want levels. It also allows me to limit the number of levels, if that's appropriate for the query at hand. Just like the HierarchyID doesn't matter to the sets, the SetLevel column doesn't matter for the CTE, but I need both for this hybrid solution.

Thus far, this gives us the best of both models.

I thought it should be mentioned, since this data matters if you deal with hierarchies much.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #504466
Posted Wednesday, May 21, 2008 7:50 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Thursday, December 18, 2014 9:58 AM
Points: 13,872, Visits: 9,600
ddnrg (5/21/2008)
Does the new SQL2008 HeirarchyId pretty much replace the need for doing this?


That one, per what I've read, is fast to query, but slow to update/insert/delete. Adjacency hierarchies (like the one described in the article) are fast to update/insert/delete and somewhat slower to query. So it might, or might not, replace it, depending on specific needs.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #504473
Posted Wednesday, May 21, 2008 8:07 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 2:11 PM
Points: 35,772, Visits: 32,441
Nice and simple, pleasure to read. Well done, John! :)

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #504492
Posted Wednesday, May 21, 2008 8:43 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Monday, December 15, 2014 4:15 AM
Points: 5,471, Visits: 1,402
Very well written..........


Post #504525
Posted Thursday, May 22, 2008 11:47 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, May 11, 2011 11:12 AM
Points: 10, Visits: 63
Got error "Invalid object name 'salaries'."
Table is not created anywhere
Post #505367
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse