I've used hierarchies with as many as 7,000 nodes and as many as 50 levels. CTEs work well for those.
But one thing that should be brought up in any discussion of hierarchies is the "nested sets" model.
The type of hierarchy defined in this article is called an "adjacency hierarchy". A nested sets hierarchy works a little different.
Here's a sample table for a nested sets hierarchy:
create table NSHierarchy (
SetStart int not null,
SetEnd int not null,
constraint PK_NSHierarchy primary key (SetStart, SetEnd),
The way it works is that the SetStart column and the SetEnd column define a range of values, and lower levels of the hierarchy are given values between those.
insert into dbo.NSHierarchy (SetStart, SetEnd, Name)
select 1, 10, 'Mandy' union all
select 2, 5, 'Bob' union all
select 6, 9, 'Sue' union all
select 3, 3, 'Doug' union all
select 4, 4, 'Sam' union all
select 7, 7, 'Terry' union all
select 8, 8, 'Sal'
In this example, Mandy is the top level of the hierarchy. Bob and Sue report directly to Mandy, since their numbers are inside her range (anything between 1 and 10 is in her hierarchy). Doug and Sam report to Bob (their numbers are between 2 and 5, which are Bob's numbers), while Sam, Terry and Sal report to Sue (between 6 and 9).
This is queried as:
create proc HierarchySelect
declare @SEnd int
select @SEnd = SetEnd
where SetStart = @TopNode_in
where SetStart between @TopNode_in and @SEnd
and SetEnd between @TopNode_in and @SEnd
(Variations on this to deal with the name as an input parameter, or to allow a level to have two parents (SetStart in one range, SetEnd in another), etc., are pretty easy to extrapolate.)
Joe Celko has a much more complete and comprehensive description of this on his site and in his books. I won't try to go into that level of detail, because he's already done a better job of it than I will.
The advantage to this is that it's MUCH faster than an adjacency hierarchy.
For example, that 7,000-node, 50-level hierarchy, takes almost a second to run (on my test machine) using a CTE, exactly as described in the article. Not bad, but it also takes 7,000 scans (IO). That's a LOT of IO. With a nested sets hierarchy, it takes less than a millisecond to run, and takes 2 scans.
The disadvantage of a nested sets hierarchy is that it takes more work to update/insert than an adjacency hierarchy. If the data changes a lot, it becomes much more difficult to manage.
I've solved that for my situation by using a hybrid solution (partially suggested by Jeff Moden). I have tables that looks like this:
create table Hierarchies (
HierarchyID int identity primary key,
create table HierarchiesNodes (
HierarchyID int not null references dbo.Hierarchies(HierarchyID),
NodeID int identity primary key,
ParentID int null references dbo.Hierarchy(NodeID),
SetStart int null,
SetEnd int null,
SetLevel int null,
CustomerID int not null references dbo.Customers(CustomerID),
ActiveDate datetime default(getdate()),
InactiveDate datetime null)
The update, insert and delete code for the nodes table sets the SetStart and SetEnd columns to null for any affected rows. The lookup code (select), checks to see if any level that's in the same hierarchy (HierarchyID) has a null SetStart or SetEnd. If so, it uses a CTE to pull the hierarchy. If not, it uses a nested sets query to pull it. Every hour, the table is scanned for null SetStart and SetEnd values, and if any are found, those hierarchies have their set data updated. Again, Joe Celko has an article on how to do that efficiently (it's not hard to figure out if you know how to use a CTE to resolve the hierarchy in the first place).
I use the HierarchyID column as the first column in the indexes on this table. It's selective enough, in my case, that it seriously speeds up the selects. It doesn't matter for the set queries, but it does for the CTEs.
I use the SetLevel column for the nested sets queries, because unlike a recursive CTE, I can't just have it use a "Depth + 1" calculation if I want levels. It also allows me to limit the number of levels, if that's appropriate for the query at hand. Just like the HierarchyID doesn't matter to the sets, the SetLevel column doesn't matter for the CTE, but I need both for this hybrid solution.
Thus far, this gives us the best of both models.
I thought it should be mentioned, since this data matters if you deal with hierarchies much.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon