Thank this author by sharing:
By John Cawley III,
Two Useful Hierarchy Functions
Common Table Expressions (CTEs) are wonderful things when you need to deal with hierarchical data, because they are able to recurse conveniently. There are, however, two shortcomings to their direct use for this.
First, they can look awfully complex at the top of an otherwise simple query, making what should be straightforward code harder to read:
What you really want to do is encapsulate that recursive logic away so that you only have to look at it when you are working directly on it, and just use it otherwise.
Second, when working with very large or deep hierarchies, you'd like to restrict yourself to just the relevant portion of the hierarchy for performance reasons. Why recalculate the entire hierarchical structure every time on the fly when only a small sub-hierarchy is needed?
It turns out that a simple user-defined function wrapping a CTE can rectify both these shortcomings.
Our Working Example
The code below creates and populates a simple, fairly generic Hierarchy table with this structure:
The year column allows the hierarchy to vary by year (and I wanted to show how you'd work that extra bit of complexity into the CTE below). The node_id column identifies the entry in the node (whether it's a person, a department, a book chapter, a huge component-assembly tree, etc). The node_name is a description of the node/entry. And the parent_node recursively points back into this table with the node_id of the parent (the supervisor, or the book section, or the assembly, etc).
Here's the code to set up the working example:
----- create example tableif object_id( 'hierarchy', 'U' ) is not null drop table hierarchyGOcreate table hierarchy( year varchar(4) not null, node_id int not null, node_name varchar(35), parent_node int)GOalter table hierarchy add constraint hierarchy_pk primary key clustered( year, node_id );GOif exists( select * from sys.indexes where name = 'hierarchy_ix' ) drop index hierarchy_ix on hierarchyGOcreate nonclustered index hierarchy_ix on hierarchy( year, parent_node )GO
----- populate tableinsert into hierarchy values( '2007', 1, 'Alan', null )insert into hierarchy values( '2007', 2, 'Barbara', 1 )insert into hierarchy values( '2007', 3, 'Charles', 1 )insert into hierarchy values( '2007', 4, 'David', 2 )insert into hierarchy values( '2007', 5, 'Eileen', 2 )insert into hierarchy values( '2007', 6, 'Frances', 1 )insert into hierarchy values( '2007', 7, 'George', 4 )insert into hierarchy values( '2007', 8, 'Howard', 7 )insert into hierarchy values( '2007', 9, 'Ivan', 7 )insert into hierarchy values( '2007', 10, 'Juliet', 1 )
insert into hierarchy values( '2008', 1, 'Alan', 2 )insert into hierarchy values( '2008', 2, 'Barbara', null )insert into hierarchy values( '2008', 3, 'Charles', 2 )insert into hierarchy values( '2008', 4, 'David', 1 )insert into hierarchy values( '2008', 5, 'Eileen', 1 )insert into hierarchy values( '2008', 6, 'Frances', 2 )insert into hierarchy values( '2008', 7, 'George', 4 )insert into hierarchy values( '2008', 8, 'Howard', 7 )insert into hierarchy values( '2008', 9, 'Ivan', 7 )insert into hierarchy values( '2008', 10, 'Juliet', 2 )GO
The Subordinates Function
The subordinates function is a simple way to find all the subordinate nodes of a given node for a given year:
This pulls all hierarchy information for 2007 underneath node_id 1 (Alan). The distance column indicates how far down the hierarchy each subordinate node is from Alan - distance 1 nodes work directly for Alan, distance 2 nodes work for Alan's direct subordinates, and so on. The year and node_id identify the node in the Hierarchy table. And the node_seq column indicates the parent-child sequence from the given node (Alan) to that node in a dotted notation. For instance, Howard has a node_seq of "220.127.116.11.8" below - this indicates that Alan (node_id 1) is at the top of this hierarchy, then Barbara (node_id 2), then David (node_id 4), then George (node_id 7), and finally lowly Howard (node_id 8).
The same function can be called to produce information on just a sub-hierarchy:
In this case, we will just see just the direct and indirect subordinates to Barbara (node_id 2, as specified in the function call).
Here's the code to create the subordinates function:
----- create Subordinates function using CTEif object_id( 'subordinates', 'IF' ) is not null drop function subordinatesGOcreate function subordinates( @year int, @node_id int ) returns table as returnwith subnodes( distance, year, node_id, node_name, node_seq )as( select 0, @year, h.node_id, h.node_name, convert( varchar(80), ltrim(str(h.node_id))) as node_seq from hierarchy h where h.node_id = @node_id and h.year = @year union all select distance+1, @year, h.node_id, h.node_name, convert( varchar(80), sn.node_seq+'.'+ltrim(str(h.node_id))) from hierarchy h inner join subnodes sn on h.year = @year and h.parent_node = sn.node_id)select distance, year, node_id, node_name, node_seq from subnodesGO
The Superiors Function
The superiors function is a simple way to traverse in the opposite direction, back up the hierarchy from a given node during a given year:
This pulls each superior node for 2007 for Ivan (node_id 9). The distance column indicates how far up the hierarchy the node is from Ivan (and it is negative as an additional reminder that we are moving up the hierarchy) - distance -1 is Ivan's immediate supervisor, distance -2 is that supervisor's supervisor, etc. The year and node_id identify the node in the Hierarchy table.
Here's the code to create the superiors function:
----- create the Superiors function using CTEif object_id( 'superiors', 'IF' ) is not null drop function superiorsGOcreate function superiors( @year int, @node_id int ) returns table as returnwith supnodes( distance, year, node_id, node_name, parent_node )as( select 0, @year, h.node_id, h.node_name, h.parent_node from hierarchy h where h.node_id = @node_id and h.year = @year union all select distance-1, @year, h.node_id, h.node_name, h.parent_node from hierarchy h inner join supnodes sn on h.year = @year and h.node_id = sn.parent_node)select distance, year, node_id, node_name from supnodesGO
Being able to collect all the subordinates of a node from a hierarchy quickly is useful for many purposes. Rolling up summary numbers to a parent node is one such case:
select sum( s.salary ) departmental_salary_total, sum( s.fica ) as departmental_fica_totalfrom subordinates( 2007, 2 ) base left join salaries s on s.emp_id = node_id
This creates Barbara's (node_id 2) sub-hierarchy on the fly, grabs the associated salaries and FICA costs and totals them for her sub-organization.
Authorization rules can also be enforced using these on-the-fly sub-hierarchies:
select *from budget_edits base inner join subordinates( 2007, @login_node_id ) auth on auth.node_id = base.department_id
This pulls budget edits, filtering them down to just those that the user has visibility to by inner joining with that user's sub-hierarchy.
There are many other potential uses for these functions.
Where hierarchical information is stored in a table with recursive "parent" links, it is often convenient to pull all direct and indirect subordinates of a node. CTEs permit this, but may be made more readable and more efficient by wrapping them in a T-SQL user function.
CTEs - with proper indexing (note in particular the index on the year and parent_id in the working example code above!) - perform quite efficiently. I have worked with hierarchical data of up to 12 hierarchical levels among some 24,000 nodes, and the functions above execute in less than a second.
By default, CTEs limit recursion to 100 levels. If you need more than that many levels in your hierarchy, that default can be adjusted with a MAXRECURSION hint (see article at http://msdn2.microsoft.com/en-us/library/ms181714.aspx).
Hopefully, this article has shared two functions that you will find useful in your toolbox for dealing with hierarchical information!
Comments posted to this topic are about the item [B]Two Useful Hierarchy Functions[/B] very nicely d...
Bill Pearson continues his examination of the versatile Order() function, focusing upon its use in p...
Convert Horizontal Hierarchy
to vertical Hierarchy.
One very common structure that needs to be handled in T-SQL is the hierarchy. One of our prominent m...
Learn how you can query a hierarchy of data and also return the results in an ordered fashion. A han...