The new HierarchyID datatype in SQL Server 2008 has captured my interest lately. I’ve been working on a presentation that I can give to some local user groups, and hope to have it done soon. As with other data types, an index can be built on this column to help speed up the performance of queries.
An interesting thing that I learned while working with this data type is that there are two different ways of indexing the data in this column. You can do a depth first index, or a breadth first index.
A hierarchy is represented as a tree of values. An example of this might be the following:
If you examine the hierarchy, you can see that the root is node A, it has a children as nodes B and C, and there are further children. The diagram makes it easy to see which nodes are above or below other nodes in the hierarchy. The depth first indexing assumes that you are querying for subtrees and so it groups those nodes together in the index. In this case the index would store the nodes as:
A, B, E, F, C, D, G
In other words, this would be a traversal of the hierarchy that looks like this, essentially a left-child first traversal.
This is useful if you often query for subtree and its child nodes.
In this case, all nodes at a particular level are stored together in the index. For our example above, the traversal would look like:
And that would result in the nodes being stored like this:
A, B, C, D, E, F, G
The first level is the root, node A. The next level contains the children of the first level, which are B, C, D. Finally the bottom level of E, F, and G is indexed. This is useful when you are querying for all nodes at some particular level.
I’ll show how to create the different indexes and some actual SQL code in another post, however one thing to note here is that there is no guarantee of uniqueness with the HierarchyID. Similar to an identity property, if you want to be sure things are unique, you need a unique index.