Hierarchies in SQL, Part II, the Sequel

Author's Note: This is a follow-up from my original Hierarchies in SQL article, posted on SQL Server Central on 28 Jan 2009. I originally wrote it a while back, and then found that it needed a lot of editing and a lot more testing, so it doesn’t include any SQL 2012-specific data, but it’s still valid and applies equally to SQL 2008, 2008R2, and 2012.)

I've recently had hierarchies in SQL Server brought to my attention again. Questions also keep coming up on how the new HierarchyID datatype compares to both adjacency hierarchies and nested sets hierarchies. I've dug into this datatype a few times in the past, but not with any degree of detail or serious attention, so I decided I needed to remedy that.

If you like surprise twists at the end of stories, this article has definitely got one. If you prefer to skip to the end, you can do that and avoid all the suspense. I tend towards verbosity in the same way that hurricanes “tend towards” damp and breezy, and you can skip most of that if you scroll down to the summary. But you’ll miss out on all the test scripts. And the high-budget special effects (well, you would if there were any).

In my prior tests, I'd found that adjacency can be very, very fast for updates and inserts, but is much slower at resolving hierarchies past a certain depth or complexity. I've found that nested sets are very, very fast at resolving hierarchies regardless of depth or complexity, but are very slow for updates and inserts except on VERY simple hierarchies. And that the HierarchyID method is faster to query than adjacency, but slower to update, and slower to query than nested sets, but faster to update, and thus seemed to be a sort of "not really the best at anything but an okay compromise".

Based on that, I actually worked out a "hybrid hierarchy", using both adjacency and nested sets, that was fast to update and fast to query, but kind of complex to manage overall. At the time, not having SQL Server 2008 in a production environment, that was a better compromise for the real-world situation I had to solve at that time. (See the prior article, linked above, for a lot more details on that compromise.)

First, to set up a test harness for comparing all three methods, run this code:

IF DB_ID(N'ProofOfConcept') IS NULL
      BEGIN
            CREATE DATABASE [ProofOfConcept];
            ALTER DATABASE [ProofOfConcept] SET COMPATIBILITY_LEVEL = 100;
      END;
GO
USE [ProofOfConcept]
GO
CREATE TABLE [dbo].[HierarchyTest](
      [ID] [int] IDENTITY(1,1) PRIMARY KEY,
      [NodeName] [varchar](10) NOT NULL,
      [ParentID] [int] NULL,
      [RangeStart] [int] NULL,
      [RangeEnd] [int] NULL,
      [HID] [hierarchyid] NULL,
      [HID_Level]  AS ([HID].[GetLevel]()))
GO
CREATE NONCLUSTERED INDEX [IDX_Adjacency] ON [dbo].[HierarchyTest]
(
      [ParentID] ASC
)
INCLUDE ( [NodeName]);
GO
CREATE NONCLUSTERED INDEX [IDX_HID] ON [dbo].[HierarchyTest]
(
      [HID] ASC
)
INCLUDE ( [NodeName]);
GO
CREATE NONCLUSTERED INDEX [IDX_Nested] ON [dbo].[HierarchyTest]
(
      [RangeStart] ASC,
      [RangeEnd] ASC
)
INCLUDE ( [NodeName]);
GO
ALTER TABLE [dbo].[HierarchyTest]  WITH CHECK 
 ADD FOREIGN KEY([ParentID])
  REFERENCES [dbo].[HierarchyTest] ([ID]);
GO
ALTER TABLE [dbo].[HierarchyTest]  WITH CHECK 
 ADD  CONSTRAINT [CK_Range]
  CHECK  (([RangeStart] IS NULL OR [RangeEnd] IS NULL OR [RangeEnd]>=[RangeStart]));
GO
ALTER TABLE [dbo].[HierarchyTest] CHECK CONSTRAINT [CK_Range];
GO
CREATE TABLE dbo.Numbers (
      Number INT PRIMARY KEY);
GO
INSERT INTO dbo.Numbers (Number)
 SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY T1.object_id)
 FROM sys.all_columns AS T1
  CROSS JOIN sys.all_columns AS T2;

I like to do testing in a database I call ProofOfConcept, usually on a desktop machine that's running a copy of SQL Server Developer Edition. The particular desktop computer I'm running all of these tests on this time, has a Core i7 processor (quad-core, two threads per core = effectively 8 CPUs), 16 Gigs of RAM, and the hard drives are on an eSATA bus (3.5 Terabytes of storage space total on the machine). (Yes, that's my desktop computer, not a server.)

Now to set up some test data in the table. I start out with just a few rows of data, just to make sure this is all working correctly:

TRUNCATE TABLE dbo.HierarchyTest;
INSERT INTO dbo.HierarchyTest
        (NodeName,
         ParentID,
         RangeStart,
         RangeEnd,
         HID
        )
VALUES  ('Joe', -- NodeName - varchar(10)
         null, -- ParentID - int
         1, -- RangeStart - int
         6, -- RangeEnd - int
         hierarchyid::GetRoot()  -- HID - hierarchyid
        );
INSERT INTO dbo.HierarchyTest
        (NodeName,
         ParentID,
         RangeStart,
         RangeEnd,
         HID
        )
VALUES  ('Bob', -- NodeName - varchar(10)
         1, -- ParentID - int
         2, -- RangeStart - int
         3, -- RangeEnd - int
         CAST('/1/' AS HIERARCHYID)  -- HID - hierarchyid
        ),
        ('Doug', -- NodeName - varchar(10)
         1, -- ParentID - int
         4, -- RangeStart - int
         5, -- RangeEnd - int
         CAST('/2/' AS HIERARCHYID)  -- HID - hierarchyid
        );
SELECT *, HID.ToString() AS HierarchyString
 FROM dbo.HierarchyTest
 ORDER BY HID;
SELECT *, HID.ToString() AS HierarchyString
 FROM dbo.HierarchyTest
 ORDER BY HID_Level, HID;

This shows that the table is working, and tests some very simple queries against it, just to make sure I am getting expected results from known data. Every later test builds on this one. Since this is the first test, the initial TRUNCATE command isn't really necessary, but I keep it in the script in case I want to come back to this and need to clean out the table.

I next built a more realistic hierarchy. Still small, with only ten-thousand rows of data.

TRUNCATE TABLE dbo.HierarchyTest;
GO
INSERT INTO dbo.HierarchyTest
        (NodeName,
         ParentID,
         RangeStart,
         RangeEnd,
         HID
        )
 SELECT 
    'Person' + RIGHT('0000' + CAST(Number AS VARCHAR(4)), 4),
    NULL, NULL, NULL, NULL
  FROM dbo.Numbers;
GO
UPDATE dbo.HierarchyTest
 SET ParentID = ID / 10 + ID % 10
 WHERE ID > 10;
GO
;
WITH    Hierarchy(ID, PID, Lvl, Pth)
          AS (SELECT    ID,
                        NULL,
                        0,
                        '/' + CAST(ID AS VARCHAR(MAX)) + '/'
              FROM      dbo.HierarchyTest
              WHERE     ParentID IS NULL
              UNION ALL
              SELECT    HSub.ID,
                        HSub.ParentID,
                        Hierarchy.Lvl + 1,
                        Hierarchy.Pth + CAST(HSub.ID AS VARCHAR(MAX)) + '/'
              FROM      dbo.HierarchyTest AS HSub
                        INNER JOIN Hierarchy
                            ON HSub.ParentID = Hierarchy.ID),
        Bottom
          AS (SELECT    ID,
                        Pth,
                        ROW_NUMBER() OVER (ORDER BY Pth) * 10 AS RS
              FROM      Hierarchy
              WHERE     ID NOT IN (SELECT   ParentID
                                   FROM     dbo.HierarchyTest
                                   WHERE    ParentID IS NOT NULL))
    MERGE dbo.HierarchyTest AS H
        USING
            (SELECT Hierarchy.ID AS RangeID,
                    CAST(Hierarchy.Pth AS HIERARCHYID) AS HID,
                    MIN(RS) AS RS,
                    MAX(RS) AS RE
             FROM   Bottom
                    INNER JOIN Hierarchy
                        ON Bottom.Pth LIKE Hierarchy.Pth + '%'
             GROUP BY Hierarchy.ID, Hierarchy.Pth) AS Ranges
        ON H.ID = Ranges.RangeID
        WHEN MATCHED
            THEN UPDATE
               SET      H.RangeStart = Ranges.RS,
                        H.RangeEnd = Ranges.RE + 9,
                        H.HID = Ranges.HID ;
GO
;
WITH    CTE
          AS (SELECT    ID,
                        ROW_NUMBER() OVER (ORDER BY HT1.HID) AS R
              FROM      dbo.HierarchyTest AS HT1
              WHERE     RangeStart IS NULL
                        AND RangeEnd IS NULL
                        AND NOT EXISTS ( SELECT *
                                         FROM   dbo.HierarchyTest AS HT2
                                         WHERE  HT2.ParentID = HT1.ID ))
    UPDATE  Tgt
    SET     RangeStart = R,
            RangeEnd = R
    FROM    dbo.HierarchyTest AS Tgt
            INNER JOIN CTE
                ON CTE.ID = Tgt.ID ;
WHILE @@ROWCOUNT > 0
    BEGIN
        WITH CTE
          AS (SELECT    
         HT1.ID,
                 MIN(HT2.RangeStart) AS RS,
                 MAX(HT2.RangeEnd) AS RE
               FROM dbo.HierarchyTest AS HT1
                INNER JOIN dbo.HierarchyTest AS HT2
                  ON HT1.ID = HT2.ParentID
               WHERE HT1.RangeStart IS NULL
                 AND HT1.RangeEnd IS NULL
                 AND HT2.RangeStart IS NOT NULL
                 AND HT2.RangeEnd IS NOT NULL
               GROUP BY  HT1.ID)
      UPDATE  Tgt
       SET    RangeStart = CTE.RS,
              RangeEnd = CTE.RE
      FROM    dbo.HierarchyTest AS Tgt
       INNER JOIN CTE
        ON Tgt.ID = CTE.ID ;
    END ;

First, this script clears the table. Not necessary for a first run, but this script is designed to be run multiple times in case I want a clean dataset on subsequent tests. Next, the script inserts raw rows, using a string-builder instead of, for example, real names of people, departments, or components. Fake data like this will suffice for this kind of testing.

The next piece arbitrarily divides the hierarchy data into chunks and assigns parent IDs to all but 10 of the rows. Not completely realistic, but it'll do for this test. Note that the method I used for building the HierarchyID data is pretty much exactly as outlined in Books Online (and on MSDN) by Microsoft. (http://msdn.microsoft.com/en-us/library/bb677290.aspx)

The method for building the Nested Sets data is significantly different from the way outlined in my original article, and very different from the method I first learned from Joe Celko's work. The article of his that I linked to my earlier article on this is a broken link, but if you search online for "nested sets hierarchy" and "Joe Celko", you can find the relevant data. It works, and that’s what really matters.

There are some people who insist on building the Nested Sets data in a single query. I don’t know why, since this does it accurately and in about 1 second on my machine. Surely that’s fast enough and efficient enough.

Now to query my hierarchy:

IF OBJECT_ID(N'tempdb..#T1') IS NOT NULL
      DROP TABLE #T1;
     
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
      DROP TABLE #T2;
IF OBJECT_ID(N'tempdb..#T3') IS NOT NULL
      DROP TABLE #T3;
     
SET NOCOUNT ON ;
SET STATISTICS TIME, IO ON;
PRINT 'HierarchyID Select' ;
DECLARE @MgrID HIERARCHYID = (SELECT    HID
                              FROM      dbo.HierarchyTest
                              WHERE     ID = 2) ;
SELECT  HID, NodeName
 INTO #T1
 FROM    dbo.HierarchyTest
 WHERE   HID.IsDescendantOf(@MgrID) = 1 ;
PRINT 'Nested Sets Select' ;
DECLARE @RS INT,
    @RE INT ;
SELECT  @RS = RangeStart,
        @RE = RangeEnd
 FROM   dbo.HierarchyTest
 WHERE  ID = 2 ;
SELECT  RangeStart, RangeEnd, NodeName
 INTO #T2
 FROM    dbo.HierarchyTest
 WHERE   RangeStart BETWEEN @RS AND @RE ;
PRINT 'Adjacency Select' ;
;
WITH    Hierarchy(ID, PID)
          AS (SELECT    2,
                        NULL
              UNION ALL
              SELECT    H.ID,
                        H.ParentID
              FROM      dbo.HierarchyTest AS H
                        INNER JOIN Hierarchy
                            ON H.ParentID = Hierarchy.ID)
    SELECT  HT.ID, HT.ParentID, HT.NodeName
    INTO #T3
    FROM    dbo.HierarchyTest AS HT
            INNER JOIN Hierarchy
                ON HT.ID = Hierarchy.ID ;
SET STATISTICS TIME, IO OFF;
/*
HierarchyID Select
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.
  
Nested Sets Select
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.
 
Adjacency Select
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 1112, logical reads 4461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 6673, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 18 ms.
*/

The commented portions at the bottom are the results I got. By inserting into on-the-fly temp tables, I eliminated performance variables that come from actually returning a dataset to the screen. That makes it a cleaner test, in my experience.

The first thing to note is that the HierarchyID and the Nested Sets method both got nearly identical results. This didn't match with my prior experiences, so I really dug into it. Either my prior tests were flawed (likely), or Microsoft has improved the HierarchyID query speed significantly since I first tested it. Either way, it proved out on multiple tests in my environment, so I have to go with the results I got. That definitely surprised me, but results are results.

The second thing to note is that, as expected, the Adjacency query took much longer, and was much more I/O intensive. It had to actually scan the table over 1,000 times just to build a worktable, and had to perform over 10,000 logical reads between the two tables. That's a lot of I/O for a seemingly simple query. That test exactly matches prior tests I've done, so I just ran it a few times to confirm the results.

Now comes another interesting test: updating data. Let's say I need to move ID 12 to ParentID 30. Someone's department is getting reorganized, and now they answer to a new manager, while keeping all of their existing people under them. Simple, right?

SET NOCOUNT ON ;
DECLARE @ID INT = 12,
    @NewParentID INT = 30 ;
PRINT 'Adjacency' ;
BEGIN TRANSACTION ;
SET STATISTICS IO, TIME ON;
UPDATE  dbo.HierarchyTest
 SET    ParentID = @NewParentID
 WHERE  ID = @ID ;
SET STATISTICS IO, TIME OFF;
ROLLBACK ; -- Needs to roll back so each update can work on the same rows
PRINT 'HID' ;
BEGIN TRANSACTION ;
SET STATISTICS IO, TIME ON;
DECLARE @HID HIERARCHYID,
    @HID2 VARCHAR(MAX),
    @HID3 VARCHAR(MAX),
    @IDString VARCHAR(10) ;
SELECT  @HID = HID,
        @HID2 = HID.ToString(),
        @IDString = ID
FROM    dbo.HierarchyTest
WHERE   ID = @ID ;
SELECT  @HID3 = HID.ToString() + @IDString + '/'
FROM    dbo.HierarchyTest
WHERE   ID = @NewParentID ;
UPDATE  dbo.HierarchyTest
SET     HID = CAST(REPLACE(HID.ToString(), @HID2, @HID3) AS HIERARCHYID)
WHERE   HID.IsDescendantOf(@HID) = 1 ;
SET STATISTICS IO, TIME OFF;
ROLLBACK ;
/*
Adjacency Update
Table 'HierarchyTest'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
  
HID Update
Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 1, logical reads 6320, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 1796, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 46 ms.
*/

I spent an hour or so trying to figure out a simple way to do this same test with a Nested Sets model, and all I could come up with was doing the Adjacency update, and then re-running the Nested Sets build for the whole table. So it ended up taking approximately 850 milliseconds average on my machine. Almost 20 times as long as the HierarchyID update.

The Adjacency update is trivial to code, and trivial to execute. It's FAST. It updates a single row, and that’s it. That's exactly as expected.

I did some research to find if there was a better method for editing the HierarchyID data in this, but the best I could come up with was use its ToString method and some simple string manipulation, and then Cast/Convert it back to HierarchyID. That works, and isn't exactly glacially slow, but it's massively slower and much more I/O intensive than the Adjacency model. Again, this exactly matches expectations and prior tests, so I just ran it a few times to confirm. It also matches exactly what Microsoft says in MSDN and Books Online.

I was, however, surprised that it was as fast as it was, once I hit on the method I ended up using. I was actually expecting to rebuild data from scratch, like Nested Sets. Instead, I found it easy and fast enough that I wouldn't worry about performance on it except on the most seriously overloaded servers. After all, 46 milliseconds isn’t exactly slow, it’s just slower than sub-millisecond for the Adjacency model. And about 20 times faster than Nested Sets.

Theoretically, it would be possible to cast the HierarchyID values to varbinary, and then use the Stuff() function to insert changes in the path, instead of casting to varchar. I tested this, and it seems to work, but it doesn’t save any complexity or performance, and isn’t as easy to read. Here’s an example:

DECLARE @HID HIERARCHYID = 0xE0247E00000201A8,
    @NewParent HIERARCHYID = 0xE024C0,
    @NewHID HIERARCHYID ;
SET @NewHID = (SELECT   CAST(CAST(STUFF(CAST(HID AS VARBINARY(MAX)), 
                        DATALENGTH(HID.GetAncestor(1)), 0,
               CAST(@NewParent AS VARBINARY(MAX))) AS VARBINARY(MAX)) AS HIERARCHYID)
               FROM     dbo.HierarchyTest
               WHERE    HID = @HID) ;

It uses a value from my test table, and inserts a new parent in between the existing parent and the current node. Would have to be done for every row affected. DataLength on the immediate ancestor gives the location in the binary string to stuff the new value, and nested conversions (cast) to and from varbinary and HierarchyID are needed.

That method isn’t documented in TechNet or MSDN or Books Online, so use at your own risk. Worth researching it, though, just for proof-of-concept.

One thing I did find in testing the HierarchyID datatype, that I wasn't expecting, but should have, is that the HierarchyID datatype doesn't deal well with multiple hierarchy roots. In Adjacency and Nested Sets hierarchies, you can very easily make a single table hold multiple top nodes. It’s possible, but not as easy, with the HierarchyID type.

To see this, re-run the first data load (three rows), and then run this:

SELECT  *
FROM    dbo.HierarchyTest
WHERE   HID = HIERARCHYID::GetRoot() ;
INSERT INTO dbo.HierarchyTest
        (NodeName,
         ParentID,
         RangeStart,
         RangeEnd,
         HID
        )
VALUES  ('Person0000', -- NodeName - varchar(10)
         null, -- ParentID - int
         null, -- RangeStart - int
         null, -- RangeEnd - int
         HIERARCHYID::GetRoot()  -- HID - hierarchyid
        );

SELECT  *, HID.GetDescendant(NULL, NULL)
FROM    dbo.HierarchyTest
WHERE   HID = HIERARCHYID::GetRoot() ;

You'll find that the second top node, Person000, will have the descendants of the first top node. This can be worked around by not using the GetRoot method to make top nodes, and instead starting with the ID of each top node's row, as per my other examples.

Now, back to performance testing. This time, with a million rows and greater complexity to the hierarchy structure. 100 root notes, multiple levels deep, variable depth from different roots.

TRUNCATE TABLE dbo.HierarchyTest;
GO
INSERT INTO dbo.HierarchyTest
        (NodeName,
         ParentID,
         RangeStart,
         RangeEnd,
         HID
        )
SELECT TOP 1000000 
   'Person' + RIGHT('000000' + CAST(ROW_NUMBER() OVER (ORDER BY N1.Number) AS VARCHAR(6)), 6),
   NULL, NULL, NULL, NULL
FROM dbo.Numbers AS N1
CROSS JOIN dbo.Numbers AS N2;
GO
UPDATE dbo.HierarchyTest
 SET ParentID = ID / 100 + ID % 100
 WHERE ID > 100;
GO
;

I ran the same scripts for the HierarchyID build and the Nested Sets build as with the 10k-row version. There was no need to change the scripts just because the table has 100 times more rows. The execution took about a minute for each. It could probably be tuned to be faster but that’s good enough on a million rows in this scenario.

The Nested Sets method that I'd been using for years, the one from my first article, was absolutely DEADLY when run on a hierarchy this size. It ran for over 5 hours and then I gave up on it, so there’s no telling how much longer it would have run! Definitely use either the method here, or something you find online that’s even faster, and don’t use my old method.

Tests on the selects came to the same result, but scaled, as with the smaller sets.

/*
HierarchyID Select
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 6 ms.
Nested Sets Select
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Adjacency Select

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 10101, logical reads 61280, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 60607, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 129 ms.
*//*
Adjacency Update
Table 'HierarchyTest'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

HID Update
Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'HierarchyTest'. Scan count 1, logical reads 94579, physical reads 0, read-ahead reads 49, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 358 ms, elapsed time = 454 ms.
*/

Summary of Million-Row Tests

Adjacency selects slow down to the point where glaciers and continental plates stuck behind them start tailgating and beeping at them to speed up, and the I/O stats are even more outrageous. Adjacency updates are still nearly instantaneous.

HierarchyID selects are still very, very fast, while updates are slowing down, but still not slow enough to matter (under half a second) except on an overloaded system.

Nested Sets selects are still the fastest, but only by 6 milliseconds in my tests, compared to HierarchyIDs. Updates on them are slow enough to make adjacency selects look fast.

Conclusion

This was by no means a be-all-and-end-all set of tests, but it was eye opening for me that HierarchyID outperformed Nested Sets so thoroughly! The HierarchyID was within the margin-of-error of being as fast on querying, and infinitely faster on both updates and complex generation.

I have to recommend the new HierarchyID if you have a database that needs fast queries and reasonably fast updates/inserts/deletes. Stay away from Nested Sets. If you need super-fast updates and can live with slower queries, go with Adjacency. That result was a bit of a shock to me, and thus I felt the need to share it, both to save anyone who might go down the wrong path because of my prior article, and to give others a chance to test it and possibly refute my findings.

Of course, none of the HierarchyID uses are applicable to anyone still using SQL 2005 or prior, so on those servers, I'd still recommend the hybrid method from my prior article, with the modified update in this one. But this might be another reason to add to the list for upgrading.

Follow-Up

I’ve also done some experimenting on a “Nested Sets” variation that ignores numbers completely and takes advantage of the fact that “AAAA” is between “AA” and “AB” in a string sort. I’ll have to write that one up once I get a chance to play with it more. It’s basically a string version of HierarchyID.

Hierarchies in SQL, Part II, the Sequel

Summary of Million-Row Tests

Conclusion

Follow-Up

Rate

Share

Categories

Share

Rate

Hierarchies in SQL, Part II, the Sequel

Summary of Million-Row Tests

Conclusion

Follow-Up

Rate

Share

Categories

Share

Rate

Related content

Selecting from hierarchies like Managers and Employees

Hierarchies in SQL

A Normalization Primer

Querying Responses

HOWTO: Run Singleton SELECT Queries in a Visual Basic Client