﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by GSquared  / Hierarchies in SQL, Part II, the Sequel / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Tue, 21 May 2013 18:30:47 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Thanks.</description><pubDate>Tue, 25 Dec 2012 09:01:42 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Good article.</description><pubDate>Tue, 25 Dec 2012 08:47:24 GMT</pubDate><dc:creator>Neha05</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>[quote][b]Jeff Moden (9/10/2012)[/b][hr][quote]There are some people who insist on building the Nested Sets data in a single query.  I don’t know why, since this does it accurately and in about 1 second on my machine.  Surely that’s fast enough and efficient enough.[/quote]Yowch.  I'm not sure what's going on but I have an I5 laptop w/6GB of RAM and I'm running SQL Server 2008 SP3 (version 10.0.5500.0).  It' takes 00:02:34 for the first merge of the 10,000 row example to run and the execution plan shows an internal arrow with almost a 90 million rowcount.  I thought it might be some form of parameter sniffinng and even restarted the instance but to no avail.  Any idea what might be going on there?[img]http://www.sqlservercentral.com/Forums/Attachment12360.aspx[/img][/quote]Yes, but only because I messed up and copy-and-pasted the obsolete script into the article!  The one that I said not to use any more a few paragraphs later.  I'm guess that might be a bit confusing ....Replace the second script with this:[code="sql"]WITH    Hierarchy(ID, PID, Lvl, Pth)          AS (SELECT    ID,                        NULL,                        0,                        '/' + CAST(ID AS VARCHAR(MAX)) + '/'              FROM      dbo.HierarchyTest              WHERE     ParentID IS NULL              UNION ALL              SELECT    HSub.ID,                        HSub.ParentID,                        Hierarchy.Lvl + 1,                        Hierarchy.Pth + CAST(HSub.ID AS VARCHAR(MAX)) + '/'              FROM      dbo.HierarchyTest AS HSub                        INNER JOIN Hierarchy                            ON HSub.ParentID = Hierarchy.ID)    MERGE dbo.HierarchyTest AS H        USING            (SELECT  ID,                    PID,                    Lvl,                    Pth FROM Hierarchy) AS Paths        ON H.ID = Paths.ID        WHEN MATCHED            THEN UPDATE               SET      H.HID = Paths.Pth ;[/code]Has nothing to do with the Nested Sets.  That's the third part of the script.  This part is only meant to generate the HierarchyID data.The Nested Sets bit is:[code="sql"]WITH    CTE          AS (SELECT    ID,                        ROW_NUMBER() OVER (ORDER BY HT1.HID) AS R              FROM      dbo.HierarchyTest AS HT1              WHERE     RangeStart IS NULL                        AND RangeEnd IS NULL                        AND NOT EXISTS ( SELECT *                                         FROM   dbo.HierarchyTest AS HT2                                         WHERE  HT2.ParentID = HT1.ID ))    UPDATE  Tgt    SET     RangeStart = R,            RangeEnd = R    FROM    dbo.HierarchyTest AS Tgt            INNER JOIN CTE                ON CTE.ID = Tgt.ID ;WHILE @@ROWCOUNT &amp;gt; 0    BEGIN        WITH CTE          AS (SELECT    		         HT1.ID,                 MIN(HT2.RangeStart) AS RS,                 MAX(HT2.RangeEnd) AS RE               FROM dbo.HierarchyTest AS HT1                INNER JOIN dbo.HierarchyTest AS HT2                  ON HT1.ID = HT2.ParentID               WHERE HT1.RangeStart IS NULL                 AND HT1.RangeEnd IS NULL                 AND HT2.RangeStart IS NOT NULL                 AND HT2.RangeEnd IS NOT NULL               GROUP BY  HT1.ID)      UPDATE  Tgt       SET    RangeStart = CTE.RS,              RangeEnd = CTE.RE      FROM    dbo.HierarchyTest AS Tgt       INNER JOIN CTE        ON Tgt.ID = CTE.ID ;    END ;[/code]That's correct in the article.  It's the HID script that's wrong.On a 10-thousand row hierarchy (as per that part of the article), generating the Nested Sets data takes less than 3 milliseconds on my desktop machine (Core i7 Quad, 16 Gig of RAM).  Can't tell how much less, since I tested it by setting a variable to GetDate at the beginning then select the DateDiff in milliseconds, and it came up 0.  That means less than 3 milliseconds, as per usual rules on accuracy of GetDate.Sorry for the incorrect script.  Not sure how I missed that mistake.</description><pubDate>Thu, 13 Sep 2012 13:00:00 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>[quote]There are some people who insist on building the Nested Sets data in a single query.  I don’t know why, since this does it accurately and in about 1 second on my machine.  Surely that’s fast enough and efficient enough.[/quote]Yowch.  I'm not sure what's going on but I have an I5 laptop w/6GB of RAM and I'm running SQL Server 2008 SP3 (version 10.0.5500.0).  It' takes 00:02:34 for the first merge of the 10,000 row example to run and the execution plan shows an internal arrow with almost a 90 million rowcount.  I thought it might be some form of parameter sniffinng and even restarted the instance but to no avail.  Any idea what might be going on there?[img]http://www.sqlservercentral.com/Forums/Attachment12360.aspx[/img]</description><pubDate>Mon, 10 Sep 2012 16:26:41 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>[quote][b]Robert.Sterbal (8/27/2012)[/b][hr]It seems like gene sequences often go in the thousands in lengths, and any change could be a mutation. 90% of all are genes are um... not used in encoding.The other part of this is that the genes (think of them as developer code) then spawn protein building (think of that as running code in production)[/quote]I have a basic understanding of DNA/RNA, expression, and protein synthesis.  Know the difference between myosis and mitosis (though I'm not actually sure I spelled either one correctly, I know what they are).  But that's not enough to have any advice on coding and database options with regard to them.I'd probably use a breadcrumb hierarchy of one sort or another, just based on the idea of what-spawns-what.  HierarchyID is essentially a breadcrumb, but encoded for use by some specific methods.  If its depth limitations will work for the kind of data you're talking about, then it should be fine.  But sequencing thousands of items, assumed to be AT/GC bonds, would mean a hierarchy thousands of levels deep, and the limits on the SQL Server HierarchyID datatype wouldn't allow that.  Doesn't pair-1 of human chromosomes have hundreds of millions of pairs, all by itself?But, again, I'm not even sure that mapping AT/GC sequences is what "sequencing" means.  I assume so, and that it would take 1 hierachy node per bond-pair, and those are assumptions I'm not qualified to make.</description><pubDate>Tue, 28 Aug 2012 06:53:08 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>It seems like gene sequences often go in the thousands in lengths, and any change could be a mutation. 90% of all are genes are um... not used in encoding.The other part of this is that the genes (think of them as developer code) then spawn protein building (think of that as running code in production)</description><pubDate>Mon, 27 Aug 2012 06:41:07 GMT</pubDate><dc:creator>Robert.Sterbal</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>[quote][b]Robert.Sterbal (8/27/2012)[/b][hr]Is anyone familiar with the application of HeirarchyID to genetic sequencing?[/quote]I have to admit I don't know enough about genetic sequencing to even be aware there's a relationship between the two things.  HierarchyID could easily be used for a family tree (to a certain depth), but is genetic sequencing a hierarchical data structure?  If so, it probably goes past the depth HierarchyID can readily do, but that's just a WAG on my part.</description><pubDate>Mon, 27 Aug 2012 06:37:43 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Is anyone familiar with the application of HeirarchyID to genetic sequencing?</description><pubDate>Mon, 27 Aug 2012 05:49:24 GMT</pubDate><dc:creator>Robert.Sterbal</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>To test maximum depth for HierarchyID, I just ran this test (same table definition as in the article):[code="sql"]SET NOCOUNT ON;GOTRUNCATE TABLE dbo.HierarchyTest;GOINSERT INTO dbo.HierarchyTest        (NodeName,         ParentID,         RangeStart,         RangeEnd,         HID        ) SELECT     'Person' + RIGHT('0000' + CAST(Number AS VARCHAR(4)), 4),    NULL, NULL, NULL, NULL  FROM dbo.Numbers; -- 10,001 rows of dataGODECLARE Cur CURSORFOR    SELECT  ID    FROM    dbo.HierarchyTest    ORDER BY ID FOR UPDATE;OPEN Cur;DECLARE @ID INT,    @PID INT,    @HID VARCHAR(MAX);FETCH NEXT FROM Cur INTO @ID;BEGIN TRY;    WHILE @@fetch_status = 0         BEGIN            SET @HID = COALESCE(@HID + CAST(@ID AS VARCHAR(MAX)) + '/', '/1/');	            UPDATE  dbo.HierarchyTest            SET     ParentID = @PID,                    HID = CAST(@HID AS HIERARCHYID)            WHERE CURRENT OF Cur;	            SET @PID = @ID;	            FETCH NEXT FROM Cur INTO @ID;        END;    CLOSE Cur;    DEALLOCATE Cur;END TRYBEGIN CATCH;    PRINT @ID;END CATCH;[/code]It'll fail when it reaches the greatest depth it can manage.  Got successfully to 427 in this test.If the node values were compressed, instead of ascending base-10 numbers, you could obviously squeeze more depth out of it, but I don't have time to try that right now.  Convert the base-10 to base-36, for example, and you'd get a lot more depth, since it's dependent on the length of the string that's being converted.  I haven't tried that yet, so it could just crash and burn.  Probably worth a test, though.</description><pubDate>Tue, 21 Aug 2012 12:28:11 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>[quote][b]HLogic (8/21/2012)[/b][hr]For hierarchies of &amp;lt; 50 levels depth, Nested Intervals (Hazel) may be of interest.[/quote]I consider those just a flavor of Nested Sets.  It has the one advantage that it doesn't require expanding the range at the top level when you insert at lower levels, while trading that off for complexity of implementation, difficulty in inserting nodes between other nodes at lower levels (requires recomputation of all sub-nodes that are shifted to the right; both version have this issue), difficulty for human data-verification (requires fairly complex math to visually validate the tree composition), and problems with rounding errors towards the bottom of the hierarchy if depth is greater than a certain level (depending on the data type used, this can happen shallower or deeper in the hierarchy).It retains the query speed of Nested Sets, since it's a single index scan (usually a range scan rather than a full scan) to resolve the most common hierarchy queries.  Per my tests (admittedly limited), the HierarchyID datatype has query speed that's in the same range, can handle deeper hierarchies (it can only go 892 bytes deep, but that's a lot of hex values), and is faster and easier to edit.For hierarchies too deep for HierarchyID, I'd probably go breadcrumb/path, if query speed matters more than storage size, or Nested Sets if the hierarchy is relatively static.</description><pubDate>Tue, 21 Aug 2012 12:08:53 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>For hierarchies of &amp;lt; 50 levels depth, Nested Intervals (Hazel) may be of interest.</description><pubDate>Tue, 21 Aug 2012 11:10:25 GMT</pubDate><dc:creator>HLogic</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>[quote][b]Shawn Dube (8/20/2012)[/b][hr]Hey Michael,I get that question a lot! :)It definitely is not a common need but one of the things my company supports are compensation plans for direct marketing and multilevel marketing companies. The specific plan I am referring to is called a Binary plan ( [url=http://en.wikipedia.org/wiki/Binary_plan]http://en.wikipedia.org/wiki/Binary_plan[/url] ).In these plans, it is common for one leg to grow much faster than the others and I did have one with 1042 nodes in a single leg. rest of tree was approx 400,000 nodesI imagine a Bill Of Materials for a space shuttle might get large too  ;) (though probably not 1000 parts deep)I am hoping to have a single solution that works nicely over our common needs (few thousand nodes mostly balanced) as well as these "exception" cases to allow our codebase to be common across clients. This isn't a hard-fast need, would just be nice to not have to switch out a client as they grow past possible thresholds.[/quote]In the MLM schemes I'm accustomed to reading about (I've not worked for one myself, so can't speak from direct experience), additions and queries are much more common than updates.In other words, Bob may recruit a bunch more salespeople under him, but the ones already under him aren't likely to suddenly need to be moved under Sam instead.  If Joe works under Bob, you're not likely to need to update Joe's position in the hierarchy, and that of everyone beneath him, or at least not frequently.Does that match your usual business need?If so, then Hierarchy ID is probably your best bet.  Easier to add to than Nested Sets, faster to query than Adjacency, by large margins in both cases.If you are using Adjacency, then recursion limits might do interesting things to you data/performance as well, but querying Hierarchy ID data isn't recursive.  So 1,000+ levels deep is still just a single index range-scan, instead of a scan for each level.</description><pubDate>Tue, 21 Aug 2012 06:55:06 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>[quote][b]Shawn Dube (8/20/2012)[/b][hr]Thanks for this article and the original! One question though... have you tried a very deep hierarchy on any of your methods? Like a million node binary tree (not necessarily balanced)? Something where one "leg" of the tree is a few thousand nodes deep?Just wondering what limitations may creep up. Greatly appreciate your work! [/quote]The deepest real-world hierarchies I'm used to seeing are certain multi-level-marketing companies.  Those can get pretty deep.Adjacency queries turn into nightmares on that kind of data.  After all, they have to do a scan for every level, and deep hierarchies are (by their very nature) large tables.Nested Sets are still nearly instantaneous to query.  If it's a lot of rows of data, returning the dataset to the client and rendering it will be your performance issue on those.  No matter how deep a Nested Set hierarchy is, querying one is simply a single scan.  If you haven't indexed the necessary columns, that might be a problem, but that's not likely to come up since the hierarchy data is likely to be the clustered index on a hierarchy table.Hierarchy ID works like nested sets.  On deep queries, it's still very fast.  If you look into the way they get stored and indexed, which is as a binary string, that makes sense.  After all, 0x1 and all it's children nodes, 0x102, 0x1F3, and so on, are all in the same range.  It's like looking up all the names that start with "A" in the phone book.Updates are the same as in the examples in the article.  Adjacency, you just update one row and you're done.  Should be so fast, even on a million-rows deep, that you can't measure the time taken.  Nested Sets, you rebuild the table, and that takes lots of time.  Hierarchy ID, you have to update every child, so you're doing a million-row update, and that won't be fast and might take a fair chunk of disk space in both tempdb and your log file.  It'll be massively faster than a Nested Sets update, and a lot easier to do, but it will be slow compared to Adjacency.</description><pubDate>Tue, 21 Aug 2012 06:47:44 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Shawn,I read up on this MLM stuff.Very wicked indeed (gets even compared to Ponzi schemes).I find it difficult to see how a sales person at node level 1042 appears to a sales person at node level 1 especially when it comes to the concept of node level N getting a commission for sales done at node level N+1.I can see the essence of a bill of materials that makes it necessary to be organized hierarchically. But even in a space shuttle I can't see past 50 levels.Interesting stuff though...</description><pubDate>Tue, 21 Aug 2012 01:00:43 GMT</pubDate><dc:creator>Michael Meierruth</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Great article Gus!I haven't had a chance to play around with HiearchyIDs myself yet but I must put it on my agenda to try it out.  And your examples are a great place to start an analysis.I'm also interested to see if there are any variations to the hiearchy traversal methods of adjacency lists that might yield better performance, although I doubt I'm smart enough to figure them out. :-)I did have one recent case that I was able to improve though.  So it may be possible.</description><pubDate>Mon, 20 Aug 2012 23:54:19 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Hey Michael,I get that question a lot! :)It definitely is not a common need but one of the things my company supports are compensation plans for direct marketing and multilevel marketing companies. The specific plan I am referring to is called a Binary plan ( [url=http://en.wikipedia.org/wiki/Binary_plan]http://en.wikipedia.org/wiki/Binary_plan[/url] ).In these plans, it is common for one leg to grow much faster than the others and I did have one with 1042 nodes in a single leg. rest of tree was approx 400,000 nodesI imagine a Bill Of Materials for a space shuttle might get large too  ;) (though probably not 1000 parts deep)I am hoping to have a single solution that works nicely over our common needs (few thousand nodes mostly balanced) as well as these "exception" cases to allow our codebase to be common across clients. This isn't a hard-fast need, would just be nice to not have to switch out a client as they grow past possible thresholds.</description><pubDate>Mon, 20 Aug 2012 23:47:35 GMT</pubDate><dc:creator>Shawn Dube</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Shawn,In the real business world, I have encountered hierarchies around 15 nodes deep. They're fun. Where in the real world (business or not) have you encountered a hierarchy "a few thousand nodes deep" that needs to be dealt with using SQL? I'm very curious.</description><pubDate>Mon, 20 Aug 2012 23:26:57 GMT</pubDate><dc:creator>Michael Meierruth</dc:creator></item><item><title>RE: Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Thanks for this article and the original! One question though... have you tried a very deep hierarchy on any of your methods? Like a million node binary tree (not necessarily balanced)? Something where one "leg" of the tree is a few thousand nodes deep?Just wondering what limitations may creep up. Greatly appreciate your work! </description><pubDate>Mon, 20 Aug 2012 22:21:03 GMT</pubDate><dc:creator>Shawn Dube</dc:creator></item><item><title>Hierarchies in SQL, Part II, the Sequel</title><link>http://www.sqlservercentral.com/Forums/Topic1347552-1312-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/articles/T-SQL/92461/"&gt;Hierarchies in SQL, Part II, the Sequel&lt;/A&gt;[/B]</description><pubDate>Mon, 20 Aug 2012 21:55:34 GMT</pubDate><dc:creator>GSquared</dc:creator></item></channel></rss>