Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Hierarchies in SQL, Part II, the Sequel Expand / Collapse
Author
Message
Posted Monday, August 20, 2012 9:55 PM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Comments posted to this topic are about the item Hierarchies in SQL, Part II, the Sequel

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1347552
Posted Monday, August 20, 2012 10:21 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Yesterday @ 7:06 AM
Points: 18, Visits: 216
Thanks for this article and the original! One question though... have you tried a very deep hierarchy on any of your methods? Like a million node binary tree (not necessarily balanced)? Something where one "leg" of the tree is a few thousand nodes deep?

Just wondering what limitations may creep up.

Greatly appreciate your work!

Post #1347558
Posted Monday, August 20, 2012 11:26 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 2:12 PM
Points: 542, Visits: 2,118
Shawn,
In the real business world, I have encountered hierarchies around 15 nodes deep. They're fun. Where in the real world (business or not) have you encountered a hierarchy "a few thousand nodes deep" that needs to be dealt with using SQL? I'm very curious.
Post #1347570
Posted Monday, August 20, 2012 11:47 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Yesterday @ 7:06 AM
Points: 18, Visits: 216
Hey Michael,

I get that question a lot! :)

It definitely is not a common need but one of the things my company supports are compensation plans for direct marketing and multilevel marketing companies. The specific plan I am referring to is called a Binary plan ( http://en.wikipedia.org/wiki/Binary_plan ).

In these plans, it is common for one leg to grow much faster than the others and I did have one with 1042 nodes in a single leg. rest of tree was approx 400,000 nodes

I imagine a Bill Of Materials for a space shuttle might get large too ;) (though probably not 1000 parts deep)

I am hoping to have a single solution that works nicely over our common needs (few thousand nodes mostly balanced) as well as these "exception" cases to allow our codebase to be common across clients. This isn't a hard-fast need, would just be nice to not have to switch out a client as they grow past possible thresholds.

Post #1347574
Posted Monday, August 20, 2012 11:54 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 7:42 PM
Points: 3,648, Visits: 5,327
Great article Gus!

I haven't had a chance to play around with HiearchyIDs myself yet but I must put it on my agenda to try it out. And your examples are a great place to start an analysis.

I'm also interested to see if there are any variations to the hiearchy traversal methods of adjacency lists that might yield better performance, although I doubt I'm smart enough to figure them out.

I did have one recent case that I was able to improve though. So it may be possible.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1347575
Posted Tuesday, August 21, 2012 1:00 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 2:12 PM
Points: 542, Visits: 2,118
Shawn,

I read up on this MLM stuff.
Very wicked indeed (gets even compared to Ponzi schemes).
I find it difficult to see how a sales person at node level 1042 appears to a sales person at node level 1 especially when it comes to the concept of node level N getting a commission for sales done at node level N+1.

I can see the essence of a bill of materials that makes it necessary to be organized hierarchically. But even in a space shuttle I can't see past 50 levels.

Interesting stuff though...
Post #1347592
Posted Tuesday, August 21, 2012 6:47 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Shawn Dube (8/20/2012)
Thanks for this article and the original! One question though... have you tried a very deep hierarchy on any of your methods? Like a million node binary tree (not necessarily balanced)? Something where one "leg" of the tree is a few thousand nodes deep?

Just wondering what limitations may creep up.

Greatly appreciate your work!



The deepest real-world hierarchies I'm used to seeing are certain multi-level-marketing companies. Those can get pretty deep.

Adjacency queries turn into nightmares on that kind of data. After all, they have to do a scan for every level, and deep hierarchies are (by their very nature) large tables.

Nested Sets are still nearly instantaneous to query. If it's a lot of rows of data, returning the dataset to the client and rendering it will be your performance issue on those. No matter how deep a Nested Set hierarchy is, querying one is simply a single scan. If you haven't indexed the necessary columns, that might be a problem, but that's not likely to come up since the hierarchy data is likely to be the clustered index on a hierarchy table.

Hierarchy ID works like nested sets. On deep queries, it's still very fast. If you look into the way they get stored and indexed, which is as a binary string, that makes sense. After all, 0x1 and all it's children nodes, 0x102, 0x1F3, and so on, are all in the same range. It's like looking up all the names that start with "A" in the phone book.

Updates are the same as in the examples in the article. Adjacency, you just update one row and you're done. Should be so fast, even on a million-rows deep, that you can't measure the time taken. Nested Sets, you rebuild the table, and that takes lots of time. Hierarchy ID, you have to update every child, so you're doing a million-row update, and that won't be fast and might take a fair chunk of disk space in both tempdb and your log file. It'll be massively faster than a Nested Sets update, and a lot easier to do, but it will be slow compared to Adjacency.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1347753
Posted Tuesday, August 21, 2012 6:55 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Shawn Dube (8/20/2012)
Hey Michael,

I get that question a lot! :)

It definitely is not a common need but one of the things my company supports are compensation plans for direct marketing and multilevel marketing companies. The specific plan I am referring to is called a Binary plan ( http://en.wikipedia.org/wiki/Binary_plan ).

In these plans, it is common for one leg to grow much faster than the others and I did have one with 1042 nodes in a single leg. rest of tree was approx 400,000 nodes

I imagine a Bill Of Materials for a space shuttle might get large too ;) (though probably not 1000 parts deep)

I am hoping to have a single solution that works nicely over our common needs (few thousand nodes mostly balanced) as well as these "exception" cases to allow our codebase to be common across clients. This isn't a hard-fast need, would just be nice to not have to switch out a client as they grow past possible thresholds.


In the MLM schemes I'm accustomed to reading about (I've not worked for one myself, so can't speak from direct experience), additions and queries are much more common than updates.

In other words, Bob may recruit a bunch more salespeople under him, but the ones already under him aren't likely to suddenly need to be moved under Sam instead. If Joe works under Bob, you're not likely to need to update Joe's position in the hierarchy, and that of everyone beneath him, or at least not frequently.

Does that match your usual business need?

If so, then Hierarchy ID is probably your best bet. Easier to add to than Nested Sets, faster to query than Adjacency, by large margins in both cases.

If you are using Adjacency, then recursion limits might do interesting things to you data/performance as well, but querying Hierarchy ID data isn't recursive. So 1,000+ levels deep is still just a single index range-scan, instead of a scan for each level.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1347758
Posted Tuesday, August 21, 2012 11:10 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Tuesday, September 2, 2014 7:40 AM
Points: 90, Visits: 310
For hierarchies of < 50 levels depth, Nested Intervals (Hazel) may be of interest.


Post #1347962
Posted Tuesday, August 21, 2012 12:08 PM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
HLogic (8/21/2012)
For hierarchies of < 50 levels depth, Nested Intervals (Hazel) may be of interest.


I consider those just a flavor of Nested Sets. It has the one advantage that it doesn't require expanding the range at the top level when you insert at lower levels, while trading that off for complexity of implementation, difficulty in inserting nodes between other nodes at lower levels (requires recomputation of all sub-nodes that are shifted to the right; both version have this issue), difficulty for human data-verification (requires fairly complex math to visually validate the tree composition), and problems with rounding errors towards the bottom of the hierarchy if depth is greater than a certain level (depending on the data type used, this can happen shallower or deeper in the hierarchy).

It retains the query speed of Nested Sets, since it's a single index scan (usually a range scan rather than a full scan) to resolve the most common hierarchy queries. Per my tests (admittedly limited), the HierarchyID datatype has query speed that's in the same range, can handle deeper hierarchies (it can only go 892 bytes deep, but that's a lot of hex values), and is faster and easier to edit.

For hierarchies too deep for HierarchyID, I'd probably go breadcrumb/path, if query speed matters more than storage size, or Nested Sets if the hierarchy is relatively static.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1348008
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse