Log in
::
Register
::
Not logged in
Home
Tags
Articles
Editorials
Stairways
Forums
Scripts
Videos
Blogs
QotD
Books
Ask SSC
SQL Jobs
Training
Authors
About us
Contact us
Newsletters
Write for us
Recent Posts
Recent Posts
Popular Topics
Popular Topics
Home
Search
Members
Calendar
Who's On
Home
»
Article Discussions
»
Article Discussions by Author
»
Discuss content posted by GSquared
»
Hierarchies in SQL, Part II, the Sequel
19 posts, Page 1 of 2
1
2
»»
Hierarchies in SQL, Part II, the Sequel
Rate Topic
Display Mode
Topic Options
Author
Message
GSquared
GSquared
Posted Monday, August 20, 2012 9:55 PM
SSCoach
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
Comments posted to this topic are about the item
Hierarchies in SQL, Part II, the Sequel
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1347552
Shawn Dube
Shawn Dube
Posted Monday, August 20, 2012 10:21 PM
Grasshopper
Group: General Forum Members
Last Login: Wednesday, May 22, 2013 4:40 PM
Points: 16,
Visits: 182
Thanks for this article and the original! One question though... have you tried a very deep hierarchy on any of your methods? Like a million node binary tree (not necessarily balanced)? Something where one "leg" of the tree is a few thousand nodes deep?
Just wondering what limitations may creep up.
Greatly appreciate your work!
Post #1347558
Michael Meierruth
Michael Meierruth
Posted Monday, August 20, 2012 11:26 PM
SSC-Addicted
Group: General Forum Members
Last Login: Wednesday, May 22, 2013 2:09 PM
Points: 480,
Visits: 1,605
Shawn,
In the real business world, I have encountered hierarchies around 15 nodes deep. They're fun. Where in the real world (business or not) have you encountered a hierarchy "a few thousand nodes deep" that needs to be dealt with using SQL? I'm very curious.
Post #1347570
Shawn Dube
Shawn Dube
Posted Monday, August 20, 2012 11:47 PM
Grasshopper
Group: General Forum Members
Last Login: Wednesday, May 22, 2013 4:40 PM
Points: 16,
Visits: 182
Hey Michael,
I get that question a lot! :)
It definitely is not a common need but one of the things my company supports are compensation plans for direct marketing and multilevel marketing companies. The specific plan I am referring to is called a Binary plan (
http://en.wikipedia.org/wiki/Binary_plan
).
In these plans, it is common for one leg to grow much faster than the others and I did have one with 1042 nodes in a single leg. rest of tree was approx 400,000 nodes
I imagine a Bill Of Materials for a space shuttle might get large too ;) (though probably not 1000 parts deep)
I am hoping to have a single solution that works nicely over our common needs (few thousand nodes mostly balanced) as well as these "exception" cases to allow our codebase to be common across clients. This isn't a hard-fast need, would just be nice to not have to switch out a client as they grow past possible thresholds.
Post #1347574
dwain.c
dwain.c
Posted Monday, August 20, 2012 11:54 PM
SSCrazy
Group: General Forum Members
Last Login: Yesterday @ 7:24 PM
Points: 2,346,
Visits: 3,192
Great article Gus!
I haven't had a chance to play around with HiearchyIDs myself yet but I must put it on my agenda to try it out. And your examples are a great place to start an analysis.
I'm also interested to see if there are any variations to the hiearchy traversal methods of adjacency lists that might yield better performance, although I doubt I'm smart enough to figure them out.
I did have one recent case that I was able to improve though. So it may be possible.
No loops! No CURSORs! No RBAR! Hoo-uh!
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Are you too recursively challenged?
Splitting strings based on patterns can be fast!
Post #1347575
Michael Meierruth
Michael Meierruth
Posted Tuesday, August 21, 2012 1:00 AM
SSC-Addicted
Group: General Forum Members
Last Login: Wednesday, May 22, 2013 2:09 PM
Points: 480,
Visits: 1,605
Shawn,
I read up on this MLM stuff.
Very wicked indeed (gets even compared to Ponzi schemes).
I find it difficult to see how a sales person at node level 1042 appears to a sales person at node level 1 especially when it comes to the concept of node level N getting a commission for sales done at node level N+1.
I can see the essence of a bill of materials that makes it necessary to be organized hierarchically. But even in a space shuttle I can't see past 50 levels.
Interesting stuff though...
Post #1347592
GSquared
GSquared
Posted Tuesday, August 21, 2012 6:47 AM
SSCoach
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
Shawn Dube (8/20/2012)
Thanks for this article and the original! One question though... have you tried a very deep hierarchy on any of your methods? Like a million node binary tree (not necessarily balanced)? Something where one "leg" of the tree is a few thousand nodes deep?
Just wondering what limitations may creep up.
Greatly appreciate your work!
The deepest real-world hierarchies I'm used to seeing are certain multi-level-marketing companies. Those can get pretty deep.
Adjacency queries turn into nightmares on that kind of data. After all, they have to do a scan for every level, and deep hierarchies are (by their very nature) large tables.
Nested Sets are still nearly instantaneous to query. If it's a lot of rows of data, returning the dataset to the client and rendering it will be your performance issue on those. No matter how deep a Nested Set hierarchy is, querying one is simply a single scan. If you haven't indexed the necessary columns, that might be a problem, but that's not likely to come up since the hierarchy data is likely to be the clustered index on a hierarchy table.
Hierarchy ID works like nested sets. On deep queries, it's still very fast. If you look into the way they get stored and indexed, which is as a binary string, that makes sense. After all, 0x1 and all it's children nodes, 0x102, 0x1F3, and so on, are all in the same range. It's like looking up all the names that start with "A" in the phone book.
Updates are the same as in the examples in the article. Adjacency, you just update one row and you're done. Should be so fast, even on a million-rows deep, that you can't measure the time taken. Nested Sets, you rebuild the table, and that takes lots of time. Hierarchy ID, you have to update every child, so you're doing a million-row update, and that won't be fast and might take a fair chunk of disk space in both tempdb and your log file. It'll be massively faster than a Nested Sets update, and a lot easier to do, but it will be slow compared to Adjacency.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1347753
GSquared
GSquared
Posted Tuesday, August 21, 2012 6:55 AM
SSCoach
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
Shawn Dube (8/20/2012)
Hey Michael,
I get that question a lot! :)
It definitely is not a common need but one of the things my company supports are compensation plans for direct marketing and multilevel marketing companies. The specific plan I am referring to is called a Binary plan (
http://en.wikipedia.org/wiki/Binary_plan
).
In these plans, it is common for one leg to grow much faster than the others and I did have one with 1042 nodes in a single leg. rest of tree was approx 400,000 nodes
I imagine a Bill Of Materials for a space shuttle might get large too ;) (though probably not 1000 parts deep)
I am hoping to have a single solution that works nicely over our common needs (few thousand nodes mostly balanced) as well as these "exception" cases to allow our codebase to be common across clients. This isn't a hard-fast need, would just be nice to not have to switch out a client as they grow past possible thresholds.
In the MLM schemes I'm accustomed to reading about (I've not worked for one myself, so can't speak from direct experience), additions and queries are much more common than updates.
In other words, Bob may recruit a bunch more salespeople under him, but the ones already under him aren't likely to suddenly need to be moved under Sam instead. If Joe works under Bob, you're not likely to need to update Joe's position in the hierarchy, and that of everyone beneath him, or at least not frequently.
Does that match your usual business need?
If so, then Hierarchy ID is probably your best bet. Easier to add to than Nested Sets, faster to query than Adjacency, by large margins in both cases.
If you are using Adjacency, then recursion limits might do interesting things to you data/performance as well, but querying Hierarchy ID data isn't recursive. So 1,000+ levels deep is still just a single index range-scan, instead of a scan for each level.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1347758
HLogic
HLogic
Posted Tuesday, August 21, 2012 11:10 AM
SSC Journeyman
Group: General Forum Members
Last Login: Wednesday, May 22, 2013 3:02 PM
Points: 90,
Visits: 272
For hierarchies of < 50 levels depth, Nested Intervals (Hazel) may be of interest.
Post #1347962
GSquared
GSquared
Posted Tuesday, August 21, 2012 12:08 PM
SSCoach
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
HLogic (8/21/2012)
For hierarchies of < 50 levels depth, Nested Intervals (Hazel) may be of interest.
I consider those just a flavor of Nested Sets. It has the one advantage that it doesn't require expanding the range at the top level when you insert at lower levels, while trading that off for complexity of implementation, difficulty in inserting nodes between other nodes at lower levels (requires recomputation of all sub-nodes that are shifted to the right; both version have this issue), difficulty for human data-verification (requires fairly complex math to visually validate the tree composition), and problems with rounding errors towards the bottom of the hierarchy if depth is greater than a certain level (depending on the data type used, this can happen shallower or deeper in the hierarchy).
It retains the query speed of Nested Sets, since it's a single index scan (usually a range scan rather than a full scan) to resolve the most common hierarchy queries. Per my tests (admittedly limited), the HierarchyID datatype has query speed that's in the same range, can handle deeper hierarchies (it can only go 892 bytes deep, but that's a lot of hex values), and is faster and easier to edit.
For hierarchies too deep for HierarchyID, I'd probably go breadcrumb/path, if query speed matters more than storage size, or Nested Sets if the hierarchy is relatively static.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1348008
« Prev Topic
|
Next Topic »
19 posts, Page 1 of 2
1
2
»»
Permissions
You
cannot
post new topics.
You
cannot
post topic replies.
You
cannot
post new polls.
You
cannot
post replies to polls.
You
cannot
edit your own topics.
You
cannot
delete your own topics.
You
cannot
edit other topics.
You
cannot
delete other topics.
You
cannot
edit your own posts.
You
cannot
edit other posts.
You
cannot
delete your own posts.
You
cannot
delete other posts.
You
cannot
post events.
You
cannot
edit your own events.
You
cannot
edit other events.
You
cannot
delete your own events.
You
cannot
delete other events.
You
cannot
send private messages.
You
cannot
send emails.
You
may
read topics.
You
cannot
rate topics.
You
cannot
vote within polls.
You
cannot
upload attachments.
You
may
download attachments.
You
cannot
post HTML code.
You
cannot
edit HTML code.
You
cannot
post IFCode.
You
cannot
post JavaScript.
You
cannot
post EmotIcons.
You
cannot
post or upload images.
Copyright © 2002-2013 Simple Talk Publishing. All Rights Reserved.
Privacy Policy.
Terms of Use.
Report Abuse.