HierarchyID performance problems... Really???

Question

HierarchyID performance problems... Really???

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 1

Sean Lange (10/4/2011)
I could of course do this pretty simple with an adjaceny list but...

Oddly enough, I've found "hybrid" tables that contain an Adjacency List, Hierarchical Path (without the HierarchyID, so far), and Nested Sets to have advantages that no one particular method has. For example and as you've said, it's very easy to move, add, and delete nodes and whole sub-trees in the Adjacency List (a human can easily eye-ball the correct changes, if necessary) but there's some blinding speed and query flexibility to be had with Nested Sets.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 2

CGSJohnson (10/4/2011)
Hi, Jeff. Sorry, but I cannot talk about it all. However, one use would be for the common goal of organizational structure.
Thanks...Chris

Understood. Thanks for the thought, anyway.

Yep... I agree... organizational structures are a primary use of hierarchies.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 3

I guess I can wrap this thread up by saying I've done a deeper dive on some of the internet posts that claim performance problems with the HierarchyID datatype and its related methods. So far, all of them have turned out to be "false alarms" where certain (bad) programming practices, such as the use of non-SARGable search predicates, were used that would slow down any query and not just those related to the HierarchyID.

I haven't given up the search for actual performance problems caused by the use of the HierarchyID datatype or related methods, but I'm going to turn my efforts more to comparing performance between the HierarchyID methods and Nested Set methods.

Thanks to the good folks who posted on this thread and, believe it or not, thanks to the folks that didn't... it kind of shows that not a whole lot of people have to work with Hierarchies and the ones that are, simply aren't having performance problems with their hierarchies, are not working with large hierarchies, or are simply not aware that they may have a performance problem.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Johan Bijnens SSC Guru Points: 135516 More actions Follow · Answer 4

CGSJohnson (10/4/2011)
Hi, Jeff. Sorry, but I cannot talk about it all. However, one use would be for the common goal of organizational structure.
Thanks...Chris

Don't talk to Jeff about it, if you'd have to kill him afterward ! I need him next Tuesday evening :hehe:

Back to the topic though, I haven't seen HierarchyID being used.

I only played a bit with it to just get a little grip on it.

BTW you don't need million row objects to get into performance problems, so strive for most optimal setup as much as you can !

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 5

ALZDBA (10/5/2011)
BTW you don't need million row objects to get into performance problems, so strive for most optimal setup as much as you can !

Heh... true enough. Some of the posts with "performance problems" talked about hierarchy tables with only a thousand or so rows. That's why I test with a million... it makes the "thousands" easy.

Thanks for the feedback, Johan. I know a lot of folks that say they're going to impliment it "soon" for one reason or another, but I don't know of anyone who actually has. I know a couple of folks that have implimented hierarchies in their work... they just haven't done it using the HierarchyID datatype.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Johan Bijnens SSC Guru Points: 135516 More actions Follow · Answer 6

Jeff Moden (10/5/2011)
ALZDBA (10/5/2011)
BTW you don't need million row objects to get into performance problems, so strive for most optimal setup as much as you can !
Heh... true enough. Some of the posts with "performance problems" talked about hierarchy tables with only a thousand or so rows. That's why I test with a million... it makes the "thousands" easy.

And that is one of the reasons we get to meet again next Tuesday[/url] :w00t:

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Devendrakumar SSC-Forever Points: 42493 More actions · Answer 7

Congratulations Jeff for ‘Exceptional DBA of 2011’ award.

I recently joined SSC & exploring SQL Server by reading your (& other GURU’s) comments.

Thanks for your valuable contributions. Keep sharing your knowledge.

CGSJohnson SSCertifiable Points: 7208 More actions · Answer 8

Don't worry, everyone...I did not have to kill Jeff.

dw.bi SSC Enthusiast Points: 122 More actions · Answer 9

Hi there,

Recently my boss wanted me to experiment with this HierarchyID to see if it can improve performance of the reports that we run off of a tree with about 700,000 nodes and 20-25 levels.

After having implemented a loop structure, a CTE and a hierarchyID, (with the above numbers) I have come to the conclusion that hierarchyID is by far the fastest method to get a sub tree, followed by closely by the loop and then distantly by a panting CTE, which I think is crap after you have crossed a few thousand rows.

However, if you need a routine which has more brains than just retrieving a mere sub-tree, the loop structure with some temp variables and tables is the best. The hierarchyID is incredibly inflexible when it comes extracting portions of a sub-tree which fulfil your criteria.

For example, if you want to get the sub tree which should have:

- Some nodes PLUS

- First occurences in the down line which have Flag B set to 1 , flag c set to 1 and sales >5000 BUT avoid the descendants of THESE nodes

- Avoid any further down line nodes which have Flag B set to 1 , flag c set to 1 and sales >5000

-And some other numerical calculations on the nodes

In the above example, the hierarchyID would first run to fetch everything in the sub tree and then you have to prune the leaves you do not want.

A loop structure with temp variables on the other hand, would on each iteration of the loop fetch ONLY those things that you want in the first place and hence saves you the painful time that deletions or joins or excepts would cost you.

So the hierarchyID is not as versatile as a loop.

Please let me know what you think if what I wrote makes any sense.

Ben Teraberry SSChampion Points: 11845 More actions · Answer 10

Itzik Ben-Gan has a good code example of a T-SQL basic (and limited) equivalent to what what the hierachyID datatype does behind the scenes (and without the coding overhead on our part) but unfortunately I'm not sure if it's distributable because I can't find it publicly available. Basically, all the datatype is doing is ordering the key (via CLR) in such a way that an index can be scanned quickly a single time to get either the whole tree in a logical order or a portion of the tree. Previously, a lot of hierarchy code has been done via implicit (e.g. recursive cte's) or explicit cursors, which results in an exponentially large number of reads as the number of records increases.

(Obviously most of you already know all this and more but I thought it would be nice to provide a brief explanation of the benefits so newer developers can understand why someone would use the hierarchyID.)

We have an old product for a canned response system that's been sitting in development that was coded with a recursive cte's and I've been working on redoing it with hierarchyID's. I don't believe it solves every problem, but I do think it's clearly faster than the typical alternatives. In the past there was really no good way to do such things on larger systems without quite a bit of work.

└> bt

Forum Etiquette: How to post data/code on a forum to get the best help[/url]

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 11

dw.bi (11/30/2011)
Hi there,
Recently my boss wanted me to experiment with this HierarchyID to see if it can improve performance of the reports that we run off of a tree with about 700,000 nodes and 20-25 levels.
After having implemented a loop structure, a CTE and a hierarchyID, (with the above numbers) I have come to the conclusion that hierarchyID is by far the fastest method to get a sub tree, followed by closely by the loop and then distantly by a panting CTE, which I think is crap after you have crossed a few thousand rows.
However, if you need a routine which has more brains than just retrieving a mere sub-tree, the loop structure with some temp variables and tables is the best. The hierarchyID is incredibly inflexible when it comes extracting portions of a sub-tree which fulfil your criteria.
For example, if you want to get the sub tree which should have:
- Some nodes PLUS
- First occurences in the down line which have Flag B set to 1 , flag c set to 1 and sales >5000 BUT avoid the descendants of THESE nodes
- Avoid any further down line nodes which have Flag B set to 1 , flag c set to 1 and sales >5000
-And some other numerical calculations on the nodes
In the above example, the hierarchyID would first run to fetch everything in the sub tree and then you have to prune the leaves you do not want.
A loop structure with temp variables on the other hand, would on each iteration of the loop fetch ONLY those things that you want in the first place and hence saves you the painful time that deletions or joins or excepts would cost you.
So the hierarchyID is not as versatile as a loop.
Please let me know what you think if what I wrote makes any sense.

By "First occurences in the down line which have Flag B set to 1 , flag c set to 1 and sales >5000 BUT avoid the descendants of THESE nodes", you mean the first occurance of any downline path from a given node and that you can have more than one return so long as they aren't in the same branch, correct?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dw.bi SSC Enthusiast Points: 122 More actions · Answer 12

My stored proc takes in a node number as a parameter and returns the downline depending on the filters.

For the tree in image(attached), assuming that the RED BLOCKS have some special flags set, if I enter "a" as the parameter,

I get back the following as the output:

LevelNode

0a

1b

1c

2b1

2b2

2b3

2c1

2c2

2c3

3b21

3b22

3c21

3c22

hugo-939487 SSC Enthusiast Points: 143 More actions · Answer 13

Hi Jeff,

Ik don't understand the BuildTestHierarchy stored procedure.

In my database I get the Employee table, but I don't see any column of type hierarchyid.

Do I have to do some additional stuff? Or I don't understand it... I never used hierarchyid.

Regards, Hugo.

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 14

hugo-939487 (12/2/2011)
Hi Jeff,
Ik don't understand the BuildTestHierarchy stored procedure.
In my database I get the Employee table, but I don't see any column of type hierarchyid.
Do I have to do some additional stuff? Or I don't understand it... I never used hierarchyid.
Regards, Hugo.

Apologies for the delay, Hugo.

You are correct. The test table I provided the code for doesn't have the "build" for a HierarchyID column. I left that out because I didn't want to "taint" anyones testing. I typically build such a thing right into the same table as the "Adjacency List". Others would rather have it in a separate table. I was leaving it up to anyone who wanted to do the testing to build and index the HierarchyID column from the "Adjacency List" in the manner they thought was best.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 15

dw.bi (12/1/2011)
My stored proc takes in a node number as a parameter and returns the downline depending on the filters.
For the tree in image(attached), assuming that the RED BLOCKS have some special flags set, if I enter "a" as the parameter,
I get back the following as the output:
LevelNode
0a
1b
1c
2b1
2b2
2b3
2c1
2c2
2c3
3b21
3b22
3c21
3c22

If the hierarchical structure is fairly static, add another flag column to the structure. As a part of nightly processing, find all "Red Box" items as a set and set the new flag column for the decendents of those boxes. Then, when you do queries, add the condition of where that flag isn't set and things should be much faster than the WHILE loop. Including that flag in a good index will certainly help.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

HierarchyID performance problems... Really???

Cookies on SQLServerCentral