HierarchyID performance problems... Really???

  • Sean Lange (10/4/2011)


    I could of course do this pretty simple with an adjaceny list but...

    Oddly enough, I've found "hybrid" tables that contain an Adjacency List, Hierarchical Path (without the HierarchyID, so far), and Nested Sets to have advantages that no one particular method has. For example and as you've said, it's very easy to move, add, and delete nodes and whole sub-trees in the Adjacency List (a human can easily eye-ball the correct changes, if necessary) but there's some blinding speed and query flexibility to be had with Nested Sets.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CGSJohnson (10/4/2011)


    Hi, Jeff. Sorry, but I cannot talk about it all. However, one use would be for the common goal of organizational structure.

    Thanks...Chris

    Understood. Thanks for the thought, anyway.

    Yep... I agree... organizational structures are a primary use of hierarchies.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I guess I can wrap this thread up by saying I've done a deeper dive on some of the internet posts that claim performance problems with the HierarchyID datatype and its related methods. So far, all of them have turned out to be "false alarms" where certain (bad) programming practices, such as the use of non-SARGable search predicates, were used that would slow down any query and not just those related to the HierarchyID.

    I haven't given up the search for actual performance problems caused by the use of the HierarchyID datatype or related methods, but I'm going to turn my efforts more to comparing performance between the HierarchyID methods and Nested Set methods.

    Thanks to the good folks who posted on this thread and, believe it or not, thanks to the folks that didn't... it kind of shows that not a whole lot of people have to work with Hierarchies and the ones that are, simply aren't having performance problems with their hierarchies, are not working with large hierarchies, or are simply not aware that they may have a performance problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CGSJohnson (10/4/2011)


    Hi, Jeff. Sorry, but I cannot talk about it all. However, one use would be for the common goal of organizational structure.

    Thanks...Chris

    Don't talk to Jeff about it, if you'd have to kill him afterward ! I need him next Tuesday evening :hehe:

    Back to the topic though, I haven't seen HierarchyID being used.

    I only played a bit with it to just get a little grip on it.

    BTW you don't need million row objects to get into performance problems, so strive for most optimal setup as much as you can !

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution ๐Ÿ˜€

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (10/5/2011)


    BTW you don't need million row objects to get into performance problems, so strive for most optimal setup as much as you can !

    Heh... true enough. Some of the posts with "performance problems" talked about hierarchy tables with only a thousand or so rows. That's why I test with a million... it makes the "thousands" easy. ๐Ÿ˜€

    Thanks for the feedback, Johan. I know a lot of folks that say they're going to impliment it "soon" for one reason or another, but I don't know of anyone who actually has. I know a couple of folks that have implimented hierarchies in their work... they just haven't done it using the HierarchyID datatype.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/5/2011)


    ALZDBA (10/5/2011)


    BTW you don't need million row objects to get into performance problems, so strive for most optimal setup as much as you can !

    Heh... true enough. Some of the posts with "performance problems" talked about hierarchy tables with only a thousand or so rows. That's why I test with a million... it makes the "thousands" easy. ๐Ÿ˜€

    And that is one of the reasons we get to meet again next Tuesday[/url] :w00t:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution ๐Ÿ˜€

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Congratulations Jeff for โ€˜Exceptional DBA of 2011โ€™ award.

    I recently joined SSC & exploring SQL Server by reading your (& other GURUโ€™s) comments.

    Thanks for your valuable contributions. Keep sharing your knowledge. ๐Ÿ™‚

  • Don't worry, everyone...I did not have to kill Jeff. ๐Ÿ™‚

  • Hi there,

    Recently my boss wanted me to experiment with this HierarchyID to see if it can improve performance of the reports that we run off of a tree with about 700,000 nodes and 20-25 levels.

    After having implemented a loop structure, a CTE and a hierarchyID, (with the above numbers) I have come to the conclusion that hierarchyID is by far the fastest method to get a sub tree, followed by closely by the loop and then distantly by a panting CTE, which I think is crap after you have crossed a few thousand rows.

    However, if you need a routine which has more brains than just retrieving a mere sub-tree, the loop structure with some temp variables and tables is the best. The hierarchyID is incredibly inflexible when it comes extracting portions of a sub-tree which fulfil your criteria.

    For example, if you want to get the sub tree which should have:

    - Some nodes PLUS

    - First occurences in the down line which have Flag B set to 1 , flag c set to 1 and sales >5000 BUT avoid the descendants of THESE nodes

    - Avoid any further down line nodes which have Flag B set to 1 , flag c set to 1 and sales >5000

    -And some other numerical calculations on the nodes

    In the above example, the hierarchyID would first run to fetch everything in the sub tree and then you have to prune the leaves you do not want.

    A loop structure with temp variables on the other hand, would on each iteration of the loop fetch ONLY those things that you want in the first place and hence saves you the painful time that deletions or joins or excepts would cost you.

    So the hierarchyID is not as versatile as a loop.

    Please let me know what you think if what I wrote makes any sense.

  • Itzik Ben-Gan has a good code example of a T-SQL basic (and limited) equivalent to what what the hierachyID datatype does behind the scenes (and without the coding overhead on our part) but unfortunately I'm not sure if it's distributable because I can't find it publicly available. Basically, all the datatype is doing is ordering the key (via CLR) in such a way that an index can be scanned quickly a single time to get either the whole tree in a logical order or a portion of the tree. Previously, a lot of hierarchy code has been done via implicit (e.g. recursive cte's) or explicit cursors, which results in an exponentially large number of reads as the number of records increases.

    (Obviously most of you already know all this and more but I thought it would be nice to provide a brief explanation of the benefits so newer developers can understand why someone would use the hierarchyID.)

    We have an old product for a canned response system that's been sitting in development that was coded with a recursive cte's and I've been working on redoing it with hierarchyID's. I don't believe it solves every problem, but I do think it's clearly faster than the typical alternatives. In the past there was really no good way to do such things on larger systems without quite a bit of work.

    โ””> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • dw.bi (11/30/2011)


    Hi there,

    Recently my boss wanted me to experiment with this HierarchyID to see if it can improve performance of the reports that we run off of a tree with about 700,000 nodes and 20-25 levels.

    After having implemented a loop structure, a CTE and a hierarchyID, (with the above numbers) I have come to the conclusion that hierarchyID is by far the fastest method to get a sub tree, followed by closely by the loop and then distantly by a panting CTE, which I think is crap after you have crossed a few thousand rows.

    However, if you need a routine which has more brains than just retrieving a mere sub-tree, the loop structure with some temp variables and tables is the best. The hierarchyID is incredibly inflexible when it comes extracting portions of a sub-tree which fulfil your criteria.

    For example, if you want to get the sub tree which should have:

    - Some nodes PLUS

    - First occurences in the down line which have Flag B set to 1 , flag c set to 1 and sales >5000 BUT avoid the descendants of THESE nodes

    - Avoid any further down line nodes which have Flag B set to 1 , flag c set to 1 and sales >5000

    -And some other numerical calculations on the nodes

    In the above example, the hierarchyID would first run to fetch everything in the sub tree and then you have to prune the leaves you do not want.

    A loop structure with temp variables on the other hand, would on each iteration of the loop fetch ONLY those things that you want in the first place and hence saves you the painful time that deletions or joins or excepts would cost you.

    So the hierarchyID is not as versatile as a loop.

    Please let me know what you think if what I wrote makes any sense.

    By "First occurences in the down line which have Flag B set to 1 , flag c set to 1 and sales >5000 BUT avoid the descendants of THESE nodes", you mean the first occurance of any downline path from a given node and that you can have more than one return so long as they aren't in the same branch, correct?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My stored proc takes in a node number as a parameter and returns the downline depending on the filters.

    For the tree in image(attached), assuming that the RED BLOCKS have some special flags set, if I enter "a" as the parameter,

    I get back the following as the output:

    LevelNode

    0a

    1b

    1c

    2b1

    2b2

    2b3

    2c1

    2c2

    2c3

    3b21

    3b22

    3c21

    3c22

  • Hi Jeff,

    Ik don't understand the BuildTestHierarchy stored procedure.

    In my database I get the Employee table, but I don't see any column of type hierarchyid.

    Do I have to do some additional stuff? Or I don't understand it... I never used hierarchyid.

    Regards, Hugo.

  • hugo-939487 (12/2/2011)


    Hi Jeff,

    Ik don't understand the BuildTestHierarchy stored procedure.

    In my database I get the Employee table, but I don't see any column of type hierarchyid.

    Do I have to do some additional stuff? Or I don't understand it... I never used hierarchyid.

    Regards, Hugo.

    Apologies for the delay, Hugo.

    You are correct. The test table I provided the code for doesn't have the "build" for a HierarchyID column. I left that out because I didn't want to "taint" anyones testing. I typically build such a thing right into the same table as the "Adjacency List". Others would rather have it in a separate table. I was leaving it up to anyone who wanted to do the testing to build and index the HierarchyID column from the "Adjacency List" in the manner they thought was best.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dw.bi (12/1/2011)


    My stored proc takes in a node number as a parameter and returns the downline depending on the filters.

    For the tree in image(attached), assuming that the RED BLOCKS have some special flags set, if I enter "a" as the parameter,

    I get back the following as the output:

    LevelNode

    0a

    1b

    1c

    2b1

    2b2

    2b3

    2c1

    2c2

    2c3

    3b21

    3b22

    3c21

    3c22

    If the hierarchical structure is fairly static, add another flag column to the structure. As a part of nightly processing, find all "Red Box" items as a set and set the new flag column for the decendents of those boxes. Then, when you do queries, add the condition of where that flag isn't set and things should be much faster than the WHILE loop. Including that flag in a good index will certainly help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply