HIERARCHYID DataType and Referential Integrity

  • Hi all,

    I had assumed this would have been an easily answered question, but after quite a bit of searching, I was surprised to find no direct answers.

    The HIERARCHYID datatype clearly was designed to structure data in a hierarchical fashion, in order to support tree traversal and other objectives.

    However, as far as I can tell, there is nothing enforcing referential integrity in there tree! As a simple example:

    CREATE TABLE #Test
    (
     ID INT IDENTITY PRIMARY KEY,
     Hierarchy HIERARCHYID
    )

    INSERT INTO #Test (Hierarchy)
     SELECT HIERARCHYID::GetRoot()
    DECLARE @ChildHierarchyID HIERARCHYID
    DECLARE @ParentHierarchyID HIERARCHYID
    DECLARE @MaxChildID HIERARCHYID
      
    SELECT @ParentHierarchyID = Hierarchy
    FROM #Test
    WHERE ID = 1
    SELECT @MaxChildID = MAX(Hierarchy)
    FROM #Test
    WHERE Hierarchy.GetAncestor(1) = @ParentHierarchyID
       
    SET @ChildHierarchyID = @ParentHierarchyID.GetDescendant(@MaxChildID, NULL)
    INSERT INTO #Test (Hierarchy)
     SELECT @ChildHierarchyID
    DELETE FROM #Test WHERE ID = 1

    Nothing prevents the database from executing that DELETE statement, and after deletion, the child's parent no longer exists.

    Are there any best practices for enforcing referential integrity with the HIERARCHYID datatype / structure? I know I could add a separate FOREIGN KEY column, with the PRIMARY KEY being the PRIMARY KEY of the same table, but that seems way more complicated to manage than I'd like - essentially I'd be throwing away all of the benefits of using the HIERARCHYID datatype to start off with!

  • Hi guys,

    Not usually one to bump a thread, but this same topic came up again when I was doing some database design, and was hoping maybe someone had some insight; if not, I'll stop bumping it 🙂

  • kramaswamy - Saturday, September 16, 2017 10:52 AM

    Hi guys,

    Not usually one to bump a thread, but this same topic came up again when I was doing some database design, and was hoping maybe someone had some insight; if not, I'll stop bumping it 🙂

    Heh... instead, I'll bump it for you. 😉

    I have a couple of ideas/suggestions that I'll try to remember to post after work today.  Personally, I hate the HierarchyID and all that goes with it but someone that doesn't may have an answer in the meantime.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • +1

  • Apologies  I lost track of this one.  Here's both my opinion and the method I use...

    The HierarchyID is based on a "Hierarchical Path" type of hierarchical structure.  It has some good features but doesn't lend itself well to DRI and other things.  It's also a real bugger to fix if something goes haywire because each node is "aware" of it's entire upline.  Because the underlying values in the hierarchical path in the HierarchyID are difficult to use directly, you can't optimize upline and downline searches and it's a different kind of "Hierarchical Path" making aggregations a bit slow, as well.  It they had thought about it when they created it, it would be a very fast method (likely the fastest) for doing aggregations.

    A "Parent/Child" list, also known as an "Adjaceny List" is much easier for humans to understand and maintain.  IMHO, it's less likely to go haywire but, if it does, it's also fairly easy for a human to repair because each node is only aware of one and only one other node, its parent.  The problem with it is that it's tough to do substantial calculations with because it requires recursion to traverse and that inherently also makes it somewhat slow for upline and downline searches.  The key is that it's easy for humans to maintain and entire downlines can easily be moved by changing the parent of just 1 or several nodes.  An "Adjacency List" also allows for forms of DRI that others don't.  For example, you can setup a self-referencing FK that won't allow you to delete a parent if children of that parent still exist (just one of many checks available).

    The, there's the "Nested Sets" method, which is horrible to maintain directly because every node is "aware" of every other node based on a left and right anchor number, which I refer to as the "Left Bower" and "Right Bower" ("Bower" is the name of the large anchors on the front of some ships and they usually have one on the left and one on the right).  The methods for maintenance are fairly complicated and slow.  The old "push stack" method for converting a (for example) million node "Adjacency List" to "Nested Sets" takes something like two days to complete.  Knowing that, why would anyone ever use such a thing?  The answer is absolute blinding speed for lookups.  It still has a bit of a problem with level sensitive and other aggregations though.

    So, how do you design a hierarchy table?  Hierarchical paths are bitch to maintain but can have nasty fast performance for aggregations.  Adjacency Lists are relatively super easy for both computers and humans to maintain and is fairly easy to apply DRI to.  Nested Sets are nasty fast for lookups but prior methods take forever to create a nested set and it's difficult to maintain once formed.

    The answer is, like Yogi Berra is famous for saying, "When you come to a fork in the road, take it". 😉  In other words, use the best of all 3 methods.  In other words, create and maintain the hierarchy as a "position sensitive" Adjacency List and very quickly convert it to both a Hierarchical Path and a Nested Sets hierarchical structure.  The words "quickly" and "convert" used to be an oxymoron when it came to hierarchies in an RDBMS and so few ever took such a methodology on.  They'd usually settle for just one hierarchical structure or the other and put up with any short comings.

    That performance barrier no longer exists.  Please see the following "full Monty" article on how to create both a very useful Hierarchical Path and Nested Sets structure from an Adjacency list.  Thanks to some cool formulas that Adam Machanic provided and a special type of Tally Table, you can now create both structures from an easy to maintain Adjacency List on a million node hierarchy in about 54 seconds (think of how fast things will be for less than that... test results included in the article).  You can find that method in the following article by yours truly.
    Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets

    There's also a 4th type of hierarchy that really didn't use to exist.  Think of it as a pre-aggregated hierarchical data warehouse that answers most of the questions that you'd ask of a hierarchy.  It, too, is derived from an easy to maintain Adjacency List and resolves in the same amount of time... 54 seconds for a million node hierarchy and the lookups are also nasty fast.  You can find how to do that in the following article, also by yours truly.
    Hierarchies on Steroids #2: A Replacement for Nested Sets Calculations

    I hope that helps.  Its what I use when I have to deal with Acyclic Directed Hierarchies.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks Jeff - I'll take a look at your suggestions.

    But as to my original question, I guess your answer is, there isn't any real built-in way of enforcing referential integrity with the HIERARCHYID datatype, short of creating an UPDATE trigger or adding in a secondary FOREIGN KEY column and maintaining it?

  • kramaswamy - Wednesday, September 20, 2017 10:11 AM

    Thanks Jeff - I'll take a look at your suggestions.

    But as to my original question, I guess your answer is, there isn't any real built-in way of enforcing referential integrity with the HIERARCHYID datatype, short of creating an UPDATE trigger or adding in a secondary FOREIGN KEY column and maintaining it?

    Correct.  Not that I know of.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply