Hierarchies in SQL

  • Comments posted to this topic are about the item Hierarchies in SQL

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • For the sake of completeness, I'd mention another method to store hierarchies in the database - the path model which is similiar to the adjancey model, in that each node in the hierarchy contains information on its ancestor. However, in the path model, each node contains information on all its ancestors up to the root.

    HierarchyID in SQL Server 2008 does exactly that, and has a very rich set of functions to query and manipulate the data.

    Books Online article on HierarchyID here: http://technet.microsoft.com/en-us/library/bb677290.aspx

  • Nicely done, Gus. And, considering that the article is about a fix you made for an actual problem having to do with Adjacency and Nested Set methods, I don't have a problem with leaving out the Path method.

    I almost wish I had some hierarchy problems so I could use your good method and, heh..., justify the need for 2k8.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've played with the SQL 2008 hierarchyID data type a little bit, but only a few minutes, and I've never used it in a production environment.

    I have tested pre-2008 path-hierarchies, and found them to have all the negatives of denormalized data, and few of the positives. They are slower to update or query than adjacency, much slower to query than nested sets, and only slightly faster to update than nested sets. But that's basically because it's all string or XML functions. I'm not sure how performant the 2008 version is.

    I'll see about getting a dev copy of 2008 this weekend (after payday), and see what I can do by way of speed tests.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Or you could use CLR store procedures which are very fast in execution and sometimes are faster than T-SQL.

    http://blog.vmladenov.com/?p=197

  • Having not known about your topic, I found other resources:

    I needed a way to show a hierarchical structure and found a few informative sites:

    * http://groups.google.com/group/borland.public.delphi.objectpascal/browse_thread/thread/8bd20a203e20eaa2

    * http://www.delphi3000.com/articles/article_2740.asp?SK

    * http://www.evolt.org/article/Four_ways_to_work_with_hierarchical_data/17/4047/index.html

    And what most were saying, was the, Modified Preorder Tree Traversal Algorithm, was a preferred way of doing hierarchical sets.

    This is the Create for the tables (I am using both MySQL version 5 and Elevate DB version 2)

    --

    -- Definition of table `tblcategories`

    --

    DROP TABLE IF EXISTS `tblcategories`;

    CREATE TABLE `tblcategories` (

    `idtblcategories` int(10) unsigned NOT NULL auto_increment,

    `Description` varchar(45) NOT NULL,

    `lft` int(10) unsigned NOT NULL,

    `rgt` int(10) unsigned NOT NULL,

    PRIMARY KEY USING BTREE (`idtblcategories`),

    UNIQUE KEY `ndxDescription` (`Description`)

    ) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=latin1;

    -- some working records

    /*!40000 ALTER TABLE `tblcategories` DISABLE KEYS */;

    INSERT INTO `tblcategories` (`idtblcategories`,`Description`,`lft`,`rgt`) VALUES

    (11,'Food',1,18),

    (12,'Fruit',2,11),

    (13,'Red',3,6),

    (14,'Cherry',4,5),

    (15,'Yellow',7,10),

    (16,'Banana',8,9),

    (17,'Meat',12,17),

    (18,'Beef',13,14),

    (19,'Pork',15,16);

    /*!40000 ALTER TABLE `tblcategories` ENABLE KEYS */;

    To add a new node, say at Yellow, first begin a transaction, then update the existing entries:

    UPDATE tblcategories SET rgt = rgt + 2

    WHERE rgt > 7;

    UPDATE tblcategories SET lft = lft + 2

    WHERE lft > 10;

    Now, insert your row and commit:

    INSERT INTO tblcategories VALUES ('New Node', 8, 9);

    To remove a node, subtract instead of add 2 in the UPDDATE statement, then, delete your node.

    Happy hierarchy.

  • hi guys,

    I am new to the forum, so I would like to say "hi" first šŸ™‚

    Now back onto the topic: I would be very interested in finding out the results of your comparison tests, GSquared.

    I am not sure if you will agree with me, but there is also another factor when considering different solutions to a problem: simplicity. I am saying that as a tsql programmer - not a DBA.

    Working with the built in hierarchyID data type would be much cleaner and easier to understand than complicated structures involving complicated logic to maintain them and that also means less errors in my application caused by people using it in a wrong way.

    Thanks,

    Lukasz

  • I just feel, to tie myself to a DB structure type, like you mentioned using the hierarchcical ID, might limit my application's portability - say, if I want to take it from MS SQL server to MySQL or Oracle or DB2.

    Doing a small amount of coding in order to remain with ISO standard fields, is in my opinion, making for a far better application.

    My 2cents.

  • I liked the compromised you took when dealing with this problem.

    I have worked in the past with these issues and my opinion is that it is the "usage" what dictates what's better.

    - path-based hierarchies are *NOT* slow for many kind of queries but they *ARE* for some others.

    - Nested-Sets are *NOT* good as you already pointed out for changes BUT also certain queries are "hard" to craft.

    - The Built-in hierarchy-ID of 2008 is basically a path-based method that happens to be in binary representation and it has many (almost all) the drawbacks of the pre-2008 forms it just operates faster and it does not have as many limitations as a varchar column.

    All in all I think you put together an interesting method.

    Great work!


    * Noel

  • danielk1: yes, that's a "nested sets" hierarchy.

    For something like your sample, it's definitely the way to go. It's not likely that "Meat" will ever be moved out of the "Food" category, or that "Beef" will ever move out of the "Meat" sub-category.

    That works just fine for relatively static hierarchies. No problem.

    The problem I had to solve was what to do with a volatile hierarchy. In this case, a customer-facing web page that had hierarchical access control.

    Joe works for company X, and he places an order that will have details on this web page. Joe can see the details, as can various people at X. Simple hierarchy, right? Joe then changes to working with company Y, and now people at X should no longer see his details, but people at Y should. The following week, he moves to Z.

    At the same time, Bob moves from Z to X, and Sue moves from A to Y, and B adds twelve new people who will all be placing orders and who weren't even in your database.

    And that's a typical hour for this hierarchy.

    It's no longer as simple as adding "yellow" by expanding the range for "food". Even that isn't as simple as you might think. What happens in your example if you already have a category "dinnerware" that starts at 19? How do you then add "Vegetable" to "Food"? You can't add 2 to "Food", because now you're going to end up with an overlap with "Dinnerware", which is, of course, wrong. So, you have to redefine "Dinnerware", and everything in it, in order to add "Vegetable" to "Food". Let's say you have to do that kind of thing several times per minute. You need to add "Vegetable" and twenty items in it. At the same time, you need to add four new types of "Meat", six new "Fruit", two more types of "Apple" (under "Fruit"). Also, you're changing who you buy your dinnerware from, and they have different product IDs for plates, cups, glasses, silverware, etc., so you basically have to double the size of that category, but "Dinnerware" has a range that ends at 26 and "Furniture" starts at 27. Suddenly, you have to manage updates that affect nearly every row in your table. The complexity of the update is huge.

    There are a number of solutions for this. You could add room to each category you add, based on anticipated growth over a reasonable time span. Instead of defining "Food" as 1-12, you define it as 1-1000, even though you only need 1-12 when you first build it. But does "Fruit" get 2-500, or 2-100, or 2-20, in that scheme? Managing the data ranges becomes something that requires a lot more work.

    A path or adjacency hierarchy handles that kind of situation very, very easily. Want to add a new sub-set to "Food"? Easy, add one row, and its ParentID = "Food". You're done. No other rows to do anything with at all. Now you need to add 100 items under "Dinnerware"? Again, it's a single transaction that simply adds 100 rows to the table, each with the correct ParentID. Milliseconds later, you have your hierarchy fully functional.

    In other words, there's more to it than what your sample code does. A lot more.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • ventsislav.mladenov (1/28/2009)


    Or you could use CLR store procedures which are very fast in execution and sometimes are faster than T-SQL.

    http://blog.vmladenov.com/?p=197%5B/quote%5D

    Have you speed-tested the queries on that against nested sets hierarchies?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Lukk: definitely agree with you. One of my biggest concerns when I left the job where I built this hybrid hierarchy was whether or not anyone left behind would be able to manage it. I documented the heck out of it, asked them if they were comfortable with it, got reassurances that either they'd figure it out or they'd call me (and pay consulting rates) if they needed help. Not a very good sign, I have to say.

    Unfortunately, that company didn't have any 2008 servers, and had only vague plans about maybe obtaining some one day in the indeterminate future, so that wasn't really an option. The stuff in this article is moderately complex, but not horribly so, and doesn't depend on a proprietary data type nor on features that most companies don't have yet.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks Noel and Jeff. I appreciate the critique and compliments. šŸ™‚

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • danielk1 (1/28/2009)


    I just feel, to tie myself to a DB structure type, like you mentioned using the hierarchcical ID, might limit my application's portability - say, if I want to take it from MS SQL server to MySQL or Oracle or DB2.

    Doing a small amount of coding in order to remain with ISO standard fields, is in my opinion, making for a far better application.

    That is a very good point Daniel and I would definitely consider that if I knew there is a chance the app will be ported to other DB servers, otherwise I would definitely go for hierarchyID (if it proves to be fast enough).

    This discussion also applies to app architecture - e.g. should we have a separate DAL in the project if we know that we will only use MS Sql Server?

    Thanks,

    Lukasz

  • ventsislav.mladenov (1/28/2009)


    Or you could use CLR store procedures which are very fast in execution and sometimes are faster than T-SQL.

    http://blog.vmladenov.com/?p=197%5B/quote%5D

    Since you say "Now the fact is that the C# store procedure is faster than T-SQL code. Iā€™m wondering why but this is the result." in you blog, I'd like to see the T-SQL method code and test data you tested against. I'm always willing to learn something new.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 45 total)

You must be logged in to reply to this topic. Login to reply