Hierarchies in SQL

  • autoexcrement (12/10/2010)


    Fascinating article! I have a couple of naive questions about the following bit of G code:

    Adjacency (NodeID, ParentID) as -- Adjacency Query

    (select 0, ID, ParentID

    from dbo.HierarchyHybrid

    where ID = @NodeID_in

    andexists

    (select*

    from dbo.HierarchyHybrid h2

    where h2.TopParentID = HierarchyHybrid.TopParentID

    and RangeStart is null)

    1. Is it supposed to have the "select 0, " or is that just a typo? It looks like the surrounding code is expecting 2 values there, not 3. If not a typo, I don't understand the "0" part.

    2. Why doesn't this work?

    Adjacency (NodeID, ParentID) as -- Adjacency Query

    (select 0, ID, ParentID

    from dbo.HierarchyHybrid

    where ID = @NodeID_in

    and RangeStart is null)

    Thanks, regardless of whether you have time to reply.

    The 0 is meant to indicate the row that was from the original parameter value. This comes in handy if you are querying up and down the hierarchy and need to know which one you originally asked for.

    The "and exists" subquery merely tests that the row requested is in a hierarchy that has a top level. The version you wrote tests that it IS the top level. There's a difference.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Zeev Kazhdan (12/10/2010)


    Interesting article, but I have a question, which hopefully will save me a lot of hours:

    In Oracle 10 I do it in one select statement, using START WITH and CONNECT BY.

    Now, migrating one of my customers to SQL 2008, is there anything close to it in SQL Server?

    Thanks in advance

    The HierarchyID data type has a lot of that kind of functionality. I built this solution in SQL 2000, and then in SQL 2005, where that wasn't available.

    I've found that nested sets hierarchies still perform much better (in selecting) than the hierarchy ID data type does. Updates are still fastest in adjacency hierarchies, since they usually only involve one row. Deletion speed is fastest in nested sets, second fastest in adjacency, and slowest in hierarchy ID. Additions are fastest in either adjacency, or in a padded nested sets hierarchy (equally fast in either), except in cases where the nested sets version requires resizing or moving any ranges, and are slowest in a hierarchy ID, except when adding one row to the bottom of a chain.

    So, even in SQL 2008, consider a few things before just going with the hierarchy ID data type, instead of nested sets (which is best if you can get past the issues with frequent changes), or a hybrid system like this. Adjacency should only be used if you absolutely have to. It's a performance killer except on very small data sets or very shallow hierarchies. (It's hard to argue against it on a family tree, for example, while you're building it. But for just about anything else I've run into, one of the others will be better.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thank you for the article! We are getting started on a project requiring us to maintain hierarchies and I found the information here very useful. What's missing though is the ability to track hierarchies and their changes over time. What if I need to know Parent-Child relationships as they were at any point in time? Any recommendations?

    Thank you!

  • What if I need to know Parent-Child relationships as they were at any point in time?

    Could you just use change data capture, or any traditional type of trigger-based auditing to track changes to your tables as needed?


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • mishaluba (12/11/2010)


    Thank you for the article! We are getting started on a project requiring us to maintain hierarchies and I found the information here very useful. What's missing though is the ability to track hierarchies and their changes over time. What if I need to know Parent-Child relationships as they were at any point in time? Any recommendations?

    Thank you!

    My other main article for SQL Server Central is on audit trails and logging. It's at http://www.sqlservercentral.com/articles/Auditing/63247/, with part 2 at http://www.sqlservercentral.com/articles/Auditing/63248/.

    autoexcrement (12/11/2010)


    Could you just use change data capture, or any traditional type of trigger-based auditing to track changes to your tables as needed?

    Keep in mind that, with change-data-capture, certain types of table DDL will be blocked, in the same manner as when you have replication on a table. Triggers or passive logging don't have that issue, if they're set up correctly.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thank you for the responses, but I was thinking more along the lines of slowly-chaning dimension, rather then audit/logging. In one of the earlier comments Jeff Storm showed a portion of his solution, which is closer to what I am after. I would like for the application to be able to "look" at the same data using different hierarchy structures (Hierarchy A, which existed at time X and Hierarchy B, which existed at time Y) and all of them have to be equally accessible in the same set of tables.

    Thank you!

  • I'm the first to admit that I'm naive and uninformed, but it sounds to me as if it's basically an issue of you:

    1) deciding what data you need to store, including any "related" data you may need

    2) deciding what DML events should trigger your storing the old/previous values

    From there it's fairly straightforward to decide on a method to copy the old data into a separate table/column/etc. No?

    I'm probably missing something.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • mishaluba (12/14/2010)


    Thank you for the responses, but I was thinking more along the lines of slowly-chaning dimension, rather then audit/logging. In one of the earlier comments Jeff Storm showed a portion of his solution, which is closer to what I am after. I would like for the application to be able to "look" at the same data using different hierarchy structures (Hierarchy A, which existed at time X and Hierarchy B, which existed at time Y) and all of them have to be equally accessible in the same set of tables.

    Thank you!

    The usual solution for that is adding effective dates to the table, and using those in your queries.

    That often has a significant impact on performance, and always makes coding a bit more complex, but it does work.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • mishaluba (12/14/2010)


    Thank you for the responses, but I was thinking more along the lines of slowly-chaning dimension, rather then audit/logging. In one of the earlier comments Jeff Storm showed a portion of his solution, which is closer to what I am after. I would like for the application to be able to "look" at the same data using different hierarchy structures (Hierarchy A, which existed at time X and Hierarchy B, which existed at time Y) and all of them have to be equally accessible in the same set of tables.

    Thank you!

    I actually just got done building this exact solution yesterday. I have a parent child relationship in our customer (accounts) table, and it's using SCD. I wish I had time to explain how I solved this. ...

    It's going to have to wait until I can make a blog on it. Maybe I can do that this weekend. Right now I don't have time to explain it. 🙁

    I'll try and post back here if I can blog on it this weekend...

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • From the article...


    The speed difference and IO difference can be significant on the two. For example, I have a hierarchy in one of my databases with 2,700 nodes in it, going up to six levels deep. If someone at the top of that hierarchy signs in, it takes 11 seconds for my server to resolve the whole hierarchy and determine what data that person has access to (this is a security access hierarchy that controls much of what is displayed to customers on a web site). That’s using the adjacency model. Using a nested sets table, that same hierarchy takes less than 1 millisecond. (This isn’t the one with multiple parents. Different company.)

    If this same database didn’t have a lot of updates to the hierarchies, I’d definitely use a straight-up nested sets hierarchy, and have much faster web pages. But it does have a lot of updates, sometimes several per minute, sometimes several at the same time. [font="Arial Black"]Each nested sets rebuild takes about 20-30 seconds to finish[/font], and locks the table pretty aggressively while it’s running.

    Hi Gus,

    I've had reason to revisit this article and the paragraphs above caught my eye, especially the two emphasized areas.

    With that in mind, which method are you using to convert the Adjacency List to a Nested Set Model? Is it the familiar "push stack" method from Joe Celko's book/article postings? And was it for the ~2700 rows you mentioned?

    Thanks, Gus.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I don't have access to that code currently, since it's for a prior employer. However, if I remember it correctly, it ended up using the TopParentID mentioned in the article to first find a simple count by that column, did a "running total" type calculation on that (I use a CLR function for that, blindingly fast) to get top level range start and stop values. All of that was very, very fast, like milliseconds. I think it then repeated that for each level till it got zero for @@rowcount, but the lower levels weren't as fast because they had to actually crawl the hierarchy to get the number of nodes beneath each, instead of just a count on TopParentID.

    I tried the update method mentioned in Joe's article and found that it was WAY too slow for an in-use database. My update solution was more complex, but much, much faster.

    I could improve the process immensely with the difference between what I know now and what I knew when I built it, but that's pretty much true of any code I wrote more than about a month ago. Just some simple Cross Apply inline queries would make the thing much more efficient.

    The 2,700 rows was for one hierarchy within a multi-million row table. 11 seconds to resolve anything on a 2,700-row table would imply that I was running it on, maybe a 286 CPU with 2 Meg of RAM? TSR-80? Timex/Sinclair 1000? Not sure how far back I'd have to go to get that bad of performance, even on an adjacency crawl. If I remember correctly, the table had somewhere around 2-million total rows, and 2,700 nodes, 6 levels deep, was the biggest single hierarchy within it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks Gus.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just wanted to post back to this thread that I finished my blog post on handling hierarchies in slowly changing dimensions. If I've misquoted anyone (especially Gus) please let me know. Also any feedback if someone knows of a better / faster way.

    SCD with hierarchy in key[/url]

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • amenjonathan (1/28/2011)


    Just wanted to post back to this thread that I finished my blog post on handling hierarchies in slowly changing dimensions. If I've misquoted anyone (especially Gus) please let me know. Also any feedback if someone knows of a better / faster way.

    SCD with hierarchy in key[/url]

    I read the post, and it looks like it makes sense and would work. Definitely well-written too.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks, Gus.

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

Viewing 15 posts - 31 through 45 (of 45 total)

You must be logged in to reply to this topic. Login to reply