Hierarchical Table Design Question

  • I need some help in designing the table that can handle unbalanced/ragged hierarchy. What should be the table structure of below scenario? Below scenario entails that Bill is at Level 1 of hierarchy, Allen is at Level 2 of hierarchy, James is at Level 3 of hierarchy and Wendy is at Level 3. The only twist is that Wendy is directly reporting to Level 1 Bill. In other words, how can I design a table and write SQL that dynamically create Levels? Please keep in mind that in below case I want to Wendy to be at Level 3 with no reporting of Level 2.

    Emp_ID, EmpName, Parent_ID

    001, Bill, Null

    002, Allen, 001

    003, James, 002

    004, Wendy,001

    FYI.... I would like to know the solution that can work in SQL Server 2000.

    Thanks!

  • ali.prasla (10/29/2009)


    I need some help in designing the table that can handle unbalanced/ragged hierarchy. What should be the table structure of below scenario? Below scenario entails that Bill is at Level 1 of hierarchy, Allen is at Level 2 of hierarchy, James is at Level 3 of hierarchy and Wendy is at Level 3. The only twist is that Wendy is directly reporting to Level 1 Bill. In other words, how can I design a table and write SQL that dynamically create Levels? Please keep in mind that in below case I want to Wendy to be at Level 3 with no reporting of Level 2.

    Emp_ID, EmpName, Parent_ID

    001, Bill, Null

    002, Allen, 001

    003, James, 002

    004, Wendy,001

    FYI.... I would like to know the solution that can work in SQL Server 2000.

    Thanks!

    Based on the data above, how can Wendy be at level 3? With a Parent_ID of 001, she is obviously at level 2.

  • Wendy is at Level 2 based on the scenario, but Wendy in this case is an administrative assistant for Bill (CEO). For this reason, she is on Level 3. How can I design the table or write a query that allows me build these kinds of level that are unbalanced?

  • Based on the given data there is no information that Wendy should be at level 3. Why not level 4 or 10?

    You probably need an additional column indicating the level the Emp is at. In order to avoid inconstant data when adding the "real level" directly, I would use an offset column.

    Next step would be to calculate the "standard level" and to add the offset to it. To handle several offsets down the hierarchy would take some additional effort but it should be doable, I think...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thank your for your feedback!

    Based on your solution, what should be the structure of the table? Also, feel free to provide SQL or table logic to do offset.

    Thanks!

  • Table structure:

    Emp_ID, EmpName, Parent_ID, Offset

    001, Bill, Null,0

    002, Allen, 001,0

    003, James, 002,0

    004, Wendy,001,1

    SQL logic:

    How about splitting the effort?

    Your part: providing query to do the standard hierarchy

    Our part: we'll help you to expand your logic to cover offsets

    If you don't have any logic for getting the hierarchy yet I'd like to recommend to search on this site. There are quite a few samples.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • First, how frequently does this data change? Multiple times per second? Every day? Every week? Every few months? Once a year or less?

    If the answer is daily or beyond, then I recommend moving to a nested sets hierarchy instead of an adjacency hierarchy.

    The structure would look like this:

    create table #Hierarchy (

    RangeStart int not null,

    RangeEnd int not null,

    constraint CK_Range check (RangeStart <= RangeEnd),

    constraint PK_Hierarchy primary key (RangeStart, RangeEnd),

    EmployeeID int,

    Lvl int not null);

    insert into #Hierarchy (RangeStart, RangeEnd, EmployeeID, Lvl)

    select 1, 100, 1, 1 -- Bill (name normalized into Employees table)

    union all

    select 2, 2, 4, 3 -- Wendy

    union all

    select 3, 50, 2, 2 -- Allen

    union all

    select 4, 4, 3, 3; -- James

    You'll note that I removed the employee name column, since that really should be in the Employees table, for third normal form.

    There's an excellent article on nested sets hierarchies at http://www.intelligententerprise.com/001020/celko.jhtml, written by Joe Celko, whom I believe originated the idea.

    The basic idea is that, with a structure like this, query performance on the hierarchy goes WAY up. It can literally be thousands of times faster.

    If we add an index to EmployeeID, you can query like this:

    create index IDX_Hierarchy_EmployeeID on #Hierarchy (EmployeeID);

    select H2.EmployeeID

    from #Hierarchy H1

    inner join #Hierarchy H2

    on H1.RangeStart <= H2.RangeStart

    and H1.RangeEnd >= H2.RangeEnd

    where H1.EmployeeID = 2;

    That will find all of Alan's juniors. It will do it with two index seeks, one for the EmployeeID, one on the clustered index for the range data.

    No matter how many levels you add to that kind of hierarchy, it resolves the second set of data with a single pass, once it has the outer bounds of the range from a single index seek. With an adjacency hierarchy (one where you store IDs and parent IDs), you have one scan per level, minimum. That's a huge improvement in I/O and index use, and that's where the huge speed bonus comes from.

    These are more difficult to update, but only slightly so, and the speed on selects is so much faster that it's usually worth it.

    The only time to avoid these is when slow updates will matter a lot. If, for example, you have a hierarchy that updates several times per second, then nested sets isn't the way to go.

    I came up with a hybrid version (and published an article on this site about it) that has most of the advantages of each. But even that is only worth it in limited circumstances, otherwise go with nested sets.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Gus, thank you for the link to Joe's article!

    I don't use hierarchies that often so I used to do it the "old fashioned way"...

    The article really does show an alternative!



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thank you for the article and design for the table. Instead of ranges (rgt or lft), can you incorporate the logic the old fashion way?

    On the other note, how do I populate rgt and lft (ranges) dynamically?

  • The logic the old fashioned way is how you're currently doing it, with an ID and a ParentID. It's slower and requires more I/O proportionate to the number of nodes and levels in the hierarchy.

    Here's the article I wrote on the subject. It has data on both methods, and the hybrid that I came up with, including performance comparisons, and auto-hierarchy code.

    Article: http://www.sqlservercentral.com/articles/T-SQL/65540/

    Joe's article tells how he builds update commands in a nested sets hierarchy.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks! Your article addresses all of my questions. You guys are too good!

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply