Everybody Reports to Somebody

  • Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/cHatley/3001.asp

  • Ahhh, ye olde adjacency list model.  You might get better efficiency from a set-based solution, like the nested sets model or one of its variants.  Here's an Q&D sample.

    CREATE TABLE #Employees (ID INT PRIMARY KEY,

     LastName VARCHAR(30),

     FirstName VARCHAR(30),

     LeftID INT,

     RightID INT)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (1, 'Smith', 'John', 1, 24)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (2, 'Johnson', 'Bob', 2, 15)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (4, 'Fields', 'Sue', 3, 4)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (3, 'Burton', 'Debra', 16, 23)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (5, 'Jacobs', 'Todd', 11, 14)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (6, 'Sanders', 'Mary', 5, 10)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (10, 'Sims', 'Tom', 17, 18)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (11, 'Wright', 'Larry', 19, 20)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (12, 'Morgan', 'Betty', 21, 22)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (7, 'Teeter', 'Sandy', 12, 13)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (8, 'Morris', 'Megan', 6, 7)

    INSERT INTO #Employees (ID, LastName, FirstName, LeftID, RightID)

    VALUES (9, 'Binks', 'Randy', 8, 9)

    GO

    CREATE PROCEDURE dbo.GetOrgChart (@startID INT)

    AS

    BEGIN

     SELECT e1.ID, e1.LastName, e1.FirstName, e1.LeftID, e1.RightID, COUNT(e2.ID) AS Level

     FROM #Employees e1

     INNER JOIN #Employees e2

     ON e1.LeftID BETWEEN e2.LeftID AND e2.RightID

     WHERE e1.LeftID BETWEEN (SELECT LeftID FROM #Employees WHERE ID = @startid)

      AND (SELECT RightID FROM #Employees WHERE ID = @startid)

     GROUP BY e1.ID, e1.LastName, e1.FirstName, e1.LeftID, e1.RightID

     ORDER BY e1.LeftID

    END

    GO

    EXEC dbo.GetOrgChart 1

  • Oracle has "Connect by prior"

    select * from person connect by prior id=supervisor_id

    .. it makes traversing hierarchies trivial.

    I was hoping it would appear in SS2005... 

  • Hi Mike,

    apologies for this, but i am interested in your right-brain set based solution, but I am feeling a bit left-brain this Monday morning, probably lack of coffee, so please could you set me right about what left-id and right-id represent because at the moment I'm feeling a bit left-out.

    Thanks

    David

    If it ain't broke, don't fix it...

  • Quite liked that one, though as surprised as you at the efficiency. Will give it a bash sometime.

  • Mike's example is actually a nested set model, not an adjecent list model per the article. Basically instead of having a "boss" column in your employee table that's the same as an employee, each employee has containment information. You use two columns which represent the range of containment, a set, an employee has. Joe Celko explains it better, http://www.developersdex.com/gurus/articles/112.asp.

  • Hi Tim,

    Thanks for that link to Joe Celko's article, I had not come across this solution before. I can see that it is easier to model an org tree in SQL using set theory. In practice, does the data in the set correspond to an organisation's departmental structure, rather than individual relationships, otherwise you could get real problems if an employee switches managers and correspondingly needed to switch containments?

    I like the idea that because the containment encapsulates all levels of the hierarchy, there is no break in the management chain if an employee in the middle-tier leaves, the model is effectively self-healing.

    Presumably by creating overlapping sets you could also create organisation structures where individuals have multiple lines of management (many DBAs may be familiar with this    )

    David

    If it ain't broke, don't fix it...

  • Hi Mike,

       Although your solution is a viable option (and we considered it), we chose to identify each employee's manager as a single field on each record because we wanted the ability to easily update this information to accomodate the incredible growth of our organization.  With the method we are using, we basically complete this single piece of information (manager's user ID) for each new employee during the account setup process and we are good to go. 

     

  • Hi Mike,

    found a further article at http://www.intelligententerprise.com/001020/celko1_1.jhtml

    It appears that if your organisation structure fequently changes, you would still need an adjacency list to calculate the nested sets though? Joe provides the code to convert one to the other. I guess the practical application of this solution  would be to write a trigger which recalculates the nested sets whenever the organisation structure is changed (i.e. you change an employee's manager id), storing the left and right values you calculate and using them as an aid to produce faster organisation structure reports.

    If someone left the organisation, you would leave the existing calculated left and right values in place to provide a management hierarchy until a new manager was appointed.

    PS: Have had several coffees now

    David

    If it ain't broke, don't fix it...

  • Hi,

       I have received a private question asking about how to deal with employees that may report to multiple bosses and I wanted to provide my thoughts on the subject.  I think the best approach to address this issue would be to move the 'Manager User ID' field to a separate table containing at least the 'User ID' and 'Manager User ID' columns which could be joined to the 'Employee' table via the 'User ID' column.  You could then modify the stored procedure to include the new table via a join such as (Changes Highlighted In Yellow) . . .

    CREATE PROCEDURE select_subordinates_by_manager

    @user-id int

    As

    Set NoCount On

    Declare @UserCount int

    Set @UserCount = 1

    Create Table #In (UserID int)

    Create Table #Out (UserID int)

    Create Table #Result (UserID int)

    Insert Into #In (UserID) Values (@UserID)

    Insert Into #Result (UserID) Values (@UserID)

    While @UserCount > 0

    Begin

    Insert Into #Out (UserID)

    Select Distinct UserID From Test..Employee as e

    Join Test..ManagedBy as m on m.UserID = e.UserID

    Where m.ManagerUserID In (Select UserID From #In)

    Select @UserCount = (Select Count(UserID) From #Out)

    If @UserCount > 0

    Begin

    Insert Into #Result (UserID)

    Select UserID From #Out

    End

    Delete From #In

    Insert Into #In

    Select UserID From #Out

    Delete From #Out

    End

    Set NoCount Off

    Select UserID From #Result

  • Hi David and Craig,

    The LeftID and RightID define a subset.  In the case of John, his LeftID=1 and RightID=24.  Anyone with a LeftID and RightID between these two numbers are defined by a subset of John's set.  Bob's has the LeftID=2 and RightID=15, so his numbers are a subset of John's set {1..24} (apologies for the "fake" set notation .)  Therefore, Bob reports to John.  And so on...

    Actually you don't need to use the adjacency list to renumber the left and right ID's of your nested sets.  There are some other options to improve efficiency.  In the example I gave the sets were a "tight" fit, with the left and right id numbers as close together as possible.  However, that's not a requirement.  You can actually spread them out further to start with, so long as they still properly "nest" within one another.  For instance:

    John Smith, LeftID=1, RightID=1000

    Bob Johnson, LeftID=20, RightID=150

    Sue Fields, LeftID=300, RightID=400

    In this case, Bob and Sue are still nested within John Smith, but the range between LeftID and RightID has been widened.  You can easily insert or delete individuals without recalculating (dependent on business rules).  The recalculations come in when Bob's entire range is filled and he gets another subordinate.

    Using this method you don't need to recalculate LeftID and RightID as often, which, as David pointed out can hurt performance on large data sets.  Renumbering can also be done with just a few statements (albeit slightly more complex ones).  This method also has the advantage that you can calculate things like the level of each employee in the hierarchy, total height of the tree, etc., relatively easily.

    You can also convert adjacency list to nested sets, and vice versa.

    Celko's written a lot on the topic of nested sets versus adjacency list.  He also has a pretty good book on the subject - "Trees and Hierarchies in SQL".  Although it's not SQL Server specific, his examples are fairly easy to port over.

    Also, if you're going to use the adjacency list, you might consider using recursive CTEs (SQL 2K5) - they were defined for this type of thing

  • "In practice, does the data in the set correspond to an organisation's departmental structure, rather than individual relationships"

    This is just a different model for storing hierarchical data in SQL tables. The origial article presented an adjacent set model, Mike presented a nested sets model, and there's another I've seen used called the maternalized path model. The maternalized path model is basically the same structure as a URL, folders and paths. In that model you store a node's entire ancestry in a column. To get to your question though, with both the nested sets and maternalized path models the information you use to maintain the tree structure can be, and usually is, entirely fictional with no relationship whatsoever with the data it's associated with. Kind of like inventing an ID column for a table instead of using a person's name as a key for instance.

    I haven't had to model hierarchical data for a project in a while but I used the nested sets model to do it when I did. It really isn't as difficult to use or maintain as some articles I've read about it indicate -- of course I'll suffix that by saying each model has its advantages and disadvantages. Changing someone's boss, for instance, is essentially moving a node within the tree. In the nested sets model this boils down to updating rows that are between numerical ranges and adding or subtracting numbers. With the maternalized path model you update rows that are LIKE a certain string. In either case you could potentially update every row in the table. The adjacent set model wouldn't update as many rows to do this.

  • Here's a short CTE, assuming a table named #Employees with the data shown in the article:

    WITH OrgChart (UserID, LastName, FirstName, ManagerUserID)

    AS

    (

        SELECT UserID, LastName, FirstName, ManagerUserID

        FROM #Employees

        WHERE ManagerUserID = 0

        UNION ALL

        SELECT e.UserID, e.LastName, e.FirstName, e.ManagerUserID

        FROM #Employees e

        INNER JOIN OrgChart o

        ON e.ManagerUserID = o.UserID

    )

    SELECT UserID, LastName, FirstName, ManagerUserID

    FROM OrgChart;

    Please excuse any typos - I had to retype this in here.

  • "I was hoping it would appear in SS2005..."

    SQL Server 2005 introduced Common Table Expressions (CTE) which can be recursive.

  • I like Craig's original approach because performance is good and it is straightforward. Another important consideration is many, if not most organizations that I am aware of tend to follow the 'one-employee-one-manager' model. I utilize essentially the same algorithm with three differences: 1) I encapsulate all of the code within the stored procedure, which simply outputs an ordered four-column dataset (SortName [Last, First], Employee ID, SupervisorID, Level), 2) I include the Level column (integer), which indicates how many levels down from the specified manager each employee is in the hierarchy, and 3) I have an additional input parameter for Level, which allows me to easily return only the direct reports for the manager (or some other number of levels deep that I may be interested in at the moment). Of course, my method makes certain assumptions about how the names will be used since I am controlling the name format (we do a lot of dropdown lists and reports using the output.)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply