Please help with Recursive CTE to get reversed people hierarchy

  • I have never used CTE's before and I can't figure out how to make this work for my need.

    Say I have an employee table with columns EmpName, EmpID and BossID. Each BossID is the EmpID of another record. The table looks like

    Paul, 1, 0 (0 means he has no manager)

    John, 2, 1

    Mary, 3, 2

    Mark, 4, 2

    Mike, 5, 4

    Peter, 6, 0

    Rick, 7, 6

    Mike's manager is Mark and Mark's manager is John.

    Mary's manager is also John and she manages nobody.

    John's manager is Paul who has no manager.

    Now I need a list of all employees with all their managers. My ouput, sorted by EmpID and BossID should be.

    Paul, <nothing>

    John, Paul

    Mary, Paul

    Mary, John

    Mark, Paul

    Mark, John

    Mike, Paul

    Mike, John

    Mike, Mark

    Peter, <nothing>

    Rick, Peter

    I find lots of recursive CTE example showing a hierarchy going from top to bottom, showing all people under employee X, but I can't find or figure out how to go from bottom to top.

    I would really appreciate some help with this, there's 1500 employees and in my table and I need to list all of them with all their managers. I don't want to end up doing this with loops and cursors.

  • This is called an ancestry tree. I have an example of such a query on my blog. You can check it at the following link, just look at the ancestry section.

    http://bit.ly/fkhierarchy

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • This is a bit simpler than Jason's example. Check the way I post the sample data, you're supposed to post it this way to get better help.

    CREATE TABLE employee (

    EmpName varchar(50),

    EmpID int,

    BossID int);

    INSERT INTO employee VALUES

    ('Paul', 1, 0),

    ('John', 2, 1),

    ('Mary', 3, 2),

    ('Mark', 4, 2),

    ('Mike', 5, 4),

    ('Peter', 6, 0),

    ('Rick', 7, 6);

    WITH rCTE AS(

    SELECT EmpName, EmpID, BossID

    FROM employee

    UNION ALL

    SELECT r.EmpName, r.EmpID, e.BossID

    FROM employee e

    JOIN rCTE r ON e.EmpID = r.BossID

    WHERE e.BossID > 0

    )

    SELECT r.EmpName, e.EmpName Boss

    FROM rCTE r

    LEFT

    JOIN employee e ON r.BossID = e.EmpID

    ORDER BY r.EmpID

    GO

    DROP TABLE employee

    You should try the looping option, you might find better performance than the recursive cte. You never know.

    EDIT: Here's an alternative that you can use to test for performance.

    SELECT EmpName, EmpID, BossID

    INTO #employee

    FROM employee;

    WHILE @@ROWCOUNT > 0

    INSERT #employee

    SELECT r.EmpName, r.EmpID, e.BossID

    FROM employee e

    JOIN #employee r ON e.EmpID = r.BossID

    WHERE e.BossID > 0

    AND NOT EXISTS( SELECT 1

    FROM #employee x

    WHERE x.EmpID = r.EmpID

    AND x.BossID = e.BossID);

    SELECT r.EmpName, e.EmpName Boss

    FROM #employee r

    LEFT

    JOIN employee e ON r.BossID = e.EmpID

    ORDER BY r.EmpID;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (11/24/2014)


    This is a bit simpler than Jason's example. Check the way I post the sample data, you're supposed to post it this way to get better help.

    CREATE TABLE employee (

    EmpName varchar(50),

    EmpID int,

    BossID int);

    INSERT INTO employee VALUES

    ('Paul', 1, 0),

    ('John', 2, 1),

    ('Mary', 3, 2),

    ('Mark', 4, 2),

    ('Mike', 5, 4),

    ('Peter', 6, 0),

    ('Rick', 7, 6);

    WITH rCTE AS(

    SELECT EmpName, EmpID, BossID

    FROM employee

    UNION ALL

    SELECT r.EmpName, r.EmpID, e.BossID

    FROM employee e

    JOIN rCTE r ON e.EmpID = r.BossID

    WHERE e.BossID > 0

    )

    SELECT r.EmpName, e.EmpName Boss

    FROM rCTE r

    LEFT

    JOIN employee e ON r.BossID = e.EmpID

    ORDER BY r.EmpID

    GO

    DROP TABLE employee

    You should try the looping option, you might find better performance than the recursive cte. You never know.

    EDIT: Here's an alternative that you can use to test for performance.

    SELECT EmpName, EmpID, BossID

    INTO #employee

    FROM employee;

    WHILE @@ROWCOUNT > 0

    INSERT #employee

    SELECT r.EmpName, r.EmpID, e.BossID

    FROM employee e

    JOIN #employee r ON e.EmpID = r.BossID

    WHERE e.BossID > 0

    AND NOT EXISTS( SELECT 1

    FROM #employee x

    WHERE x.EmpID = r.EmpID

    AND x.BossID = e.BossID);

    SELECT r.EmpName, e.EmpName Boss

    FROM #employee r

    LEFT

    JOIN employee e ON r.BossID = e.EmpID

    ORDER BY r.EmpID;

    There we go just giving away the answer 😉

    While looping works fine with small sets, I have found that recursive CTEs will blow the doors off a loop with larger data sets. Where loops will take 30 minutes, a good recursive cte will take less than a minute.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • A million thanks to both of you guys. I really appreciate it and I do apologize for not properly posting my question 🙂

    I ran both of them on my employee table without the filters I would normally apply so I it was processing 1979 employees and returned 7793 row.

    To be honest, I expected the loop to be slow but there is not much of a difference. They both run so fast I can't even really know what the actual difference is. CTE runs in 00:00:00 and loop runs in 00:00:01.

  • Gagne (11/24/2014)


    A million thanks to both of you guys. I really appreciate it and I do apologize for not properly posting my question 🙂

    I ran both of them on my employee table without the filters I would normally apply so I it was processing 1979 employees and returned 7793 row.

    To be honest, I expected the loop to be slow but there is not much of a difference. They both run so fast I can't even really know what the actual difference is. CTE runs in 00:00:00 and loop runs in 00:00:01.

    You are welcome.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLRNNR (11/24/2014)


    While looping works fine with small sets, I have found that recursive CTEs will blow the doors off a loop with larger data sets. Where loops will take 30 minutes, a good recursive cte will take less than a minute.

    I've found that is not the size of the set, but the number of levels for the hierarchy. CTEs will run fast with a few levels, but the more levels, the faster the loop will be. Of course, it depends on each data set and its distribution.

    I'm sure that the best option is to test each time.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (11/24/2014)


    SQLRNNR (11/24/2014)


    While looping works fine with small sets, I have found that recursive CTEs will blow the doors off a loop with larger data sets. Where loops will take 30 minutes, a good recursive cte will take less than a minute.

    I've found that is not the size of the set, but the number of levels for the hierarchy. CTEs will run fast with a few levels, but the more levels, the faster the loop will be. Of course, it depends on each data set and its distribution.

    I'm sure that the best option is to test each time.

    My findings have been the exact opposite. The timings I reported are for data sets involving 1000s of levels and millions of people in the hierarchy. I have found the CTE to be consistently faster for hierarchies than looping mechanisms. Though there has been the rare occasion where a loop will be faster time wise but worse for IO. YMMV.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I'm sure you both know much more than me about CTE's but logically the performance results should be similar for everyone so I'm thinking there must be external factors involved like primary keys and/or indexes, performance of the disk(s) where the transaction logs are located and maybe even the version of SQL Server as it's conceivable that CTE has improved overtime.

  • Gagne (11/24/2014)


    I'm sure you both know much more than me about CTE's but logically the performance results should be similar for everyone so I'm thinking there must be external factors involved like primary keys and/or indexes, performance of the disk(s) where the transaction logs are located and maybe even the version of SQL Server as it's conceivable that CTE has improved overtime.

    True. Cost Estimator could play a part in newer versions as well. Generally speaking, I have found that performance to be consistent all the way back to 2005 when CTEs were introduced. The big problem is that many examples of CTEs out there are not good for performance. The same can be said of looping mechanisms. The key is to compare the loop to the CTE method on like hardware for the same data set. :-D:-D

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I just happen to have a beautiful test bed for such testing in my "Hierarchies on Steroids" articles. When I get the chance, I'll convert the rCTE in the first article to a WHILE loop and see what happens.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There are many ways to skin a hierarchical cat:

    https://www.simple-talk.com/sql/performance/the-performance-of-traversing-a-sql-hierarchy-/


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply