Please help with Recursive CTE to get reversed people hierarchy

Question

Please help with Recursive CTE to get reversed people hierarchy

Eric_Gagne

SSCrazy

Points: 2062
More actions
November 24, 2014 at 2:31 pm

#148628

I have never used CTE's before and I can't figure out how to make this work for my need.
Say I have an employee table with columns EmpName, EmpID and BossID. Each BossID is the EmpID of another record. The table looks like
Paul, 1, 0 (0 means he has no manager)
John, 2, 1
Mary, 3, 2
Mark, 4, 2
Mike, 5, 4
Peter, 6, 0
Rick, 7, 6
Mike's manager is Mark and Mark's manager is John.
Mary's manager is also John and she manages nobody.
John's manager is Paul who has no manager.
Now I need a list of all employees with all their managers. My ouput, sorted by EmpID and BossID should be.
Paul, <nothing>
John, Paul
Mary, Paul
Mary, John
Mark, Paul
Mark, John
Mike, Paul
Mike, John
Mike, Mark
Peter, <nothing>
Rick, Peter
I find lots of recursive CTE example showing a hierarchy going from top to bottom, showing all people under employee X, but I can't find or figure out how to go from bottom to top.
I would really appreciate some help with this, there's 1500 employees and in my table and I need to list all of them with all their managers. I don't want to end up doing this with loops and cursors.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

SQLRNNR SSC Guru Points: 281334 More actions · Answer 1

This is called an ancestry tree. I have an example of such a query on my blog. You can check it at the following link, just look at the ancestry section.

http://bit.ly/fkhierarchy

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Luis Cazares SSC Guru Points: 183698 More actions · Answer 2

This is a bit simpler than Jason's example. Check the way I post the sample data, you're supposed to post it this way to get better help.

CREATE TABLE employee (

EmpName varchar(50),

EmpID int,

BossID int);

INSERT INTO employee VALUES

('Paul', 1, 0),

('John', 2, 1),

('Mary', 3, 2),

('Mark', 4, 2),

('Mike', 5, 4),

('Peter', 6, 0),

('Rick', 7, 6);

WITH rCTE AS(

SELECT EmpName, EmpID, BossID

FROM employee

UNION ALL

SELECT r.EmpName, r.EmpID, e.BossID

FROM employee e

JOIN rCTE r ON e.EmpID = r.BossID

WHERE e.BossID > 0

)

SELECT r.EmpName, e.EmpName Boss

FROM rCTE r

LEFT

JOIN employee e ON r.BossID = e.EmpID

ORDER BY r.EmpID

GO

DROP TABLE employee

You should try the looping option, you might find better performance than the recursive cte. You never know.

EDIT: Here's an alternative that you can use to test for performance.

SELECT EmpName, EmpID, BossID

INTO #employee

FROM employee;

WHILE @@ROWCOUNT > 0

INSERT #employee

SELECT r.EmpName, r.EmpID, e.BossID

FROM employee e

JOIN #employee r ON e.EmpID = r.BossID

WHERE e.BossID > 0

AND NOT EXISTS( SELECT 1

FROM #employee x

WHERE x.EmpID = r.EmpID

AND x.BossID = e.BossID);

SELECT r.EmpName, e.EmpName Boss

FROM #employee r

LEFT

JOIN employee e ON r.BossID = e.EmpID

ORDER BY r.EmpID;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

SQLRNNR SSC Guru Points: 281334 More actions · Answer 3

Luis Cazares (11/24/2014)
This is a bit simpler than Jason's example. Check the way I post the sample data, you're supposed to post it this way to get better help.
CREATE TABLE employee (
EmpName varchar(50),
EmpID int,
BossID int);
INSERT INTO employee VALUES
('Paul', 1, 0),
('John', 2, 1),
('Mary', 3, 2),
('Mark', 4, 2),
('Mike', 5, 4),
('Peter', 6, 0),
('Rick', 7, 6);
WITH rCTE AS(
SELECT EmpName, EmpID, BossID
FROM employee
UNION ALL
SELECT r.EmpName, r.EmpID, e.BossID
FROM employee e
JOIN rCTE r ON e.EmpID = r.BossID
WHERE e.BossID > 0
)
SELECT r.EmpName, e.EmpName Boss
FROM rCTE r
LEFT
JOIN employee e ON r.BossID = e.EmpID
ORDER BY r.EmpID
GO
DROP TABLE employee
You should try the looping option, you might find better performance than the recursive cte. You never know.
EDIT: Here's an alternative that you can use to test for performance.
SELECT EmpName, EmpID, BossID
INTO #employee
FROM employee;
WHILE @@ROWCOUNT > 0
INSERT #employee
SELECT r.EmpName, r.EmpID, e.BossID
FROM employee e
JOIN #employee r ON e.EmpID = r.BossID
WHERE e.BossID > 0
AND NOT EXISTS( SELECT 1
FROM #employee x
WHERE x.EmpID = r.EmpID
AND x.BossID = e.BossID);
SELECT r.EmpName, e.EmpName Boss
FROM #employee r
LEFT
JOIN employee e ON r.BossID = e.EmpID
ORDER BY r.EmpID;

There we go just giving away the answer 😉

While looping works fine with small sets, I have found that recursive CTEs will blow the doors off a loop with larger data sets. Where loops will take 30 minutes, a good recursive cte will take less than a minute.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Eric_Gagne SSCrazy Points: 2062 More actions · Answer 4

A million thanks to both of you guys. I really appreciate it and I do apologize for not properly posting my question 🙂

I ran both of them on my employee table without the filters I would normally apply so I it was processing 1979 employees and returned 7793 row.

To be honest, I expected the loop to be slow but there is not much of a difference. They both run so fast I can't even really know what the actual difference is. CTE runs in 00:00:00 and loop runs in 00:00:01.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 5

Gagne (11/24/2014)
A million thanks to both of you guys. I really appreciate it and I do apologize for not properly posting my question 🙂
I ran both of them on my employee table without the filters I would normally apply so I it was processing 1979 employees and returned 7793 row.
To be honest, I expected the loop to be slow but there is not much of a difference. They both run so fast I can't even really know what the actual difference is. CTE runs in 00:00:00 and loop runs in 00:00:01.

You are welcome.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Luis Cazares SSC Guru Points: 183698 More actions · Answer 6

SQLRNNR (11/24/2014)
While looping works fine with small sets, I have found that recursive CTEs will blow the doors off a loop with larger data sets. Where loops will take 30 minutes, a good recursive cte will take less than a minute.

I've found that is not the size of the set, but the number of levels for the hierarchy. CTEs will run fast with a few levels, but the more levels, the faster the loop will be. Of course, it depends on each data set and its distribution.

I'm sure that the best option is to test each time.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

SQLRNNR SSC Guru Points: 281334 More actions · Answer 7

Luis Cazares (11/24/2014)
SQLRNNR (11/24/2014)
While looping works fine with small sets, I have found that recursive CTEs will blow the doors off a loop with larger data sets. Where loops will take 30 minutes, a good recursive cte will take less than a minute.
I've found that is not the size of the set, but the number of levels for the hierarchy. CTEs will run fast with a few levels, but the more levels, the faster the loop will be. Of course, it depends on each data set and its distribution.
I'm sure that the best option is to test each time.

My findings have been the exact opposite. The timings I reported are for data sets involving 1000s of levels and millions of people in the hierarchy. I have found the CTE to be consistently faster for hierarchies than looping mechanisms. Though there has been the rare occasion where a loop will be faster time wise but worse for IO. YMMV.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Eric_Gagne SSCrazy Points: 2062 More actions · Answer 8

I'm sure you both know much more than me about CTE's but logically the performance results should be similar for everyone so I'm thinking there must be external factors involved like primary keys and/or indexes, performance of the disk(s) where the transaction logs are located and maybe even the version of SQL Server as it's conceivable that CTE has improved overtime.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 9

Gagne (11/24/2014)
I'm sure you both know much more than me about CTE's but logically the performance results should be similar for everyone so I'm thinking there must be external factors involved like primary keys and/or indexes, performance of the disk(s) where the transaction logs are located and maybe even the version of SQL Server as it's conceivable that CTE has improved overtime.

True. Cost Estimator could play a part in newer versions as well. Generally speaking, I have found that performance to be consistent all the way back to 2005 when CTEs were introduced. The big problem is that many examples of CTEs out there are not good for performance. The same can be said of looping mechanisms. The key is to compare the loop to the CTE method on like hardware for the same data set. :-D:-D

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 10

I just happen to have a beautiful test bed for such testing in my "Hierarchies on Steroids" articles. When I get the chance, I'll convert the rCTE in the first article to a WHILE loop and see what happens.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86908 More actions · Answer 11

There are many ways to skin a hierarchical cat:

https://www.simple-talk.com/sql/performance/the-performance-of-traversing-a-sql-hierarchy-/

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St