Selecting from hierarchies like Managers and Employees

  • tmarkham1 (5/5/2012)


    rCTE is the most efficient solution

    You're the fourth person to say that with no proof! [font="Arial Black"]PROVE IT![/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • W. Kevin Hazzard (5/6/2012)


    I think the recursive approach is better because it's in keeping with the set-based mindset of T-SQL.

    Now that's a part of the larger problem, Kevin. People think that just because something was written in a single query, that it's set based and nothing could be further from the truth. Although there are some applications where recursion is better than procedural iteration, they are much less common than most people think. As with all else, "It Depends".

    An example of where recursion really stinks up the room is when it is used to count. Here's a whole article on the subject of using rCTEs for counting. Unlike the claims made so far on this thread, the following article has code that you can perform your own testing with. 😉

    http://www.sqlservercentral.com/articles/T-SQL/74118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    Are you Tom Kyte (one of Oracle's employees) in disguise???? I used to work for a company that consumed Oracle data and so I searched for Oracle specific solutions. One of his biggest mantras was prove it, until I see the proof I don't believe it. I've tried to follow that idealism, but it's good to see it elsewhere.

    BTW: Oracle have a non ANSI clause CONNECT BY for hierarchical queries. One of the frequent queries to Tom was how do I do this without using CONNECT BY, because SQL Server doesn't support it.

    Dave

  • OK Jeff, I took the challenge. I loaded 1,000,000 employees into the table with your script. Then I saved Chuck's procedures using his original names and my own (with a _Recursive suffix). I modified all 4 procedures to use the names in your version of the Employee table. Here are the results of traversing down the tree from [EmployeeID] one (1) to various levels:

    [font="Tahoma"]Iterative Down Approach[/font]

    Levels Returned Milliseconds Rows/ms

    3 385 473 0.81

    4 1497 916 1.63

    5 4732 3116 1.52

    6 12777 23313 0.55

    7 29622 139180 0.21

    [font="Tahoma"]Recursive Down Approach[/font]

    Levels Returned Milliseconds Rows/ms

    3 385 333 1.16

    4 1497 330 4.54

    5 4732 523 9.05

    6 12777 1606 7.96

    7 29622 2916 10.16

    I ran the tests in reverse order after having stopped and restarted the database thinking that the recursive versions may be taking advantage of some caching but the results were the same every time. Traversing down the tree is very slow using the iterative approach. In fact, if you start high enough in the hierarchy and allow the iterations to go deep enough, Chuck's select down procedure never returns on my system. Well, I waited about 30 minutes and killed the query because I got tired of waiting.

    Going up the tree, both the iterative approach and the recursive approaches are very fast. No matter where you start, iterating or recursing up takes about 3 milliseconds from anywhere in the tree.

    I did make one change to my down traversal code that you should know about. Since the downward recursion builds the entire tree from the starting point before selecting out the required levels, it can take a long time if you start high in the tree, say at [EmployeeID] one (1). So, I added the criteria:

    AND [w].[SupLevel] < @maxSupLevel

    to the JOIN clause that is UNIONed into the rCTE. That fixed it by stopping the recursion from going farther than it has to. That's handy. And of course you can remove the WHERE clause on the final select in this case since the result set from the rCTE will have no more rows in it than requested. Here's some test code that you can run for yourself:

    -- Run these blocks individually to avoid variable name clashes

    DECLARE @before DATETIME = GETDATE();

    EXEC [dbo].[usp_Employee_Select_Down_Param_EmpID] @selectEmpID = 1, @maxSupLevel = 7;

    DECLARE @after DATETIME = GETDATE();

    DECLARE @duration INT = DATEDIFF(ms, @before, @after);

    PRINT N'Select down iterative executed in ' + CONVERT(NVARCHAR(10),@duration) + N' milliseconds.';

    DECLARE @before DATETIME = GETDATE();

    EXEC [dbo].[usp_Employee_Select_Down_Param_EmpID_Recursive] @selectEmpID = 1, @maxSupLevel = 7;

    DECLARE @after DATETIME = GETDATE();

    DECLARE @duration INT = DATEDIFF(ms, @before, @after);

    PRINT N'Select down recursive in ' + CONVERT(NVARCHAR(10),@duration) + N' milliseconds.';

    DECLARE @before DATETIME = GETDATE();

    EXEC [dbo].[usp_Employee_Select_Up_Param_EmpID] @selectEmpID = 1000000, @maxSupLevel = 20;

    DECLARE @after DATETIME = GETDATE();

    DECLARE @duration INT = DATEDIFF(ms, @before, @after);

    PRINT N'Select up iterative executed in ' + CONVERT(NVARCHAR(10),@duration) + N' milliseconds.';

    DECLARE @before DATETIME = GETDATE();

    EXEC [dbo].[usp_Employee_Select_Up_Param_EmpID_Recursive] @selectEmpID = 1000000, @maxSupLevel = 20;

    DECLARE @after DATETIME = GETDATE();

    DECLARE @duration INT = DATEDIFF(ms, @before, @after);

    PRINT N'Select up recursive in ' + CONVERT(NVARCHAR(10),@duration) + N' milliseconds.';

  • Agreed, Jeff, about the mindset. Simpler syntax does not equal better performance. But in this case, I believe that I'm on the right track. Look at the test results I posted on this thread. It's clear that the rCTE is providing better performance as you throw more data at it while the iterative approach suffers. That's a pattern that I've come to appreciate in my 25 years in this business. Smells like imperative versus functional. Or Row-By-Agonizing-Row versus set-based, if you will. 🙂

  • Jeff Moden (5/6/2012)You're the fourth person to say that with no proof! [font="Arial Black"]PROVE IT![/font]

    Jeff:

    This may come as no surprise to you, but I'm not going to take the time to prove anything to you or to anyone else. My time is simply too precious to me to spend it proving something that, in the grand scheme of things, matters very litte to me. Besides, some individuals are so closed minded that they would probably dispute my findings if they were to fly in the face of their beliefs anyhow. So, to that end, I have to say, "what's the point?"

    I just offered my expert opinion that a recursive CTE is likely the most efficient solution to the problem. Like it or not... agree with it or not... it's just an opinion.

    If anyone wants / needs proof and has the free time to do the research, I promise to have an open mind on the results. Heck, I may even apologize if I'm wrong. 😀

  • tmarkham1 (5/6/2012)


    Jeff Moden (5/6/2012)You're the fourth person to say that with no proof! [font="Arial Black"]PROVE IT![/font]

    Jeff:

    This may come as no surprise to you, but I'm not going to take the time to prove anything to you or to anyone else. My time is simply too precious to me to spend it proving something that, in the grand scheme of things, matters very litte to me. Besides, some individuals are so closed minded that they would probably dispute my findings if they were to fly in the face of their beliefs anyhow. So, to that end, I have to say, "what's the point?"

    I just offered my expert opinion that a recursive CTE is likely the most efficient solution to the problem. Like it or not... agree with it or not... it's just an opinion.

    If anyone wants / needs proof and has the free time to do the research, I promise to have an open mind on the results. Heck, I may even apologize if I'm wrong. 😀

    And yet you'll take the time to tell me your time is too precious. 😉 And I bloody well hope you weren't calling me "closed minded" after your refusal to provide any proof. Further, without some form of tangible proof, expert opinions are still just that... opinions.

    I'll be back soon... with proof. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff, Kevin, tmarkham et. al.,

    Nice job! I'm truly impressed.

    I'm a long time believer in the value of peer review. Just look at what came of this discussion.

    Chuck Hoffman
    ------------
    I'm not sure why we're here, but I am sure that while
    we're here we're supposed to help each other
    ------------

  • Wasn't calling you closed mind... I said, "some people". I had 5 seconds to type a quick response, just don't have hours to devote to research. You obviously like to argue or debate, which is fine. Just relax though... there are plenty of decafinated brands that taste just as good as the real thing.

  • tmarkham1 (5/6/2012)


    Wasn't calling you closed mind... I said, "some people". I had 5 seconds to type a quick response, just don't have hours to devote to research. You obviously like to argue or debate, which is fine. Just relax though... there are plenty of decafinated brands that taste just as good as the real thing.

    Heh... no... I don't like to argue or debate, either. That's why I insist that folks that make unqualified statements like you do put up some code to prove it all. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • W. Kevin Hazzard (5/6/2012)


    OK Jeff, I took the challenge. I loaded 1,000,000 employees into the table with your script. Then I saved Chuck's procedures using his original names and my own (with a _Recursive suffix). I modified all 4 procedures to use the names in your version of the Employee table. Here are the results of traversing down the tree from [EmployeeID] one (1) to various levels:

    Nicely done, Kevin. You're a good man. Thanks for rising to the occasion. I'd have liked it better if you had reposted all of the code together but I believe I've found it all. Thank you for your time.

    I'll be back soon with some testing of my own.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Dave Brooking (5/6/2012)


    Jeff,

    Are you Tom Kyte (one of Oracle's employees) in disguise???? I used to work for a company that consumed Oracle data and so I searched for Oracle specific solutions. One of his biggest mantras was prove it, until I see the proof I don't believe it. I've tried to follow that idealism, but it's good to see it elsewhere.

    BTW: Oracle have a non ANSI clause CONNECT BY for hierarchical queries. One of the frequent queries to Tom was how do I do this without using CONNECT BY, because SQL Server doesn't support it.

    Dave

    Heh... no... not Tom Kyte in disuguise. :blush: I worked with Oracle for 3 years and became very familiar with Tom's apparent mantra. I salute him for it because I know it's a tough position to take most of the time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Here's a complete test suite, Jeff. Steps 1 and 2 only need to be performed once. Step 3 you can tweak and run over and over to suit your needs. -- Kevin

    --**************************************************************

    --**************************************************************

    --*** ***

    --*** STEP 1 - Create the Employee table and fill it. ***

    --*** ***

    --**************************************************************

    --**************************************************************

    RAISERROR(N'Creating the Employee table and populating it...', 0, 0) WITH NOWAIT;

    DECLARE @pRowsToBuild INT = 1000000;

    IF OBJECT_ID(N'dbo.Employee','U') IS NOT NULL

    DROP TABLE [dbo].[Employee];

    SELECT TOP (@pRowsToBuild)

    ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) AS [EmployeeID],

    CAST(0 AS INT) AS [ManagerID],

    CAST(NEWID() AS VARCHAR(36)) AS [EmployeeName]

    INTO [dbo].[Employee]

    FROM master.sys.[All_Columns] AS [ac1]

    CROSS JOIN master.sys.[All_Columns] AS [ac2];

    UPDATE [dbo].[Employee] SET [ManagerID] = CASE WHEN [EmployeeID] > 1 THEN ABS(CHECKSUM(NEWID())) % ([EmployeeID] - 1) + 1 ELSE NULL END;

    ALTER TABLE [dbo].[Employee] ADD CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED ([EmployeeID]);

    CREATE INDEX [IX_Employee_ManagerID] ON [dbo].[Employee] ([ManagerID]);

    GO

    --**************************************************************

    --**************************************************************

    --*** ***

    --*** STEP 2 - Create the stored procedures for traversal. ***

    --*** ***

    --**************************************************************

    --**************************************************************

    RAISERROR(N'Creating the stored procedures for traversal...', 0, 0) WITH NOWAIT;

    IF OBJECT_ID(N'dbo.usp_Employee_Select_Down_Param_EmpID_Recursive','P') IS NOT NULL

    DROP PROCEDURE [dbo].[usp_Employee_Select_Down_Param_EmpID_Recursive];

    GO

    CREATE PROCEDURE [dbo].[usp_Employee_Select_Down_Param_EmpID_Recursive]

    @selectEmpID INT = NULL, -- NULL = all

    @maxSupLevel INT = 10 -- NULL = no limit

    AS

    SET @selectEmpID = ISNULL(@selectEmpID, 0);

    WITH [workTbl]

    AS

    (

    SELECT

    [e1].[EmployeeID]

    , [e1].[EmployeeName]

    , [e1].[ManagerID]

    , 0 AS [SupLevel]

    FROM [dbo].[Employee] AS [e1]

    WHERE

    [e1].[EmployeeID] = @selectEmpID

    UNION ALL

    SELECT

    [e2].[EmployeeID]

    , [e2].[EmployeeName]

    , [e2].[ManagerID]

    , [w].[SupLevel] + 1 AS [SupLevel]

    FROM [workTbl] AS [w]

    JOIN [dbo].[Employee] AS [e2] ON

    [e2].[ManagerID] = [w].[EmployeeID]

    AND [w].[SupLevel] < @maxSupLevel

    )

    SELECT

    [EmployeeID]

    , [EmployeeName]

    , [ManagerID]

    , [SupLevel]

    FROM [workTbl]

    ORDER BY

    [SupLevel]

    , [ManagerID]

    , [EmployeeID]

    OPTION (MAXDOP 32);

    GO

    IF OBJECT_ID(N'dbo.usp_Employee_Select_Up_Param_EmpID_Recursive','P') IS NOT NULL

    DROP PROCEDURE [dbo].[usp_Employee_Select_Up_Param_EmpID_Recursive];

    GO

    CREATE PROCEDURE [dbo].[usp_Employee_Select_Up_Param_EmpID_Recursive]

    @selectEmpID INT, -- required

    @maxSupLevel INT = 10 -- NULL = no limit

    AS

    DECLARE @workTbl TABLE

    (

    [EmployeeID] INT NOT NULL

    , [EmployeeName] VARCHAR(50) NOT NULL

    , [ManagerID] INT NULL

    , [SupLevel] INT NOT NULL

    );

    WITH [workTbl]

    AS

    (

    SELECT

    [e1].[EmployeeID]

    , [e1].[EmployeeName]

    , [e1].[ManagerID]

    , 0 AS [SupLevel]

    FROM [dbo].[Employee] AS [e1]

    WHERE

    [e1].[EmployeeID] = @selectEmpID

    UNION ALL

    SELECT

    [e2].[EmployeeID]

    , [e2].[EmployeeName]

    , [e2].[ManagerID]

    , [w].[SupLevel] - 1 AS [SupLevel]

    FROM [workTbl] AS [w]

    JOIN [dbo].[Employee] AS [e2] ON

    [w].[ManagerID] = [e2].[EmployeeID]

    )

    INSERT @workTbl

    SELECT

    [EmployeeID]

    , [EmployeeName]

    , [ManagerID]

    , [SupLevel]

    FROM [workTbl]

    WHERE [SupLevel] >= -@maxSupLevel

    OPTION (MAXDOP 32);

    DECLARE @minFoundSupLevel INT;

    SELECT @minFoundSupLevel = MIN([SupLevel]) FROM @workTbl;

    SELECT

    [EmployeeID]

    , [EmployeeName]

    , [ManagerID]

    , [SupLevel] - @minFoundSupLevel AS [SupLevel]

    FROM @workTbl

    ORDER BY

    [SupLevel]

    , [ManagerID]

    , [EmployeeID];

    GO

    IF OBJECT_ID(N'dbo.usp_Employee_Select_Down_Param_EmpID','P') IS NOT NULL

    DROP PROCEDURE [dbo].[usp_Employee_Select_Down_Param_EmpID];

    GO

    CREATE PROCEDURE

    [usp_Employee_Select_Down_Param_EmpID]

    @selectEmpID int = NULL, -- NULL=all

    @maxSupLevel int=10 -- NULL=no limit

    AS

    SET @selectEmpID = IsNull(@selectEmpID, 0)

    DECLARE @workTbl

    TABLE (EmployeeID int,

    EmployeeName varchar(50),

    ManagerID int, SupLevel int)

    DECLARE @prevHits int

    SET @prevHits = 0

    DECLARE @supLevel int

    SET @supLevel = 0

    --

    --Insert work record(s) into @workTbl

    --

    INSERT INTO @workTbl

    SELECT

    Employee.EmployeeID, Employee.EmployeeName,

    Employee.ManagerID, @supLevel

    FROM Employee

    WHERE

    ((@selectEmpID = 0)

    AND (IsNull(Employee.ManagerID, 0) = 0)

    )

    OR

    (Employee.EmployeeID = @selectEmpID)

    SET @prevHits = @@ROWCOUNT

    IF @prevHits > 0

    SET @supLevel = @supLevel + 1

    WHILE

    ( (@prevHits > 0)

    AND

    ((@maxSupLevel is NULL)

    OR

    (@supLevel <= @maxSupLevel))

    )

    BEGIN

    INSERT INTO @workTbl

    SELECT

    Employee.EmployeeID,

    Employee.EmployeeName,

    Employee.ManagerID,

    @supLevel

    FROM Employee

    WHERE

    NOT EXISTS

    (SELECT EmployeeID

    FROM @workTbl

    WHERE EmployeeID

    = Employee.EmployeeID)

    AND EXISTS

    (SELECT EmployeeID

    FROM @workTbl

    WHERE EmployeeID = Employee.ManagerID)

    SET @prevHits = @@ROWCOUNT

    SET @supLevel = @supLevel + 1

    END

    SELECT

    EmployeeID, EmployeeName, ManagerID, SupLevel

    FROM @worktbl

    ORDER BY SupLevel, ManagerID, EmployeeID;

    GO

    IF OBJECT_ID(N'dbo.usp_Employee_Select_Up_Param_EmpID','P') IS NOT NULL

    DROP PROCEDURE [dbo].[usp_Employee_Select_Up_Param_EmpID];

    GO

    CREATE PROCEDURE

    [usp_Employee_Select_Up_Param_EmpID]

    @selectEmpID int, -- required

    @maxSupLevel int = 10 --null=no limit

    AS

    SET @selectEmpID = IsNull(@selectEmpID, 0)

    --

    -- Declare variables

    --

    DECLARE @workTbl

    TABLE (ID int IDENTITY(1,1),

    EmployeeID int, EmployeeName varchar(50),

    ManagerID int, SupLevel int)

    DECLARE @prevHits int

    SET @prevHits = 0

    DECLARE @nextEmployeeID int

    SET @nextEmployeeID = 0

    DECLARE @supLevel int

    SET @supLevel = 0

    DECLARE @maxFoundSupLevel int

    SET @maxFoundSupLevel = 0

    --

    -- Insert the starting record into @workTbl

    --

    INSERT INTO @workTbl

    SELECT

    Employee.EmployeeID, Employee.EmployeeName,

    Employee.ManagerID, @supLevel

    FROM Employee

    WHERE

    IsNull(Employee.EmployeeID, 0) =

    @selectEmpID

    SET @prevHits = @@ROWCOUNT

    IF @prevHits > 0

    BEGIN

    SET @nextEmployeeID = 999999

    -- any number gt zero

    SET @supLevel = @supLevel + 1

    END

    ELSE

    SET @nextEmployeeID = NULL

    --

    -- Insert a subsequent records into @workTbl

    -- The Supervisor ID of the most recently

    -- inserted record in the work table

    -- is the Employee ID of the next

    -- addition to the work table.

    --

    WHILE

    ((@nextEmployeeID > 0)

    AND

    ((@maxSupLevel is NULL)

    OR (@supLevel <= @maxSupLevel))

    )

    BEGIN

    -- Set the @nextEmployeeID to the ManagerID of

    -- the most recent record in the table

    SET @nextEmployeeID =

    (SELECT

    ManagerID

    FROM @workTbl

    WHERE ID =

    (SELECT MAX(ID) FROM @workTbl)

    )

    INSERT INTO @workTbl

    SELECT

    Employee.EmployeeID,

    Employee.EmployeeName,

    Employee.ManagerID,

    @supLevel

    FROM Employee

    WHERE Employee.EmployeeID = @nextEmployeeID

    SET @prevHits = @@ROWCOUNT

    SET @supLevel = @supLevel + 1

    END

    -- Find the max SupLevel so numbering can go

    -- from low to high

    SET @maxFoundSupLevel

    = (SELECT MAX(SupLevel) FROM @workTbl)

    --

    -- Return the contents of the work table

    -- as the result set.

    --

    SELECT

    EmployeeID, EmployeeName, ManagerID,

    @maxFoundSupLevel - SupLevel AS SupLevel

    FROM @workTbl

    ORDER BY SupLevel, ManagerID, EmployeeID;

    GO

    --**************************************************************

    --**************************************************************

    --*** ***

    --*** STEP 3 - Run the tests. ***

    --*** ***

    --**************************************************************

    --**************************************************************

    SET NOCOUNT ON;

    DECLARE @before DATETIME;

    DECLARE @minLevel INT;

    DECLARE @maxLevel INT;

    DECLARE @currentLevel INT;

    DECLARE @rows INT;

    DECLARE @results TABLE

    (

    [TestID] INT IDENTITY(1,1) NOT NULL PRIMARY KEY

    , [Direction] VARCHAR(10) NOT NULL

    , [Method] VARCHAR(10) NOT NULL

    , [Level] INT NOT NULL

    , [Rows] INT NOT NULL

    , [DurationMS] INT NOT NULL

    );

    IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    CREATE TABLE #dummy

    (

    [EmployeeID] INT NOT NULL

    , [EmployeeName] VARCHAR(36) NOT NULL

    , [ManagerID] INT NULL

    , [SupLevel] INT NOT NULL

    );

    SET @minLevel = 3;

    SET @maxLevel = 9;

    SET @currentLevel = @minLevel;

    WHILE @currentLevel <= @maxLevel

    BEGIN

    RAISERROR(N'Testing down recursive at level %d...', 0, 0, @currentLevel) WITH NOWAIT;

    TRUNCATE TABLE #dummy;

    SET @before = GETDATE();

    INSERT #dummy

    EXEC [dbo].[usp_Employee_Select_Down_Param_EmpID_Recursive] @selectEmpID = 1, @maxSupLevel = @currentLevel;

    SET @rows = @@ROWCOUNT;

    INSERT @results ([Direction], [Method], [Level], [Rows], [DurationMS])

    VALUES ('Down', 'Recursive', @currentLevel, @rows, DATEDIFF(ms, @before, GETDATE()));

    SET @currentLevel = @currentLevel + 1;

    END

    SET @minLevel = 3;

    -- this takes a long time to run if you go past 6

    SET @maxLevel = 6;

    SET @currentLevel = @minLevel;

    WHILE @currentLevel <= @maxLevel

    BEGIN

    RAISERROR(N'Testing down iterative at level %d...', 0, 0, @currentLevel) WITH NOWAIT;

    TRUNCATE TABLE #dummy;

    SET @before = GETDATE();

    INSERT #dummy

    EXEC [dbo].[usp_Employee_Select_Down_Param_EmpID] @selectEmpID = 1, @maxSupLevel = @currentLevel;

    SET @rows = @@ROWCOUNT;

    INSERT @results ([Direction], [Method], [Level], [Rows], [DurationMS])

    VALUES ('Down', 'Iterative', @currentLevel, @rows, DATEDIFF(ms, @before, GETDATE()));

    SET @currentLevel = @currentLevel + 1;

    END

    SET @minLevel = 10;

    SET @maxLevel = 15;

    SET @currentLevel = @minLevel;

    WHILE @currentLevel <= @maxLevel

    BEGIN

    RAISERROR(N'Testing up recursive at level %d...', 0, 0, @currentLevel) WITH NOWAIT;

    TRUNCATE TABLE #dummy;

    SET @before = GETDATE();

    INSERT #dummy

    EXEC [dbo].[usp_Employee_Select_Up_Param_EmpID_Recursive] @selectEmpID = 1000000, @maxSupLevel = @currentLevel;

    SET @rows = @@ROWCOUNT;

    INSERT @results ([Direction], [Method], [Level], [Rows], [DurationMS])

    VALUES ('Up', 'Recursive', @currentLevel, @rows, DATEDIFF(ms, @before, GETDATE()));

    SET @currentLevel = @currentLevel + 1;

    END

    SET @minLevel = 10;

    SET @maxLevel = 15;

    SET @currentLevel = @minLevel;

    WHILE @currentLevel <= @maxLevel

    BEGIN

    RAISERROR(N'Testing up iterative at level %d...', 0, 0, @currentLevel) WITH NOWAIT;

    TRUNCATE TABLE #dummy;

    SET @before = GETDATE();

    INSERT #dummy

    EXEC [dbo].[usp_Employee_Select_Up_Param_EmpID] @selectEmpID = 1000000, @maxSupLevel = @currentLevel;

    SET @rows = @@ROWCOUNT;

    INSERT @results ([Direction], [Method], [Level], [Rows], [DurationMS])

    VALUES ('Up', 'Iterative', @currentLevel, @rows, DATEDIFF(ms, @before, GETDATE()));

    SET @currentLevel = @currentLevel + 1;

    END

    SELECT

    [TestID]

    , [Direction]

    , [Method]

    , [Level]

    , [Rows]

    , [DurationMS]

    , CAST([Rows] AS FLOAT) / CASE WHEN [DurationMS] = 0 THEN 1 ELSE [DurationMS] END AS [RowsPerMS]

    FROM @results

    ORDER BY

    [RowsPerMS] DESC;

    IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

  • So, we have two great questions that have been implied on this thread.

    The first question is, "Will rCTE code be faster than the code in the article?" If we run the great test harness that Kevin provided, you will see that the answer is "Yes, and by a whole lot as the levels get deeper."

    Wait... don't get to comfortable, yet! 🙂 The next question is, "Is an rCTE better than procedural code?" The answer is, as always, "It Depends". I modified Kevin's good test harness to only use the "down" rCTE he provided and some "down" While Loop code that I'll provide in a minute. The reason that I only included "downline" code was just to make the output a little more obvious. Here's the output from the run where the recursive CTE went first...

    Testing down recursive at level 1...

    Testing down recursive at level 2...

    Testing down recursive at level 3...

    Testing down recursive at level 4...

    Testing down recursive at level 5...

    Testing down recursive at level 6...

    Testing down recursive at level 7...

    Testing down recursive at level 8...

    Testing down recursive at level 9...

    Testing down recursive at level 10...

    Testing down loop at level 1...

    Testing down loop at level 2...

    Testing down loop at level 3...

    Testing down loop at level 4...

    Testing down loop at level 5...

    Testing down loop at level 6...

    Testing down loop at level 7...

    Testing down loop at level 8...

    Testing down loop at level 9...

    Testing down loop at level 10...

    TestID Direction Method Level Rows DurationMS RowsPerMS

    ----------- ---------- -------------------- ----------- ----------- ----------- ----------------------

    17 Down While Loop 7 51562 4716 10.9334181509754

    13 Down While Loop 3 788 73 10.7945205479452

    16 Down While Loop 6 23563 2303 10.2314372557534

    14 Down While Loop 4 3005 300 10.0166666666667

    15 Down While Loop 5 9204 943 9.76033934252386

    20 Down While Loop 10 260633 31506 8.27248778010538

    18 Down While Loop 8 98840 12136 8.14436387607119

    19 Down While Loop 9 168687 21780 7.74504132231405

    6 Down Recursive 6 23563 3150 7.48031746031746

    3 Down Recursive 3 788 113 6.97345132743363

    2 Down Recursive 2 155 23 6.73913043478261

    5 Down Recursive 5 9204 1366 6.73792093704246

    4 Down Recursive 4 3005 460 6.53260869565217

    8 Down Recursive 8 98840 15620 6.32778489116517

    9 Down Recursive 9 168687 30503 5.53017735960397

    10 Down Recursive 10 260633 48820 5.3386521917247

    7 Down Recursive 7 51562 10616 4.85700828937453

    12 Down While Loop 2 155 36 4.30555555555556

    1 Down Recursive 1 18 6 3

    11 Down While Loop 1 18 40 0.45

    ... and here are the results when I let the While Loop go first. The results are very similar in both cases. With only minor exceptions, the While Loop wins on all but the tiniest of hierarchies.

    Testing down loop at level 1...

    Testing down loop at level 2...

    Testing down loop at level 3...

    Testing down loop at level 4...

    Testing down loop at level 5...

    Testing down loop at level 6...

    Testing down loop at level 7...

    Testing down loop at level 8...

    Testing down loop at level 9...

    Testing down loop at level 10...

    Testing down recursive at level 1...

    Testing down recursive at level 2...

    Testing down recursive at level 3...

    Testing down recursive at level 4...

    Testing down recursive at level 5...

    Testing down recursive at level 6...

    Testing down recursive at level 7...

    Testing down recursive at level 8...

    Testing down recursive at level 9...

    Testing down recursive at level 10...

    TestID Direction Method Level Rows DurationMS RowsPerMS

    ----------- ---------- -------------------- ----------- ----------- ----------- ----------------------

    3 Down While Loop 3 788 73 10.7945205479452

    7 Down While Loop 7 51562 4793 10.7577717504694

    6 Down While Loop 6 23563 2216 10.6331227436823

    5 Down While Loop 5 9204 956 9.62761506276151

    4 Down While Loop 4 3005 313 9.60063897763578

    9 Down While Loop 9 168687 19346 8.71947689444846

    8 Down While Loop 8 98840 12123 8.15309741813082

    10 Down While Loop 10 260633 34770 7.49591601955709

    15 Down Recursive 5 9204 1266 7.27014218009479

    13 Down Recursive 3 788 110 7.16363636363636

    14 Down Recursive 4 3005 423 7.10401891252955

    16 Down Recursive 6 23563 3420 6.88976608187135

    17 Down Recursive 7 51562 7890 6.53510773130545

    18 Down Recursive 8 98840 15806 6.2533215234721

    12 Down Recursive 2 155 26 5.96153846153846

    20 Down Recursive 10 260633 49420 5.2738365034399

    19 Down Recursive 9 168687 32133 5.24964989263374

    2 Down While Loop 2 155 43 3.6046511627907

    11 Down Recursive 1 18 6 3

    1 Down While Loop 1 18 26 0.692307692307692

    And [font="Arial Black"]THAT's[/font] why I get so ticked off when people say something like "A recursive CTE is faster" without any proof. It's just hearsay based on someone's idea of what a loop should actually do. Stop spreading myths without proof. The real answer for any given problem is and shall always be "It Depends" until someone can prove otherwise with more than a small amount of rows.

    The really neat thing about the While Loop method is that it works in ALL versions of T-SQL from 6.5 on (I don't know about earlier versions).

    Here's the abbreviated code that I worked with. Again, thanks to Kevin for providing the test harness.

    --**************************************************************

    --**************************************************************

    --*** ***

    --*** STEP 1 - Create the Employee table and fill it. ***

    --*** ***

    --**************************************************************

    --**************************************************************

    RAISERROR(N'Creating the Employee table and populating it...', 0, 0) WITH NOWAIT;

    DECLARE @pRowsToBuild INT

    select @pRowsToBuild = 1000000;

    IF OBJECT_ID(N'dbo.Employee','U') IS NOT NULL

    DROP TABLE [dbo].[Employee];

    SELECT TOP (@pRowsToBuild)

    ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) AS [EmployeeID],

    CAST(0 AS INT) AS [ManagerID],

    CAST(NEWID() AS VARCHAR(36)) AS [EmployeeName]

    INTO [dbo].[Employee]

    FROM master.sys.[All_Columns] AS [ac1]

    CROSS JOIN master.sys.[All_Columns] AS [ac2];

    UPDATE [dbo].[Employee] SET [ManagerID] = CASE WHEN [EmployeeID] > 1 THEN ABS(CHECKSUM(NEWID())) % ([EmployeeID] - 1) + 1 ELSE NULL END;

    ALTER TABLE [dbo].[Employee] ADD CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED ([EmployeeID]);

    CREATE INDEX [IX_Employee_ManagerID] ON [dbo].[Employee] ([ManagerID]);

    GO

    --**************************************************************

    --**************************************************************

    --*** ***

    --*** STEP 2 - Create the stored procedures for traversal. ***

    --*** ***

    --**************************************************************

    --**************************************************************

    RAISERROR(N'Creating the stored procedures for traversal...', 0, 0) WITH NOWAIT;

    IF OBJECT_ID(N'dbo.usp_Employee_Select_Down_Param_EmpID_Recursive','P') IS NOT NULL

    DROP PROCEDURE [dbo].[usp_Employee_Select_Down_Param_EmpID_Recursive];

    GO

    CREATE PROCEDURE [dbo].[usp_Employee_Select_Down_Param_EmpID_Recursive]

    @selectEmpID INT = NULL, -- NULL = all

    @maxSupLevel INT = 10 -- NULL = no limit

    AS

    SET @selectEmpID = ISNULL(@selectEmpID, 0);

    WITH [workTbl]

    AS

    (

    SELECT

    [e1].[EmployeeID]

    , [e1].[EmployeeName]

    , [e1].[ManagerID]

    , 0 AS [SupLevel]

    FROM [dbo].[Employee] AS [e1]

    WHERE

    [e1].[EmployeeID] = @selectEmpID

    UNION ALL

    SELECT

    [e2].[EmployeeID]

    , [e2].[EmployeeName]

    , [e2].[ManagerID]

    , [w].[SupLevel] + 1 AS [SupLevel]

    FROM [workTbl] AS [w]

    JOIN [dbo].[Employee] AS [e2] ON

    [e2].[ManagerID] = [w].[EmployeeID]

    AND [w].[SupLevel] < @maxSupLevel

    )

    SELECT

    [EmployeeID]

    , [EmployeeName]

    , [ManagerID]

    , [SupLevel]

    FROM [workTbl]

    ORDER BY

    [SupLevel]

    , [ManagerID]

    , [EmployeeID]

    OPTION (MAXDOP 32);

    GO

    --===== Jeff While Loop ============================================================

    IF OBJECT_ID('tempdb.dbo.JeffLoop','P') IS NOT NULL

    DROP PROCEDURE dbo.JeffLoop;

    GO

    CREATE PROCEDURE dbo.JeffLoop

    (

    @selectEmpID INT,

    @maxSupLevel INT

    )

    AS

    --===== Conditionally drop temp table to make reruns easier

    IF OBJECT_ID('TempDB..#TargetTableLoop') IS NOT NULL

    DROP TABLE #TargetTableLoop

    ;

    --===== Create the Hierarchy table

    CREATE TABLE #TargetTableLoop

    (

    EmployeeID INT,

    EmployeeName VARCHAR(40),

    ManagerID INT,

    SupLevel INT

    )

    ;

    --===== Create and preset a Hierarchy Level counter.

    DECLARE @CurrentHierarchyLevel INT

    SET @CurrentHierarchyLevel = 0

    ;

    BEGIN TRANSACTION

    --===== Seed the Hierarchy table with the top of the subtree

    -- (Same as "anchor" portion of the rCTE).

    INSERT INTO #TargetTableLoop

    (EmployeeID, EmployeeName, ManagerID, SupLevel)

    SELECT EmployeeID,

    EmployeeName,

    ManagerID,

    0 AS SupLevel

    FROM dbo.Employee

    WHERE EmployeeID = @selectEmpID

    ;

    --===== Determine the rest of the hierarchy

    -- The loop processes sets of rows based on hierarchy level.

    -- (Same as "recursive" portion of the rCTE).

    WHILE @@ROWCOUNT > 0 AND @CurrentHierarchyLevel < @maxSupLevel

    BEGIN

    SET @CurrentHierarchyLevel = @CurrentHierarchyLevel + 1

    INSERT INTO #TargetTableLoop

    (EmployeeID, EmployeeName, ManagerID, SupLevel)

    SELECT emp.EmployeeID,

    emp.EmployeeName,

    emp.ManagerID,

    @CurrentHierarchyLevel AS SupLevel

    FROM dbo.Employee AS emp

    INNER JOIN #TargetTableLoop AS tgt

    ON emp.ManagerID = tgt.EmployeeID

    AND tgt.SupLevel = @CurrentHierarchyLevel - 1

    END

    ;

    COMMIT

    --===== Return the final results

    SELECT EmployeeID, EmployeeName, ManagerID, SupLevel

    FROM #TargetTableLoop

    ORDER BY SupLevel,ManagerID,EmployeeID

    GO

    --**************************************************************

    --**************************************************************

    --*** ***

    --*** STEP 3 - Run the tests. ***

    --*** ***

    --**************************************************************

    --**************************************************************

    SET NOCOUNT ON;

    DECLARE @before DATETIME;

    DECLARE @minLevel INT;

    DECLARE @maxLevel INT;

    DECLARE @currentLevel INT;

    DECLARE @rows INT;

    DECLARE @results TABLE

    (

    [TestID] INT IDENTITY(1,1) NOT NULL PRIMARY KEY

    , [Direction] VARCHAR(10) NOT NULL

    , [Method] VARCHAR(20) NOT NULL

    , [Level] INT NOT NULL

    , [Rows] INT NOT NULL

    , [DurationMS] INT NOT NULL

    );

    IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    CREATE TABLE #dummy

    (

    [EmployeeID] INT NOT NULL

    , [EmployeeName] VARCHAR(36) NOT NULL

    , [ManagerID] INT NULL

    , [SupLevel] INT NOT NULL

    );

    SELECT @minLevel = 1,

    @maxLevel = 10,

    @currentLevel = @minLevel;

    WHILE @currentLevel <= @maxLevel

    BEGIN

    RAISERROR(N'Testing down loop at level %d...', 0, 0, @currentLevel) WITH NOWAIT;

    TRUNCATE TABLE #dummy;

    SET @before = GETDATE();

    INSERT #dummy

    EXEC [dbo].[JeffLoop] @selectEmpID = 1, @maxSupLevel = @currentLevel;

    SET @rows = @@ROWCOUNT;

    INSERT @results ([Direction], [Method], [Level], [Rows], [DurationMS])

    VALUES ('Down', 'While Loop', @currentLevel, @rows, DATEDIFF(ms, @before, GETDATE()));

    SET @currentLevel = @currentLevel + 1;

    END

    GO

    SET @minLevel = 1;

    SET @maxLevel = 10;

    SET @currentLevel = @minLevel;

    WHILE @currentLevel <= @maxLevel

    BEGIN

    RAISERROR(N'Testing down recursive at level %d...', 0, 0, @currentLevel) WITH NOWAIT;

    TRUNCATE TABLE #dummy;

    SET @before = GETDATE();

    INSERT #dummy

    EXEC [dbo].[usp_Employee_Select_Down_Param_EmpID_Recursive] @selectEmpID = 1, @maxSupLevel = @currentLevel;

    SET @rows = @@ROWCOUNT;

    INSERT @results ([Direction], [Method], [Level], [Rows], [DurationMS])

    VALUES ('Down', 'Recursive', @currentLevel, @rows, DATEDIFF(ms, @before, GETDATE()));

    SET @currentLevel = @currentLevel + 1;

    END

    GO

    SELECT

    [TestID]

    , [Direction]

    , [Method]

    , [Level]

    , [Rows]

    , [DurationMS]

    , CAST([Rows] AS FLOAT) / CASE WHEN [DurationMS] = 0 THEN 1 ELSE [DurationMS] END AS [RowsPerMS]

    FROM @results

    ORDER BY

    [RowsPerMS] DESC;

    IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • W. Kevin Hazzard (5/6/2012)


    Agreed, Jeff, about the mindset. Simpler syntax does not equal better performance. But in this case, I believe that I'm on the right track. Look at the test results I posted on this thread. It's clear that the rCTE is providing better performance as you throw more data at it while the iterative approach suffers. That's a pattern that I've come to appreciate in my 25 years in this business. Smells like imperative versus functional. Or Row-By-Agonizing-Row versus set-based, if you will. 🙂

    BWAAA-HAAA!!! Then again, there are some things that appear to be functional but are more of just a rote imperative than you think. 😉

    Thanks for rising to the occasion with your test harness, Kevin (seriously!). If you look at the revision dates that I forgot to remove from the hierarchy generator I provided, you'll see that I've been down a similar path before. Well done, Kevin.

    Now watch someone come up with an index that will smoke both of us. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 31 through 45 (of 49 total)

You must be logged in to reply to this topic. Login to reply