rCTE vs LIKE for Hierarchy

  • 1146 rows, which should not be a problem, but some procedures take up to 4 minutes to run with the like code, and that's pretty bad, my concern is that the rCTE code generates something like 100 table scans with 10k+ logical reads, and I'm a begginer to this but I don't think this much IO is ok, the LIKE code posted does something like 1 scan 68 logical reads, yet performs timewise much much worse and that's where i get lost

  • A thousand-row adjacency hierarchy should be pretty fast to resolve using an rCTE. You'll get a lot of I/O (which you've already seen), but performance should be pretty fast. It'll depend on the average depth and width of the branches, of course.

    The Like method, done correctly, can use indexes, so the number of scans/reads will be low, but the overall performance would normally be pretty poor. That's what you're seeing, if I understand your posts correctly, so that matches theory and practice appropriately. Like operations with no variable on the leading edge, can use indexes and can actually be SARGable, so they can be quite efficient when done well. However, that will only allow a limited number of hierarchy operations, like finding all ancestors or all descendents, without being efficient at things like finding the immediate ancestor or the fifth descendent.

    With that Hierarchy column, you're already about 90% of the way to a HierarchyID solution. On such a small table, one of those will be amazingly fast. You posted in the SQL 2008 forum, does that mean you're using SQL 2008? If so, you should test converting to HierarchyID on this.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the replys, I've always linked IO reads to performance, so i was very confused when i saw a solution generating higher IO and still performing way better, yes I'm using a SQL Server 2008 and I would love to get my hands into changing that field to a HierarchyID but i'll have to study impacts that could cause on existing code so for now i'll ride with a rCTE, I'm aware this is already a bit off the topic, but anyone could point some read (book,blog,alikes) so I could take better grasp with IO / Performance relationship so I won't get frustated again with IO? Thanks Guys !

  • ariel_mlk (9/11/2012)


    1146 rows, which should not be a problem, but some procedures take up to 4 minutes to run with the like code, and that's pretty bad, my concern is that the rCTE code generates something like 100 table scans with 10k+ logical reads, and I'm a begginer to this but I don't think this much IO is ok, the LIKE code posted does something like 1 scan 68 logical reads, yet performs timewise much much worse and that's where i get lost

    I agree. Believe it or not, a While loop that does the same thing as the rCTE will work at least as fast and sometimes faster and with a heck of a lot less reads.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree. Believe it or not, a While loop that does the same thing as the rCTE will work at least as fast and sometimes faster and with a heck of a lot less reads.

    Alright let me get on my feet again !

    I'll write that down on the "obsure rituals" section of my notes, I'll test on that, I'm surprised rCTE are so popular for this kind of job if while loops can save you some big time IO, you have come by this by testing or is there any logic to this ?

    Thanks for the attention

  • ariel_mlk (9/12/2012)


    I agree. Believe it or not, a While loop that does the same thing as the rCTE will work at least as fast and sometimes faster and with a heck of a lot less reads.

    Alright let me get on my feet again !

    I'll write that down on the "obsure rituals" section of my notes, I'll test on that, I'm surprised rCTE are so popular for this kind of job if while loops can save you some big time IO, you have come by this by testing or is there any logic to this ?

    Thanks for the attention

    For Jeff, trust me, it comes from extensive testing.

    And even then, test, test, and test again. What works well in some situations may not in others.

  • ariel_mlk (9/12/2012)


    I agree. Believe it or not, a While loop that does the same thing as the rCTE will work at least as fast and sometimes faster and with a heck of a lot less reads.

    Alright let me get on my feet again !

    I'll write that down on the "obsure rituals" section of my notes, I'll test on that, I'm surprised rCTE are so popular for this kind of job if while loops can save you some big time IO, you have come by this by testing or is there any logic to this ?

    Thanks for the attention

    You're kind of new here so you don't know. I don't say these types of things unless I can back them up. Normally, I'll include the proof code with whatever post I make such a statement in but was in a hurry last night and didn't.

    While the following code is not the end-all to be-all on this subject, it does show what I'm talking about. It's a simple "Count from 1 to a million and store it in a table" bit of code. Of course, both the WHILE loop and the rCTE make sucking sounds when you compare them to other methods (a couple of which are also included in the test code below). If you don't think duration is a good enough indication of such things, then you can certainly run the code with SQL Profiler turned on.

    RAISERROR('

    *******************************************************************************

    Create a table and then insert a million sequential numbers.

    *******************************************************************************

    ',0,1) WITH NOWAIT;

    RAISERROR('

    ===============================================================================

    WHILE Loop Insert in a single explicit transaction.

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    CREATE TABLE dbo.TestNumbers (N INT); --New code added

    --===== Count from 1 to 1000000 and display the count.

    DECLARE @Counter INT; --Declare a counter

    SET @Counter = 1; --Preset the counter to 1

    --===== Insert the million rows in a single transaction

    -- just like an rCTE/INSERT would.

    BEGIN TRANSACTION;

    WHILE @Counter <= 1000000

    BEGIN

    INSERT INTO dbo.TestNumbers (N) --New code added

    SELECT @Counter; --Display the count (same as before)

    SET @Counter = @Counter + 1; --Add 1 to counter

    END; --Is the counter <= 100?

    --If Yes, branch back to display the count.

    --If no, continue.

    COMMIT

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    RAISERROR('

    ===============================================================================

    rCTE Insert in a single implicit transaction.

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    CREATE TABLE dbo.TestNumbers (N INT); --New code added

    --===== Insert the million rows in a single transaction

    WITH

    cteCounter AS

    (--==== Counter rCTE counts from 0 to 11

    SELECT 0 AS N --This provides the starting point (anchor) of zero

    UNION ALL

    SELECT N + 1 --This is the recursive part

    FROM cteCounter

    WHERE N < 1000000

    )--==== Add the counter value to a start date and you get multiple dates

    INSERT INTO dbo.TestNumbers

    (N)

    SELECT N

    FROM cteCounter

    OPTION (MAXRECURSION 0)

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    RAISERROR('

    ===============================================================================

    CROSS JOIN as a row-source Insert in a single implicit transaction.

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    CREATE TABLE dbo.TestNumbers (N INT); --New code added

    --===== Insert the million rows in a single transaction

    INSERT INTO dbo.TestNumbers

    (N)

    SELECT TOP 1000000

    N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    RAISERROR('

    ===============================================================================

    Cascading CTE (cCTE) as a row-source Insert in a single implicit transaction.

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    CREATE TABLE dbo.TestNumbers (N INT);

    --===== Insert the million rows in a single transaction

    WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000

    E16(N) AS (SELECT 1 FROM E8 a, E8 b), --10,000,000,000,000,000

    cteTally(N) AS (SELECT TOP 1000000

    ROW_NUMBER() OVER (ORDER BY (SELECT N))

    FROM E16)

    INSERT INTO dbo.TestNumbers

    (N)

    SELECT t.N --Some query that uses the sequential numbering

    FROM cteTally t

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    RAISERROR('

    *******************************************************************************

    Create the table on-the-fly using minimal logginng whenever possible.

    *******************************************************************************

    ',0,1) WITH NOWAIT;

    RAISERROR('

    ===============================================================================

    WHILE Loop Insert in a single explicit transaction.

    (Same as before. Cannot use SELECT INTO to create the table)

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    CREATE TABLE dbo.TestNumbers (N INT); --New code added

    --===== Count from 1 to 1000000 and display the count.

    DECLARE @Counter INT; --Declare a counter

    SET @Counter = 1; --Preset the counter to 1

    --===== Insert the million rows in a single transaction

    -- just like an rCTE/INSERT would.

    BEGIN TRANSACTION;

    WHILE @Counter <= 1000000

    BEGIN

    INSERT INTO dbo.TestNumbers (N) --New code added

    SELECT @Counter; --Display the count (same as before)

    SET @Counter = @Counter + 1; --Add 1 to counter

    END; --Is the counter <= 100?

    --If Yes, branch back to display the count.

    --If no, continue.

    COMMIT

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    RAISERROR('

    ===============================================================================

    rCTE Insert in a single implicit transaction.

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    --===== Insert the million rows in a single transaction

    WITH

    cteCounter AS

    (--==== Counter rCTE counts from 0 to 11

    SELECT 0 AS N --This provides the starting point (anchor) of zero

    UNION ALL

    SELECT N + 1 --This is the recursive part

    FROM cteCounter

    WHERE N < 1000000

    )--==== Add the counter value to a start date and you get multiple dates

    SELECT N

    INTO dbo.TestNumbers

    FROM cteCounter

    OPTION (MAXRECURSION 0)

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    RAISERROR('

    ===============================================================================

    CROSS JOIN as a row-source Insert in a single implicit transaction.

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    --===== Insert the million rows in a single transaction

    SELECT TOP 1000000

    N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    INTO dbo.TestNumbers

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    RAISERROR('

    ===============================================================================

    Cascading CTE (cCTE) as a row-source Insert in a single implicit transaction.

    ===============================================================================

    ',0,1) WITH NOWAIT;

    --===== Environmenntal an other presets

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME,

    @DurationMS INT;

    --===== Start the timer

    SELECT @StartTime = GETDATE();

    --===== Create a place to store the results

    IF OBJECT_ID('dbo.TestNumbers','U') IS NOT NULL

    DROP TABLE dbo.TestNumbers;

    --===== Insert the million rows in a single transaction

    WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000

    E16(N) AS (SELECT 1 FROM E8 a, E8 b), --10,000,000,000,000,000

    cteTally(N) AS (SELECT TOP 1000000

    ROW_NUMBER() OVER (ORDER BY (SELECT N))

    FROM E16)

    SELECT t.N --Some query that uses the sequential numbering

    INTO dbo.TestNumbers

    FROM cteTally t

    ;

    --===== Display the duration

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE());

    RAISERROR('Duration MS: %u',0,1,@DurationMS) WITH NOWAIT;

    GO

    Here are the resullts from my laptop computer (I5 w/4 core and 6GB ram). Notice that even the minimally logged (SELECT INTO) version of the rCTE is still slower than the While Loop version.

    *******************************************************************************

    Create a table and then insert a million sequential numbers.

    *******************************************************************************

    ===============================================================================

    WHILE Loop Insert in a single explicit transaction.

    ===============================================================================

    Duration MS: 8180

    ===============================================================================

    rCTE Insert in a single implicit transaction.

    ===============================================================================

    Duration MS: 12666

    ===============================================================================

    CROSS JOIN as a row-source Insert in a single implicit transaction.

    ===============================================================================

    Duration MS: 2016

    ===============================================================================

    Cascading CTE (cCTE) as a row-source Insert in a single implicit transaction.

    ===============================================================================

    Duration MS: 2323

    *******************************************************************************

    Create the table on-the-fly using minimal logginng whenever possible.

    *******************************************************************************

    ===============================================================================

    WHILE Loop Insert in a single explicit transaction.

    (Same as before. Cannot use SELECT INTO to create the table)

    ===============================================================================

    Duration MS: 8226

    ===============================================================================

    rCTE Insert in a single implicit transaction.

    ===============================================================================

    Duration MS: 9410

    ===============================================================================

    CROSS JOIN as a row-source Insert in a single implicit transaction.

    ===============================================================================

    Duration MS: 370

    ===============================================================================

    Cascading CTE (cCTE) as a row-source Insert in a single implicit transaction.

    ===============================================================================

    Duration MS: 350

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Not only i'm new here as I'm new to SQL Server, but I've already read several of your posts, and others members of this community to be aware that when you state something you can back it up, I didn't mean to doubt you, and the examples are indeed a proof of what you state, I was just surprised rCTE's could perform slower than while's, I'll try to re-write that rCTE and play around with it, I didn't mean to be disrespectful in any way, thanks for you time ! 🙂

  • Ariel Nessi (9/13/2012)


    Jeff,

    Not only i'm new here as I'm new to SQL Server, but I've already read several of your posts, and others members of this community to be aware that when you state something you can back it up, I didn't mean to doubt you, and the examples are indeed a proof of what you state, I was just surprised rCTE's could perform slower than while's, I'll try to re-write that rCTE and play around with it, I didn't mean to be disrespectful in any way, thanks for you time ! 🙂

    Absolutely no offense taken and no disrespect perceived. In retrospect, I'm the one that was out of line because I made a statement about a contested performance point with no immediate proof in the form of code. Never just trust people on things like this. Not even me. Your questions were spot on and appreciated.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • While an rCTE vs a While Loop can be a poor decision, there are other times when it's not.

    Tests that don't involve any table-query, just Insert statements, or a RAM-resident Select (like the cCTE), don't test all the factors in building an adjacency hierarchy crawl in a While loop.

    Here's a test of that kind of thing:

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T;

    DECLARE @Start DATETIME = GETDATE();

    WITH rCTE

    AS (SELECT ID,

    ParentID,

    1 AS Lvl

    FROM dbo.HierarchyTest

    WHERE ID = 1

    UNION ALL

    SELECT H2.ID,

    H2.ParentID,

    rCTE.Lvl + 1

    FROM dbo.HierarchyTest AS H2

    INNER JOIN rCTE

    ON H2.ParentID = rCTE.ID)

    SELECT *

    INTO #T

    FROM rCTE

    OPTION (MAXRECURSION 0);

    SELECT DATEDIFF(millisecond, @Start, GETDATE()) AS TotalCTETime;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T;

    SET @Start = GETDATE();

    CREATE TABLE #T2

    (ID INT,

    ParentID INT,

    Lvl INT);

    INSERT INTO #T2

    (ID,

    ParentID,

    Lvl)

    SELECT ID,

    ParentID,

    1

    FROM dbo.HierarchyTest

    WHERE ID = 1;

    WHILE @@rowcount > 0

    INSERT INTO #T2

    (ID,

    ParentID,

    Lvl)

    SELECT H2.ID,

    H2.ParentID,

    #T2.Lvl + 1

    FROM dbo.HierarchyTest AS H2

    INNER JOIN #T2

    ON H2.ParentID = #T2.ID

    WHERE H2.ID NOT IN (SELECT ID

    FROM #T2);

    SELECT DATEDIFF(millisecond, @Start, GETDATE()) AS TotalWhileTime;

    I created a test table a while back with a set of complex hierarchies in it. The one that starts with ID 1, used in this test, is 427 levels deep. So the rCTE needs the MaxRecursion option in order to crawl the whole thing. That slows it down, of course.

    In my tests (core i7 Quad, 16 Gig RAM, Windows 7 64-bit, running SQL 2008 R2 Dev Edition), the rCTE averages 6 milliseconds for the whole crawl, including the DDL for the Select Into (used to eliminate data-rendering time). The While loop averages 2296 milliseconds.

    Inserting the While loop version into a table variable, to reduce logging, doesn't materially change the performance (+/- 30 milliseconds compared to temp table). Probably what it gains in lack of logging, it loses in the Where Not In piece of the query (necessary to avoid an infinite loop).

    On smaller hierarchies, the While Loop version is usually "fast enough" that it doesn't matter. But the rCTE will usually be faster on the same datasets. Has been in ever test I've ever done, but that's just my testing, and YMMV always applies.

    As an aside, the same table has HierarchyID datatype information on the same hierarchies. The same test structure gives an average query time of 36 milliseconds when querying that way. Slower than the rCTE for such a simple query, but still far faster than the While Loop method. More complex queries will slow down the rCTE significantly, while Hierarchy will, per my tests, stay about the same. Either will usually remain much faster than a comparable While Loop. Nested Sets will be faster to query, of course. Usually sub-millisecond for most common hierarchy structures.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Alright guys,

    Jeff,

    If you don't think duration is a good enough indication of such things, then you can certainly run the code with SQL Profiler turned on.

    That's exactly my point, when I'm looking to tune things up I always looking for duration,

    but I always thought IO counts were pretty much directly associated with overall performance, and therefore, with response time. Following by your examples, the fastest one is the one that doesn't even generate IO and one can perceive an IO -> Duration relation with all the others.

    Now back to my specific case, the rCTE IS faster than the LIKE that's a fact at least for my enviroment, but it skyroofs on IO comparison ! I just would like to understand that behavior.

    GSquared, that was very informative for me, thanks for the input, I'll try that while in with my scenario, so rCTE really tend to be faster than while's for Hierarchy crawling, that's a relief, I'll just check if their IO behavior follow their duration behavior, or if I should throw away this IO/Duration relation I believe that exists

  • On hierarchies, While and rCTE usually do one index scan per level in the hierarchy. It can blow up to one scan per node in some cases. That's the main advantage of both Nested Sets and HierarchyID, in that they usually just do one index/table scan, and it's a range-scan not a full scan.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (9/13/2012)


    While an rCTE vs a While Loop can be a poor decision, there are other times when it's not.

    Tests that don't involve any table-query, just Insert statements, or a RAM-resident Select (like the cCTE), don't test all the factors in building an adjacency hierarchy crawl in a While loop.

    Here's a test of that kind of thing:

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T;

    DECLARE @Start DATETIME = GETDATE();

    WITH rCTE

    AS (SELECT ID,

    ParentID,

    1 AS Lvl

    FROM dbo.HierarchyTest

    WHERE ID = 1

    UNION ALL

    SELECT H2.ID,

    H2.ParentID,

    rCTE.Lvl + 1

    FROM dbo.HierarchyTest AS H2

    INNER JOIN rCTE

    ON H2.ParentID = rCTE.ID)

    SELECT *

    INTO #T

    FROM rCTE

    OPTION (MAXRECURSION 0);

    SELECT DATEDIFF(millisecond, @Start, GETDATE()) AS TotalCTETime;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T;

    SET @Start = GETDATE();

    CREATE TABLE #T2

    (ID INT,

    ParentID INT,

    Lvl INT);

    INSERT INTO #T2

    (ID,

    ParentID,

    Lvl)

    SELECT ID,

    ParentID,

    1

    FROM dbo.HierarchyTest

    WHERE ID = 1;

    WHILE @@rowcount > 0

    INSERT INTO #T2

    (ID,

    ParentID,

    Lvl)

    SELECT H2.ID,

    H2.ParentID,

    #T2.Lvl + 1

    FROM dbo.HierarchyTest AS H2

    INNER JOIN #T2

    ON H2.ParentID = #T2.ID

    WHERE H2.ID NOT IN (SELECT ID

    FROM #T2);

    SELECT DATEDIFF(millisecond, @Start, GETDATE()) AS TotalWhileTime;

    I created a test table a while back with a set of complex hierarchies in it. The one that starts with ID 1, used in this test, is 427 levels deep. So the rCTE needs the MaxRecursion option in order to crawl the whole thing. That slows it down, of course.

    In my tests (core i7 Quad, 16 Gig RAM, Windows 7 64-bit, running SQL 2008 R2 Dev Edition), the rCTE averages 6 milliseconds for the whole crawl, including the DDL for the Select Into (used to eliminate data-rendering time). The While loop averages 2296 milliseconds.

    Inserting the While loop version into a table variable, to reduce logging, doesn't materially change the performance (+/- 30 milliseconds compared to temp table). Probably what it gains in lack of logging, it loses in the Where Not In piece of the query (necessary to avoid an infinite loop).

    On smaller hierarchies, the While Loop version is usually "fast enough" that it doesn't matter. But the rCTE will usually be faster on the same datasets. Has been in ever test I've ever done, but that's just my testing, and YMMV always applies.

    As an aside, the same table has HierarchyID datatype information on the same hierarchies. The same test structure gives an average query time of 36 milliseconds when querying that way. Slower than the rCTE for such a simple query, but still far faster than the While Loop method. More complex queries will slow down the rCTE significantly, while Hierarchy will, per my tests, stay about the same. Either will usually remain much faster than a comparable While Loop. Nested Sets will be faster to query, of course. Usually sub-millisecond for most common hierarchy structures.

    How many rows in that hierarchy and is there any chance of seeinng the similar DDL for the table including the indexing directly related to your queries? If you could provide what the average and max fan-out is for your tests, that would help. I'd like to do some additional testing on this and that information might help me in how I design the typical million row test table on something like this. Thanks, Gus.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ariel Nessi (9/13/2012)


    Alright guys,

    Jeff,

    If you don't think duration is a good enough indication of such things, then you can certainly run the code with SQL Profiler turned on.

    That's exactly my point, when I'm looking to tune things up I always looking for duration,

    but I always thought IO counts were pretty much directly associated with overall performance, and therefore, with response time. Following by your examples, the fastest one is the one that doesn't even generate IO and one can perceive an IO -> Duration relation with all the others.

    Now back to my specific case, the rCTE IS faster than the LIKE that's a fact at least for my enviroment, but it skyroofs on IO comparison ! I just would like to understand that behavior.

    Duration is usually what I shoot for because that's all end-users can perceive or actually care about (other than accuracy, of course). I'll even use "extra resources" to improve the end-user experience, if need be. There are other times where I look down the road and see that something should not use extra-resources because of future "load" o the machine even if it does mean better performance in the short term. "It Depends".

    For the IO, as much as I hate to admit it, "It Depends". There's physical IO (disk) and there's logical IO (basically, some form of memory in most cases). As you have found, there's usually a direct correlation between the number of reads something has and what its performance is. That's not always the case (Paul White did a brilliant job on a very high performance Triangular Join, in the past). Still, I also try to limit reads (especially if a large number of reads is involved) because they're either using the disk system or the memory system.

    For why rCTEs appear to use so many more reads than other methods, I have my own suspicions but don't really want to know that much about the "guts" of it because it's not likely that I can change it. 😉 As Paul White once suggested, it may be something as simple as SQL Server considering each row read to being a "read" instead of a measure of how many pages were read.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ariel Nessi (9/11/2012)


    1146 rows, which should not be a problem, but some procedures take up to 4 minutes to run with the like code, and that's pretty bad, my concern is that the rCTE code generates something like 100 table scans with 10k+ logical reads, and I'm a begginer to this but I don't think this much IO is ok, the LIKE code posted does something like 1 scan 68 logical reads, yet performs timewise much much worse and that's where i get lost

    I believe I have a method that will blow the doors off things like that. What are your more lengthy runs doing? For example are you trying to calculate the "rollup cost" of each node?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply