rCTE vs LIKE for Hierarchy

  • Hi people,

    I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in

    SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'

    where hierarchy field contains the "path" to the parent nodes for example

    1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth

    i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?

    sry if the post is not clear enough,

    I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those

    Edit : As requested, here is the DDL of the tables (Please note, i've took away field that are irrelevant for the problem logic)

    --Table DDL

    Create Table #Hierarchy

    (

    [OUID] [int] Primary Key,

    [OUParentID] [int] NULL,

    [CustomerID] [int] NOT NULL,

    [Enabled] [bit] NULL,

    [Hierarchy] [varchar](256) NULL,

    )

    --Sample Insert of 26 rows

    SELECT '364',NULL,'14','1','364' UNION ALL

    SELECT '365','364','14','1','364.365' UNION ALL

    SELECT '366','365','14','1','364.365.366' UNION ALL

    SELECT '367','366','14','1','364.365.366.367' UNION ALL

    SELECT '368','367','14','1','364.365.366.367.368' UNION ALL

    SELECT '369','367','14','1','364.365.366.367.369' UNION ALL

    SELECT '370','367','14','1','364.365.366.367.370' UNION ALL

    SELECT '371','364','14','1','364.371' UNION ALL

    SELECT '372','371','14','1','364.371.372' UNION ALL

    SELECT '373','372','14','1','364.371.372.373' UNION ALL

    SELECT '374','373','14','1','364.371.372.373.374' UNION ALL

    SELECT '375','374','14','1','364.371.372.373.374.375' UNION ALL

    SELECT '376','372','14','1','364.371.372.376' UNION ALL

    SELECT '377','376','14','1','364.371.372.376.377' UNION ALL

    SELECT '378','377','14','1','364.371.372.376.377.378' UNION ALL

    SELECT '379','372','14','1','364.371.372.379' UNION ALL

    SELECT '380','379','14','1','364.371.372.379.380' UNION ALL

    SELECT '381','380','14','1','364.371.372.379.380.381' UNION ALL

    SELECT '382','372','14','1','364.371.372.382' UNION ALL

    SELECT '383','382','14','1','364.371.372.382.383' UNION ALL

    SELECT '384','383','14','1','364.371.372.382.383.384' UNION ALL

    SELECT '385','372','14','1','364.371.372.385' UNION ALL

    SELECT '386','385','14','1','364.371.372.385.386' UNION ALL

    SELECT '387','386','14','1','364.371.372.385.386.387' UNION ALL

    SELECT '388','372','14','1','364.371.372.388' UNION ALL

    SELECT '389','388','14','1','364.371.372.388.389'

    --Original Query Approach

    DECLARE @pCustomerID int,

    @pUserID int,

    @pOUIDs varchar(max),

    @pLocationIds varchar(max)

    SET@pCustomerID = 14

    SET@pUserID = 1

    SET@pOUIDs = NULL

    SET@pLocationIDs = NULL

    SELECT DISTINCT ou.OUID

    FROM

    (SELECT

    ouChild.OUID,

    ouChild.CustomerID,

    ouChild.Enabled,

    ou.OUID as OUIDAncestor

    FROM #Hierarchy ou

    INNER JOIN #Hierarchy ouChild ON ouChild.Hierarchy + '.' LIKE ou.Hierarchy + '.%'

    )ou

    WHERE

    ou.CustomerID = @pCustomerID

    AND ou.Enabled = 1

    AND

    (

    @pOUIDs IS NULL

    OR

    @pOUIDs LIKE '%,' + CAST(OUIDAncestor AS VARCHAR(20)) + ',%'

    );

    -- CTE Approach

    WITH LocationHierarchy As

    (

    SELECT

    ouf.OUID,ouf.OUParentID,ouf.CustomerID,ouf.Hierarchy,ouf.Enabled

    FROM

    #Hierarchy ouf

    Where

    OUID in (select Value from dbo.SplitString(@pOUIDs,',')) or @pOUIDs is null

    UNION ALL

    SELECT

    ouc.OUID,ouc.OUParentID,ouc.CustomerID,ouc.Hierarchy,ouc.Enabled

    FROM

    #Hierarchy ouc INNER JOIN

    LocationHierarchy on ouc.OUParentID = LocationHierarchy.OUID

    Where

    ouc.Enabled = 1 and ouc.CustomerID = @pCustomerID

    )

    Select

    DISTINCT ou.OUID

    from

    LocationHierarchy ou

  • ariel_mlk (9/4/2012)


    Hi people,

    I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in

    SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'

    where hierarchy field contains the "path" to the parent nodes for example

    1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth

    i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?

    sry if the post is not clear enough,

    I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those

    God that is awful. At least they are former developers so they won't continue creating stuff like that for you to work with. That is an adjacency list that is denormalized. I would start by getting rid of anything other than the immediate parent. The rest of that is nothing but pain. Once you start joining on like '%' you are in for a horrible performance. Say goodbye to indexing.

    I tossed together some very simplified ddl and and sample data to see if your problem is what I think it is.

    ;with BadAdj ( KeyVal, SomeVal, ParentKeyVal, AwfulPath)

    as

    (

    select 1, 'Top Dog', null, '1' union all

    select 2, 'Next Dog', 1, '1.2' union all

    select 3, 'Grandchild', 2, '1.2.3' union all

    select 4, 'Uber Grandchild', 3, '1.2.3.4' union all

    select 5, 'Step Grandchild', 4, '1.2.3.4.5'

    )

    select * from BadAdj

    If I understand what you are facing...you have something like AwfulPath instead of the simple ParentKeyVal? If so, do you have the ability to change the ddl into something more usable? Even if you have to leave the whole path adding a new column to identify the parent would go a long way to making this easier to work with.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Celko and Sean,

    Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like

    ID ParentID Hierarchy

    1 NULL 1

    2 1 1.2

    3 2 1.2.3

    4 1 1.4

    ...

    what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :

    SELECT

    stChild.OUID,

    stChild.OUParentID,

    stChild.CustomerID,

    stChild.Name,

    stChild.NameFull,

    stChild.Hierarchy,

    stChild.HierarchyDepth,

    stChild.Enabled,

    st.OUID as OUIDAncestor

    FROM SomeTable st

    INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'

    produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs

    Edit : just setting sql code in the right container ( was in a quote )

  • ariel_mlk (9/5/2012)


    Celko and Sean,

    Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like

    ID ParentID Hierarchy

    1 NULL 1

    2 1 1.2

    3 2 1.2.3

    4 1 1.4

    ...

    what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :

    SELECT

    stChild.OUID,

    stChild.OUParentID,

    stChild.CustomerID,

    stChild.Name,

    stChild.NameFull,

    stChild.Hierarchy,

    stChild.HierarchyDepth,

    stChild.Enabled,

    st.OUID as OUIDAncestor

    FROM SomeTable st

    INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'

    produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs

    Edit : just setting sql code in the right container ( was in a quote )

    Without ddl and sample data there is no way to know what is going on. If you need to know how to post that see the first link in my signature.

    If you have the parentid as a column you really should join like I demonstrated. The way you are joining using like is going to cause this to be horribly slow.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :

    It may require a little more IO and CPU but overall it will be WAY faster because it doesn't have to do a full scan just to join the data. The way you are doing with ID to ParentID will allow for the index on those columns to be used in your query. I am no guru when it comes to how the query engine works but I can say that your version will be far superior to the "Like join" approach. If you want it to go even faster you might consider moving to the nested sets model that CELKO talked about.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean,

    Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?

  • ariel_mlk (9/5/2012)


    Sean,

    Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?

    Thanks for the ddl and sample data. There is a critical piece in your cte that is NOT in the original query. You have the SplitString function in the cte. Can you post the code for that? I have a feeling I may know why your performance is taking a hit.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ALTER FUNCTION [dbo].[SplitString]

    (

    @String VARCHAR(MAX),

    @Delimeter Char(1)

    )

    RETURNS @RtnValue TABLE

    (

    Value VARCHAR(100)

    )

    AS

    BEGIN

    DECLARE @Cnt INT

    SET @Cnt = 1

    WHILE (CHARINDEX(@Delimeter, @String) > 0)

    BEGIN

    INSERT INTO @RtnValue (Value)

    SELECT

    Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1)))

    SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String))

    SET @Cnt = @Cnt + 1

    END

    INSERT INTO @RtnValue (Value)

    SELECT Data = ltrim(rtrim(@String))

    RETURN

    END

    I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?

  • ariel_mlk (9/5/2012)


    ALTER FUNCTION [dbo].[SplitString]

    (

    @String VARCHAR(MAX),

    @Delimeter Char(1)

    )

    RETURNS @RtnValue TABLE

    (

    Value VARCHAR(100)

    )

    AS

    BEGIN

    DECLARE @Cnt INT

    SET @Cnt = 1

    WHILE (CHARINDEX(@Delimeter, @String) > 0)

    BEGIN

    INSERT INTO @RtnValue (Value)

    SELECT

    Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1)))

    SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String))

    SET @Cnt = @Cnt + 1

    END

    INSERT INTO @RtnValue (Value)

    SELECT Data = ltrim(rtrim(@String))

    RETURN

    END

    I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?

    No offense but if you are aware that this is subpar for performance why not change it out? This process and every other process that uses your splitter will gain an advantage.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean,

    No offense taken at all ! right now the procedure that i copied here is the only one that uses it, i just did some copy paste of existing code ( yeah I was lazy ) as this is still not in production, but it will be changed, i'm just yet looking for where the rCTE IO comes from right now, or if I'm misinterpreting results.

    I don't have the actual numbers right now but the rCTE does something like a hundred table scans on the #hierarchy table ! the like approach doesn't gets even near that and still is outperformed by the rCTE. Right now I'm more worried on understanding that behavior than using a tally for string split, which I *think* will cause minor improvement on this, or RBARs can really be THAT bad on, such small strings (200 is small isn't it) ?

    thanks for the attention so far =)

  • ariel_mlk (9/5/2012)


    Sean,

    No offense taken at all ! right now the procedure that i copied here is the only one that uses it, i just did some copy paste of existing code ( yeah I was lazy ) as this is still not in production, but it will be changed, i'm just yet looking for where the rCTE IO comes from right now, or if I'm misinterpreting results.

    I don't have the actual numbers right now but the rCTE does something like a hundred table scans on the #hierarchy table ! the like approach doesn't gets even near that and still is outperformed by the rCTE. Right now I'm more worried on understanding that behavior than using a tally for string split, which I *think* will cause minor improvement on this, or RBARs can really be THAT bad on, such small strings (200 is small isn't it) ?

    thanks for the attention so far =)

    Get rid of that RBAR function and then start your comparison. You are trying to read stats on a query that you know you need to drastically change. Change the query first and then look at what it is doing. I realize that with the code you posted it isn't doing anything but still...

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean,

    Thanks for the input ! i'll get that done by tomorrow and come back to buzz you guys again =)

    hopefully i'll improve the procedure more than i expect it to

  • ariel_mlk (9/5/2012)


    Sean,

    Thanks for the input ! i'll get that done by tomorrow and come back to buzz you guys again =)

    hopefully i'll improve the procedure more than i expect it to

    FWIW those two queries do not produce the same results.

    Consider the following when your string to split is not null.

    --Original Query Approach

    DECLARE @pCustomerID int,

    @pUserID int,

    @pOUIDs varchar(max) = '367',

    @pLocationIds varchar(max)

    SET@pCustomerID = 14

    SET@pUserID = 1

    --SET@pOUIDs = NULL

    SET@pLocationIDs = NULL

    SELECT DISTINCT ou.OUID

    FROM

    (SELECT

    ouChild.OUID,

    ouChild.CustomerID,

    ouChild.Enabled,

    ou.OUID as OUIDAncestor

    FROM #Hierarchy ou

    INNER JOIN #Hierarchy ouChild ON ouChild.Hierarchy + '.' LIKE ou.Hierarchy + '.%'

    )ou

    WHERE

    ou.CustomerID = @pCustomerID

    AND ou.Enabled = 1

    AND

    (

    @pOUIDs IS NULL

    OR

    @pOUIDs LIKE '%,' + CAST(OUIDAncestor AS VARCHAR(20)) + ',%'

    );

    -- CTE Approach

    ;WITH LocationHierarchy As

    (

    SELECT

    ouf.OUID,ouf.OUParentID,ouf.CustomerID,ouf.Hierarchy,ouf.Enabled

    FROM

    #Hierarchy ouf

    Where

    OUID in (select val from dbo.Split(@pOUIDs,',')) or @pOUIDs is null

    UNION ALL

    SELECT

    ouc.OUID,ouc.OUParentID,ouc.CustomerID,ouc.Hierarchy,ouc.Enabled

    FROM

    #Hierarchy ouc INNER JOIN

    LocationHierarchy on ouc.OUParentID = LocationHierarchy.OUID

    Where

    ouc.Enabled = 1 and ouc.CustomerID = @pCustomerID

    )

    Select

    DISTINCT ou.OUID

    from

    LocationHierarchy ou

    The results are different in the two queries so your comparison is even further off track. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean,

    that would produce different results indeed, but the old code would expect ',367,' to work properly, i can't test it right now, but i assume that with those leading and ending comma's they produce the same output, or i have screwed up on the copy paste.

  • How many rows in this hierarchy?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply