Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

rCTE vs LIKE for Hierarchy Expand / Collapse
Author
Message
Posted Tuesday, September 4, 2012 1:35 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, February 14, 2014 8:14 PM
Points: 32, Visits: 184
Hi people,

I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in

SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'

where hierarchy field contains the "path" to the parent nodes for example
1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth

i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?

sry if the post is not clear enough,
I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those

Edit : As requested, here is the DDL of the tables (Please note, i've took away field that are irrelevant for the problem logic)

--Table DDL
Create Table #Hierarchy
(
[OUID] [int] Primary Key,
[OUParentID] [int] NULL,
[CustomerID] [int] NOT NULL,
[Enabled] [bit] NULL,
[Hierarchy] [varchar](256) NULL,
)
--Sample Insert of 26 rows
SELECT '364',NULL,'14','1','364' UNION ALL
SELECT '365','364','14','1','364.365' UNION ALL
SELECT '366','365','14','1','364.365.366' UNION ALL
SELECT '367','366','14','1','364.365.366.367' UNION ALL
SELECT '368','367','14','1','364.365.366.367.368' UNION ALL
SELECT '369','367','14','1','364.365.366.367.369' UNION ALL
SELECT '370','367','14','1','364.365.366.367.370' UNION ALL
SELECT '371','364','14','1','364.371' UNION ALL
SELECT '372','371','14','1','364.371.372' UNION ALL
SELECT '373','372','14','1','364.371.372.373' UNION ALL
SELECT '374','373','14','1','364.371.372.373.374' UNION ALL
SELECT '375','374','14','1','364.371.372.373.374.375' UNION ALL
SELECT '376','372','14','1','364.371.372.376' UNION ALL
SELECT '377','376','14','1','364.371.372.376.377' UNION ALL
SELECT '378','377','14','1','364.371.372.376.377.378' UNION ALL
SELECT '379','372','14','1','364.371.372.379' UNION ALL
SELECT '380','379','14','1','364.371.372.379.380' UNION ALL
SELECT '381','380','14','1','364.371.372.379.380.381' UNION ALL
SELECT '382','372','14','1','364.371.372.382' UNION ALL
SELECT '383','382','14','1','364.371.372.382.383' UNION ALL
SELECT '384','383','14','1','364.371.372.382.383.384' UNION ALL
SELECT '385','372','14','1','364.371.372.385' UNION ALL
SELECT '386','385','14','1','364.371.372.385.386' UNION ALL
SELECT '387','386','14','1','364.371.372.385.386.387' UNION ALL
SELECT '388','372','14','1','364.371.372.388' UNION ALL
SELECT '389','388','14','1','364.371.372.388.389'
--Original Query Approach
DECLARE @pCustomerID int,
@pUserID int,
@pOUIDs varchar(max),
@pLocationIds varchar(max)
SET @pCustomerID = 14
SET @pUserID = 1
SET @pOUIDs = NULL
SET @pLocationIDs = NULL
SELECT DISTINCT ou.OUID
FROM
(SELECT
ouChild.OUID,
ouChild.CustomerID,
ouChild.Enabled,
ou.OUID as OUIDAncestor
FROM #Hierarchy ou
INNER JOIN #Hierarchy ouChild ON ouChild.Hierarchy + '.' LIKE ou.Hierarchy + '.%'
)ou
WHERE
ou.CustomerID = @pCustomerID
AND ou.Enabled = 1
AND
(
@pOUIDs IS NULL
OR
@pOUIDs LIKE '%,' + CAST(OUIDAncestor AS VARCHAR(20)) + ',%'
);
-- CTE Approach
WITH LocationHierarchy As
(
SELECT
ouf.OUID,ouf.OUParentID,ouf.CustomerID,ouf.Hierarchy,ouf.Enabled
FROM
#Hierarchy ouf
Where
OUID in (select Value from dbo.SplitString(@pOUIDs,',')) or @pOUIDs is null
UNION ALL
SELECT
ouc.OUID,ouc.OUParentID,ouc.CustomerID,ouc.Hierarchy,ouc.Enabled
FROM
#Hierarchy ouc INNER JOIN
LocationHierarchy on ouc.OUParentID = LocationHierarchy.OUID
Where
ouc.Enabled = 1 and ouc.CustomerID = @pCustomerID
)
Select
DISTINCT ou.OUID
from
LocationHierarchy ou

Post #1354147
Posted Tuesday, September 4, 2012 2:38 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 6:51 AM
Points: 13,093, Visits: 12,570
ariel_mlk (9/4/2012)
Hi people,

I'm tunning a database and i've stumbled by some hierarchy queries, on the good old form of ID, ParentID, to work out the hierarchy the previous developers used a like logic as in

SELECT TB1.* FROM TABLE1 as TB1 ON TABLE1 as TB2 on TB2.Hierarchy + '.' LIKE TB1.Hierarchy + '.%'

where hierarchy field contains the "path" to the parent nodes for example
1.2.3.4.5 would mean, this row is 5, parent is 4, grandparent is 3 and so forth

i've rewritten that in a CTE, and the query plan shouts out at me that it takes something like 25% of the cost in batch when running the rCTE approach and the like approach ( which would lead me to think that the rCTE is by far faster than the original ) but the like clause seems to take way lesser reads, i've read somewhere that the query plan only looks at the cost of a first execution of a rCTE, but as I understand rCTE were meant for jobs like this one, so how can a nasty non-sargable like go by less reads than a rCTE ?

sry if the post is not clear enough,
I haven't posted the query plans in the belief that i'm just misunderstanding something, if it ain't the case, i'll promptly post those


God that is awful. At least they are former developers so they won't continue creating stuff like that for you to work with. That is an adjacency list that is denormalized. I would start by getting rid of anything other than the immediate parent. The rest of that is nothing but pain. Once you start joining on like '%' you are in for a horrible performance. Say goodbye to indexing.

I tossed together some very simplified ddl and and sample data to see if your problem is what I think it is.

;with BadAdj ( KeyVal, SomeVal, ParentKeyVal, AwfulPath)
as
(
select 1, 'Top Dog', null, '1' union all
select 2, 'Next Dog', 1, '1.2' union all
select 3, 'Grandchild', 2, '1.2.3' union all
select 4, 'Uber Grandchild', 3, '1.2.3.4' union all
select 5, 'Step Grandchild', 4, '1.2.3.4.5'
)

select * from BadAdj

If I understand what you are facing...you have something like AwfulPath instead of the simple ParentKeyVal? If so, do you have the ability to change the ddl into something more usable? Even if you have to leave the whole path adding a new column to identify the parent would go a long way to making this easier to work with.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1354179
Posted Tuesday, September 4, 2012 5:43 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 7:59 PM
Points: 1,945, Visits: 3,080
Can you re-write this as a nested sets model? The bad news is that when you write the cursor to get the old stuff loaded, you often find out that the hierarchy was screwed up.

Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1354267
Posted Wednesday, September 5, 2012 11:16 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, February 14, 2014 8:14 PM
Points: 32, Visits: 184
Celko and Sean,

Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like

ID ParentID Hierarchy
1 NULL 1
2 1 1.2
3 2 1.2.3
4 1 1.4
...

what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :
SELECT 
stChild.OUID,
stChild.OUParentID,
stChild.CustomerID,
stChild.Name,
stChild.NameFull,
stChild.Hierarchy,
stChild.HierarchyDepth,
stChild.Enabled,
st.OUID as OUIDAncestor
FROM SomeTable st
INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'

produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs

Edit : just setting sql code in the right container ( was in a quote )
Post #1354773
Posted Wednesday, September 5, 2012 12:19 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 6:51 AM
Points: 13,093, Visits: 12,570
ariel_mlk (9/5/2012)
Celko and Sean,

Thanks for the responses, I'm sorry i don't think i was clear enough, what I have is something like

ID ParentID Hierarchy
1 NULL 1
2 1 1.2
3 2 1.2.3
4 1 1.4
...

what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :
SELECT 
stChild.OUID,
stChild.OUParentID,
stChild.CustomerID,
stChild.Name,
stChild.NameFull,
stChild.Hierarchy,
stChild.HierarchyDepth,
stChild.Enabled,
st.OUID as OUIDAncestor
FROM SomeTable st
INNER JOIN SomeTable stChild ON stChild.Hierarchy + '.' LIKE st.Hierarchy + '.%'

produces less IO and CPU time than a cookie cutter rcte having a anchor on top-level rows and recursing on childs

Edit : just setting sql code in the right container ( was in a quote )


Without ddl and sample data there is no way to know what is going on. If you need to know how to post that see the first link in my signature.

If you have the parentid as a column you really should join like I demonstrated. The way you are joining using like is going to cause this to be horribly slow.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1354807
Posted Wednesday, September 5, 2012 12:25 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 6:51 AM
Points: 13,093, Visits: 12,570

what first querie does is a like on Hierarchy column, what mine does is a rcte from ID to ParentID pretty much like BOL examples... what I really don't understand is why query optimizer tells me the rcte is cheaper execution wise whilst it produces way more IO and CPU time than the like approach, by that i mean that this :


It may require a little more IO and CPU but overall it will be WAY faster because it doesn't have to do a full scan just to join the data. The way you are doing with ID to ParentID will allow for the index on those columns to be used in your query. I am no guru when it comes to how the query engine works but I can say that your version will be far superior to the "Like join" approach. If you want it to go even faster you might consider moving to the nested sets model that CELKO talked about.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1354813
Posted Wednesday, September 5, 2012 1:34 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, February 14, 2014 8:14 PM
Points: 32, Visits: 184
Sean,
Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?
Post #1354853
Posted Wednesday, September 5, 2012 1:39 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 6:51 AM
Points: 13,093, Visits: 12,570
ariel_mlk (9/5/2012)
Sean,
Sorry to not have posted that back in the first post, as i stated I though I was missing something obvious enough to discard ddl, i editted it with more information on the problem. On that small set the query optimizer will even identify the rCTE with as the most expensive piece, but when set gets bigger it shifts gears, but again, the rCTE has more IO/CPU on a Set Statistics IO/TIME ON comparison. So my question really is how can it be more IO/CPU time consuming and be rated as more performatic by the Actual Plan percentage?


Thanks for the ddl and sample data. There is a critical piece in your cte that is NOT in the original query. You have the SplitString function in the cte. Can you post the code for that? I have a feeling I may know why your performance is taking a hit.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1354859
Posted Wednesday, September 5, 2012 1:45 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, February 14, 2014 8:14 PM
Points: 32, Visits: 184
ALTER FUNCTION [dbo].[SplitString]
(
@String VARCHAR(MAX),
@Delimeter Char(1)
)
RETURNS @RtnValue TABLE
(
Value VARCHAR(100)
)
AS
BEGIN
DECLARE @Cnt INT
SET @Cnt = 1

WHILE (CHARINDEX(@Delimeter, @String) > 0)
BEGIN
INSERT INTO @RtnValue (Value)
SELECT
Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1)))

SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String))
SET @Cnt = @Cnt + 1
END

INSERT INTO @RtnValue (Value)
SELECT Data = ltrim(rtrim(@String))

RETURN
END


I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?
Post #1354864
Posted Wednesday, September 5, 2012 2:39 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 6:51 AM
Points: 13,093, Visits: 12,570
ariel_mlk (9/5/2012)
ALTER FUNCTION [dbo].[SplitString]
(
@String VARCHAR(MAX),
@Delimeter Char(1)
)
RETURNS @RtnValue TABLE
(
Value VARCHAR(100)
)
AS
BEGIN
DECLARE @Cnt INT
SET @Cnt = 1

WHILE (CHARINDEX(@Delimeter, @String) > 0)
BEGIN
INSERT INTO @RtnValue (Value)
SELECT
Data = ltrim(rtrim(Substring(@String,1,Charindex(@Delimeter,@String)-1)))

SET @String = Substring(@String,Charindex(@Delimeter,@String)+1,len(@String))
SET @Cnt = @Cnt + 1
END

INSERT INTO @RtnValue (Value)
SELECT Data = ltrim(rtrim(@String))

RETURN
END


I'm aware this is RBAR and i'm aware of tally techinics but the strings that this functions breaks are usually of something about 200 characters? but mostly it comes as a null, can that make enourmous IO counts?


No offense but if you are aware that this is subpar for performance why not change it out? This process and every other process that uses your splitter will gain an advantage.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1354897
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse