Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Help on Performance issue on Recursive CTE Expand / Collapse
Author
Message
Posted Wednesday, October 10, 2012 12:09 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, June 14, 2013 9:58 AM
Points: 132, Visits: 339
Expert,

whatelse could i do to make this query perform better. this thing runs for about 20seconds on the table that contains more than 10 millions of records, which is not acceptable.

there is a non cluster index on ObjectID
there is a non cluster index on ParentID
parent-to-child is many-to-many relationship.

is cross apply helps on this case, how do do this with cross apply?


CREATE TABLE #TempCTE (ID BIGINT, ParentID uniqueidentifier, ObjectID uniqueidentifier)
INSERT INTO #TempCTE

SELECT 1,convert(uniqueidentifier, 'B4ED3C39-9626-44B6-93D9-000B37BDA575'),convert(uniqueidentifier,'95C1A774-C79B-4861-AAB4-09E8070188FB') UNION ALL
SELECT 2,convert(uniqueidentifier, 'C4ED3C39-9626-44B6-93D9-000B37BDA575'),convert(uniqueidentifier,'95C1A774-C79B-4861-AAB4-09E8070188FB') UNION ALL
SELECT 3,convert(uniqueidentifier, 'B4ED3C39-9626-44B6-93D9-000B37BDA575'),convert(uniqueidentifier,'5A116CF5-BD94-46F7-9533-4576D83CC6B0') UNION ALL
SELECT 4,convert(uniqueidentifier, '95C1A774-C79B-4861-AAB4-09E8070188FB'),convert(uniqueidentifier,'E43FB52B-863D-4D28-95B5-2871CC823721') UNION ALL
SELECT 5,convert(uniqueidentifier, 'E43FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'3C571CC5-FE52-4453-AFDC-313EBF9CF05F') UNION ALL
SELECT 6,convert(uniqueidentifier, 'E43FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'68B7C10A-D03E-4F99-9737-6D4CED3B4692') UNION ALL
SELECT 7,convert(uniqueidentifier, 'E43FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'DD74A988-99F8-4BF7-B884-7DDE14264D0C') UNION ALL
SELECT 8,convert(uniqueidentifier, 'D77FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'DD74A988-99F8-4BF7-B884-7DDE14264D0C')


--get nodes recursive, this is where slowness be
;WITH TempCTE AS
(
SELECT ParentID, ObjectID, 0 as Depth
FROM #TempCTE
WHERE ParentID='B4ED3C39-9626-44B6-93D9-000B37BDA575'

UNION ALL

SELECT T1.ParentID, T1.ObjectID, Depth + 1
FROM #TempCTE T1
INNER JOIN TempCTE T2 on T2.ObjectID = T1.ParentID

)
SELECT TOP 100 PERCENT *
FROM TempCTE
ORDER BY ParentID ;

DROP TABLE #TempCTE

Post #1371073
Posted Wednesday, October 10, 2012 3:20 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 1:10 PM
Points: 238, Visits: 1,199
You should post the actual DDL.

From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/


-- AJB
xmlsqlninja.com
Post #1371139
Posted Wednesday, October 10, 2012 3:21 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Today @ 12:41 PM
Points: 28, Visits: 327
A co-worker analyzed CTE performance in SQL Server 2005 and determined it begins to degrade after, if I recall, 100,000 records.

As we encounter performance related issues in production which involve CTE's we rip them out and refactor the proc.

Maybe the performance is better in 2008 R2 or 2012 ...
Post #1371140
Posted Wednesday, October 10, 2012 3:33 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, June 14, 2013 9:58 AM
Points: 132, Visits: 339
XMLSQLNinja (10/10/2012)
You should post the actual DDL.

From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/


Alen,
I updated my original post with complete runable DDL.
Post #1371143
Posted Wednesday, October 10, 2012 3:56 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 1:10 PM
Points: 238, Visits: 1,199
Homersdonut (10/10/2012)
A co-worker analyzed CTE performance in SQL Server 2005 and determined it begins to degrade after, if I recall, 100,000 records.

As we encounter performance related issues in production which involve CTE's we rip them out and refactor the proc.

Maybe the performance is better in 2008 R2 or 2012 ...


Based on my experience, a will written CTE does not slow down over time - they essentially produce the same Execution plan as a correlated subquery but, IMHO, are easier to write as the code is cleaner.

If your co-worker was talking about Recursive CTEs (rCTEs), I will aver that get a bad rap often without justification. This is one of those "It depends" kinda things. I have tested the use of rCTEs for running totals - if the DDL is good an rCTE is profoundly faster than Cursors or any other iterative method for this type of task. Nonetheless, the 'Quirky Update' is the way to go.


-- AJB
xmlsqlninja.com
Post #1371149
Posted Wednesday, October 10, 2012 3:57 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 1:10 PM
Points: 238, Visits: 1,199
haiao2000 (10/10/2012)
XMLSQLNinja (10/10/2012)
You should post the actual DDL.

From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/


Alen,
I updated my original post with complete runable DDL.


I'll look at this again later and have a reply later this evening.


-- AJB
xmlsqlninja.com
Post #1371150
Posted Wednesday, October 10, 2012 4:01 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, June 14, 2013 9:58 AM
Points: 132, Visits: 339
XMLSQLNinja (10/10/2012)
haiao2000 (10/10/2012)
XMLSQLNinja (10/10/2012)
You should post the actual DDL.

From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/


Alen,
I updated my original post with complete runable DDL.


I'll look at this again later and have a reply later this evening.


Allen,
I am still reading the Quirky Update post and hope to understand it. Could you see if you can show me how to do this recursive in a Quirky Update way?

Thanks in advance.
Post #1371153
Posted Wednesday, October 10, 2012 4:32 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 1:08 PM
Points: 1,334, Visits: 4,020
Hi there, I would like to offer you my own twist on this using the "identity hack".

The outline of the logic is to replace the recusive cte with a recursive INSERT managed by a WHILE statement and the "identity hack".

First you are going to need somewhere to store the results


IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL
DROP TABLE #RESULTS;

CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL);

CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);


The results table matches your source data, with an extra identity column which has an index with an included column of childid.

Next you need to allow identity inserts on the results table...


SET IDENTITY_INSERT #RESULTS ON;


And pop the head of the list into the results with a "Level" of 1.


INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT ID,PARENTID,CHILDID,1
FROM #SOURCE
WHERE PARENTID=@START;


Now for the "recursive" bit


WHILE @@ROWCOUNT>0
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = SCOPE_IDENTITY();


Explanation of this code:

WHILE @@ROWCOUNT>0 just repeats the INSERT until it processes zero rows (you have run out of CHILDREN)

The INSERT, uses SCOPE_IDENTITY two ways.

1. to SELECT all the ROWS that were last added to the results table
2. to INSERT the new Level value (last value + 1)

Finally, turn off identity insert and select your results.


SET IDENTITY_INSERT #RESULTS OFF;

SELECT *
FROM #RESULTS;


Using this approach I can "walk the tree" for a four level (44000 results) setup in a 4.5 Million row source table in seconds, whereas the recursive CTE takes too long to bother timing it.

Try it out, see if it helps.


MM




Post #1371159
Posted Wednesday, October 10, 2012 7:14 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, June 14, 2013 9:58 AM
Points: 132, Visits: 339
mister.magoo (10/10/2012)
Hi there, I would like to offer you my own twist on this using the "identity hack".

The outline of the logic is to replace the recusive cte with a recursive INSERT managed by a WHILE statement and the "identity hack".

First you are going to need somewhere to store the results


IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL
DROP TABLE #RESULTS;

CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL);

CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);


The results table matches your source data, with an extra identity column which has an index with an included column of childid.

Next you need to allow identity inserts on the results table...


SET IDENTITY_INSERT #RESULTS ON;


And pop the head of the list into the results with a "Level" of 1.


INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT ID,PARENTID,CHILDID,1
FROM #SOURCE
WHERE PARENTID=@START;


Now for the "recursive" bit


WHILE @@ROWCOUNT>0
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = SCOPE_IDENTITY();


Explanation of this code:

WHILE @@ROWCOUNT>0 just repeats the INSERT until it processes zero rows (you have run out of CHILDREN)

The INSERT, uses SCOPE_IDENTITY two ways.

1. to SELECT all the ROWS that were last added to the results table
2. to INSERT the new Level value (last value + 1)

Finally, turn off identity insert and select your results.


SET IDENTITY_INSERT #RESULTS OFF;

SELECT *
FROM #RESULTS;


Using this approach I can "walk the tree" for a four level (44000 results) setup in a 4.5 Million row source table in seconds, whereas the recursive CTE takes too long to bother timing it.

Try it out, see if it helps.


Holycow! Mister.magoo, you will never know how much value your post is to me, it may mean nothing to someone but to me it is a huge thing. My CTE recursive runs in 18-20 seconds which is not acceptable, this sh't runs less than a second that is too wicked fast.

I really appreciate that you spend time write up this great post. I am still trying to understand how your code works myself .
The part I don't understand is why LEVEL INT IDENTITY(1,1) is set to auto increment, then turn identity insert off when inserting?
another one is you said "SELECT all the ROWS that were last added" does that mean if after looping 5 times, the 6th iteration will select last 5 rows from #RESULT?

BTW, why do you call this method is a Identity Hack? Where do you learn this type of woodo?

Updated:
If I run these statements before I run the query, it still takes 15 seconds each...so I guess it is not as fast as I thought, or did I do something wrong?

DBCC DROPCLEANBUFFERS --clears all data from the cache.
DBCC FREEPROCCACHE --clears the stored procedure cache.
Post #1371173
Posted Thursday, October 11, 2012 12:44 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 1:08 PM
Points: 1,334, Visits: 4,020
haiao2000 (10/10/2012)


I really appreciate that you spend time write up this great post. I am still trying to understand how your code works myself .
The part I don't understand is why LEVEL INT IDENTITY(1,1) is set to auto increment, then turn identity insert off when inserting?


I need an identity column just so I can use SCOPE_IDENTITY(), not because I want it to auto-increment, so I declare it as an identity, then "turn it off"...


another one is you said "SELECT all the ROWS that were last added" does that mean if after looping 5 times, the 6th iteration will select last 5 rows from #RESULT?


Not the last 5 rows, but it does mean it will select the rows that were added in iteration #5 because they are all tagged with Level 5


BTW, why do you call this method is a Identity Hack? Where do you learn this type of woodo?


I call it "Identity Hack" because I am using an identity column as a way of passing a value from one INSERT statement to the next without use of variables, which is not it's intended purpose - hence Identity Hack.

Where woodoo taught? Look around you - Woodoo - she everywhere !



Updated:
If I run these statements before I run the query, it still takes 15 seconds each...so I guess it is not as fast as I thought, or did I do something wrong?

DBCC DROPCLEANBUFFERS --clears all data from the cache.
DBCC FREEPROCCACHE --clears the stored procedure cache.


I would imagine that you have indexing problems if this is taking roughly as long as the R-CTE with clean buffers/cache.

Can you post an actual execution plan (the xml/sqlplan type) along with the output from STATISTICS TIME and IO (to see where the pain is)


MM




Post #1371231
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse