October 10, 2012 at 12:09 pm
Expert,
whatelse could i do to make this query perform better. this thing runs for about 20seconds on the table that contains more than 10 millions of records, which is not acceptable.
there is a non cluster index on ObjectID
there is a non cluster index on ParentID
parent-to-child is many-to-many relationship.
is cross apply helps on this case, how do do this with cross apply?
CREATE TABLE #TempCTE (ID BIGINT, ParentID uniqueidentifier, ObjectID uniqueidentifier)
INSERT INTO #TempCTE
SELECT 1,convert(uniqueidentifier, 'B4ED3C39-9626-44B6-93D9-000B37BDA575'),convert(uniqueidentifier,'95C1A774-C79B-4861-AAB4-09E8070188FB') UNION ALL
SELECT 2,convert(uniqueidentifier, 'C4ED3C39-9626-44B6-93D9-000B37BDA575'),convert(uniqueidentifier,'95C1A774-C79B-4861-AAB4-09E8070188FB') UNION ALL
SELECT 3,convert(uniqueidentifier, 'B4ED3C39-9626-44B6-93D9-000B37BDA575'),convert(uniqueidentifier,'5A116CF5-BD94-46F7-9533-4576D83CC6B0') UNION ALL
SELECT 4,convert(uniqueidentifier, '95C1A774-C79B-4861-AAB4-09E8070188FB'),convert(uniqueidentifier,'E43FB52B-863D-4D28-95B5-2871CC823721') UNION ALL
SELECT 5,convert(uniqueidentifier, 'E43FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'3C571CC5-FE52-4453-AFDC-313EBF9CF05F') UNION ALL
SELECT 6,convert(uniqueidentifier, 'E43FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'68B7C10A-D03E-4F99-9737-6D4CED3B4692') UNION ALL
SELECT 7,convert(uniqueidentifier, 'E43FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'DD74A988-99F8-4BF7-B884-7DDE14264D0C') UNION ALL
SELECT 8,convert(uniqueidentifier, 'D77FB52B-863D-4D28-95B5-2871CC823721'),convert(uniqueidentifier,'DD74A988-99F8-4BF7-B884-7DDE14264D0C')
--get nodes recursive, this is where slowness be
;WITH TempCTE AS
(
SELECT ParentID, ObjectID, 0 as Depth
FROM #TempCTE
WHERE ParentID='B4ED3C39-9626-44B6-93D9-000B37BDA575'
UNION ALL
SELECT T1.ParentID, T1.ObjectID, Depth + 1
FROM #TempCTE T1
INNER JOIN TempCTE T2 on T2.ObjectID = T1.ParentID
)
SELECT TOP 100 PERCENT *
FROM TempCTE
ORDER BY ParentID ;
DROP TABLE #TempCTE
October 10, 2012 at 3:20 pm
You should post the actual DDL.
From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/
-- Itzik Ben-Gan 2001
October 10, 2012 at 3:21 pm
A co-worker analyzed CTE performance in SQL Server 2005 and determined it begins to degrade after, if I recall, 100,000 records.
As we encounter performance related issues in production which involve CTE's we rip them out and refactor the proc.
Maybe the performance is better in 2008 R2 or 2012 ...
October 10, 2012 at 3:33 pm
XMLSQLNinja (10/10/2012)
You should post the actual DDL.From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/
Alen,
I updated my original post with complete runable DDL.
October 10, 2012 at 3:56 pm
Homersdonut (10/10/2012)
A co-worker analyzed CTE performance in SQL Server 2005 and determined it begins to degrade after, if I recall, 100,000 records.As we encounter performance related issues in production which involve CTE's we rip them out and refactor the proc.
Maybe the performance is better in 2008 R2 or 2012 ...
Based on my experience, a will written CTE does not slow down over time - they essentially produce the same Execution plan as a correlated subquery but, IMHO, are easier to write as the code is cleaner.
If your co-worker was talking about Recursive CTEs (rCTEs), I will aver that get a bad rap often without justification. This is one of those "It depends" kinda things. I have tested the use of rCTEs for running totals - if the DDL is good an rCTE is profoundly faster than Cursors or any other iterative method for this type of task. Nonetheless, the 'Quirky Update' is the way to go.
-- Itzik Ben-Gan 2001
October 10, 2012 at 3:57 pm
haiao2000 (10/10/2012)
XMLSQLNinja (10/10/2012)
You should post the actual DDL.From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/
Alen,
I updated my original post with complete runable DDL.
I'll look at this again later and have a reply later this evening. 😉
-- Itzik Ben-Gan 2001
October 10, 2012 at 4:01 pm
XMLSQLNinja (10/10/2012)
haiao2000 (10/10/2012)
XMLSQLNinja (10/10/2012)
You should post the actual DDL.From what I see you could make this faster using the Local Updateable Variable approach (e.g. the 'Quirky Update')... See http://www.sqlservercentral.com/articles/T-SQL/68467/
Alen,
I updated my original post with complete runable DDL.
I'll look at this again later and have a reply later this evening. 😉
Allen,
I am still reading the Quirky Update post and hope to understand it. Could you see if you can show me how to do this recursive in a Quirky Update way?
Thanks in advance.
October 10, 2012 at 4:32 pm
Hi there, I would like to offer you my own twist on this using the "identity hack".
The outline of the logic is to replace the recusive cte with a recursive INSERT managed by a WHILE statement and the "identity hack".
First you are going to need somewhere to store the results
IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL
DROP TABLE #RESULTS;
CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL);
CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);
The results table matches your source data, with an extra identity column which has an index with an included column of childid.
Next you need to allow identity inserts on the results table...
SET IDENTITY_INSERT #RESULTS ON;
And pop the head of the list into the results with a "Level" of 1.
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT ID,PARENTID,CHILDID,1
FROM #SOURCE
WHERE PARENTID=@START;
Now for the "recursive" bit
WHILE @@ROWCOUNT>0
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = SCOPE_IDENTITY();
Explanation of this code:
WHILE @@ROWCOUNT>0 just repeats the INSERT until it processes zero rows (you have run out of CHILDREN)
The INSERT, uses SCOPE_IDENTITY two ways.
1. to SELECT all the ROWS that were last added to the results table
2. to INSERT the new Level value (last value + 1)
Finally, turn off identity insert and select your results.
SET IDENTITY_INSERT #RESULTS OFF;
SELECT *
FROM #RESULTS;
Using this approach I can "walk the tree" for a four level (44000 results) setup in a 4.5 Million row source table in seconds, whereas the recursive CTE takes too long to bother timing it.
Try it out, see if it helps.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);October 10, 2012 at 7:14 pm
mister.magoo (10/10/2012)
Hi there, I would like to offer you my own twist on this using the "identity hack".The outline of the logic is to replace the recusive cte with a recursive INSERT managed by a WHILE statement and the "identity hack".
First you are going to need somewhere to store the results
IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL
DROP TABLE #RESULTS;
CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL);
CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);
The results table matches your source data, with an extra identity column which has an index with an included column of childid.
Next you need to allow identity inserts on the results table...
SET IDENTITY_INSERT #RESULTS ON;
And pop the head of the list into the results with a "Level" of 1.
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT ID,PARENTID,CHILDID,1
FROM #SOURCE
WHERE PARENTID=@START;
Now for the "recursive" bit
WHILE @@ROWCOUNT>0
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = SCOPE_IDENTITY();
Explanation of this code:
WHILE @@ROWCOUNT>0 just repeats the INSERT until it processes zero rows (you have run out of CHILDREN)
The INSERT, uses SCOPE_IDENTITY two ways.
1. to SELECT all the ROWS that were last added to the results table
2. to INSERT the new Level value (last value + 1)
Finally, turn off identity insert and select your results.
SET IDENTITY_INSERT #RESULTS OFF;
SELECT *
FROM #RESULTS;
Using this approach I can "walk the tree" for a four level (44000 results) setup in a 4.5 Million row source table in seconds, whereas the recursive CTE takes too long to bother timing it.
Try it out, see if it helps.
Holycow! Mister.magoo, you will never know how much value your post is to me, it may mean nothing to someone but to me it is a huge thing. My CTE recursive runs in 18-20 seconds which is not acceptable, this sh't runs less than a second that is too wicked fast.
I really appreciate that you spend time write up this great post. I am still trying to understand how your code works myself :-).
The part I don't understand is why LEVEL INT IDENTITY(1,1) is set to auto increment, then turn identity insert off when inserting?
another one is you said "SELECT all the ROWS that were last added" does that mean if after looping 5 times, the 6th iteration will select last 5 rows from #RESULT?
BTW, why do you call this method is a Identity Hack? Where do you learn this type of woodo? 🙂
Updated:
If I run these statements before I run the query, it still takes 15 seconds each...so I guess it is not as fast as I thought, or did I do something wrong?
DBCC DROPCLEANBUFFERS --clears all data from the cache.
DBCC FREEPROCCACHE --clears the stored procedure cache.
October 11, 2012 at 12:44 am
haiao2000 (10/10/2012)
I really appreciate that you spend time write up this great post. I am still trying to understand how your code works myself :-).
The part I don't understand is why LEVEL INT IDENTITY(1,1) is set to auto increment, then turn identity insert off when inserting?
I need an identity column just so I can use SCOPE_IDENTITY(), not because I want it to auto-increment, so I declare it as an identity, then "turn it off"...
another one is you said "SELECT all the ROWS that were last added" does that mean if after looping 5 times, the 6th iteration will select last 5 rows from #RESULT?
Not the last 5 rows, but it does mean it will select the rows that were added in iteration #5 because they are all tagged with Level 5
BTW, why do you call this method is a Identity Hack? Where do you learn this type of woodo? 🙂
I call it "Identity Hack" because I am using an identity column as a way of passing a value from one INSERT statement to the next without use of variables, which is not it's intended purpose - hence Identity Hack.
Where woodoo taught? Look around you - Woodoo - she everywhere ! 😀
Updated:
If I run these statements before I run the query, it still takes 15 seconds each...so I guess it is not as fast as I thought, or did I do something wrong?
DBCC DROPCLEANBUFFERS --clears all data from the cache.
DBCC FREEPROCCACHE --clears the stored procedure cache.
I would imagine that you have indexing problems if this is taking roughly as long as the R-CTE with clean buffers/cache.
Can you post an actual execution plan (the xml/sqlplan type) along with the output from STATISTICS TIME and IO (to see where the pain is)
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);October 11, 2012 at 2:24 am
haiao2000 (10/10/2012)
...whatelse could i do to make this query perform better. this thing runs for about 20seconds on the table that contains more than 10 millions of records, which is not acceptable.
there is a non cluster index on ObjectID
there is a non cluster index on ParentID
parent-to-child is many-to-many relationship.
...SELECT TOP 100 PERCENT *
FROM TempCTE
ORDER BY ParentID;DROP TABLE #TempCTE
[/code]
You only need one index on #TempCTE - a non-unique clustered index on ParentID. Drop the sort on the output unless it's absolutely required - you may get better performance by streaming the results into a third #temp table and clustering it.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
October 11, 2012 at 2:26 am
mister.magoo (10/10/2012)
Hi there, I would like to offer you my own twist on this using the "identity hack".The outline of the logic is to replace the recusive cte with a recursive INSERT managed by a WHILE statement and the "identity hack".
First you are going to need somewhere to store the results
IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL
DROP TABLE #RESULTS;
CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL);
CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);
The results table matches your source data, with an extra identity column which has an index with an included column of childid.
Next you need to allow identity inserts on the results table...
SET IDENTITY_INSERT #RESULTS ON;
And pop the head of the list into the results with a "Level" of 1.
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT ID,PARENTID,CHILDID,1
FROM #SOURCE
WHERE PARENTID=@START;
Now for the "recursive" bit
WHILE @@ROWCOUNT>0
INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL)
SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1
FROM #RESULTS AS R
JOIN #SOURCE AS S
ON S.PARENTID = R.CHILDID
WHERE LEVEL = SCOPE_IDENTITY();
Explanation of this code:
WHILE @@ROWCOUNT>0 just repeats the INSERT until it processes zero rows (you have run out of CHILDREN)
The INSERT, uses SCOPE_IDENTITY two ways.
1. to SELECT all the ROWS that were last added to the results table
2. to INSERT the new Level value (last value + 1)
Finally, turn off identity insert and select your results.
SET IDENTITY_INSERT #RESULTS OFF;
SELECT *
FROM #RESULTS;
Using this approach I can "walk the tree" for a four level (44000 results) setup in a 4.5 Million row source table in seconds, whereas the recursive CTE takes too long to bother timing it.
Try it out, see if it helps.
Very interesting! Thanks for this, MM!
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
October 11, 2012 at 7:38 am
ChrisM@home (10/11/2012)
haiao2000 (10/10/2012)
...whatelse could i do to make this query perform better. this thing runs for about 20seconds on the table that contains more than 10 millions of records, which is not acceptable.
there is a non cluster index on ObjectID
there is a non cluster index on ParentID
parent-to-child is many-to-many relationship.
...SELECT TOP 100 PERCENT *
FROM TempCTE
ORDER BY ParentID;DROP TABLE #TempCTE
[/code]
You only need one index on #TempCTE - a non-unique clustered index on ParentID. Drop the sort on the output unless it's absolutely required - you may get better performance by streaming the results into a third #temp table and clustering it.
haiao2000: You are correct on the indexing, parentId non clustered index is there, the other index on that table is being used somewhere else. I removed Order By clause as you mentioned.
October 11, 2012 at 7:46 am
Can you make ParentID a non-unique clustered index? It will almost certainly improve performance. Might as well remove the TOP 100 PERCENT too, it's only pretending to do something.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
October 11, 2012 at 7:50 am
mister.magoo (10/11/2012)
haiao2000 (10/10/2012)
I really appreciate that you spend time write up this great post. I am still trying to understand how your code works myself :-).
The part I don't understand is why LEVEL INT IDENTITY(1,1) is set to auto increment, then turn identity insert off when inserting?
I need an identity column just so I can use SCOPE_IDENTITY(), not because I want it to auto-increment, so I declare it as an identity, then "turn it off"...
another one is you said "SELECT all the ROWS that were last added" does that mean if after looping 5 times, the 6th iteration will select last 5 rows from #RESULT?
Not the last 5 rows, but it does mean it will select the rows that were added in iteration #5 because they are all tagged with Level 5
BTW, why do you call this method is a Identity Hack? Where do you learn this type of woodo? 🙂
I call it "Identity Hack" because I am using an identity column as a way of passing a value from one INSERT statement to the next without use of variables, which is not it's intended purpose - hence Identity Hack.
Where woodoo taught? Look around you - Woodoo - she everywhere ! 😀
Updated:
If I run these statements before I run the query, it still takes 15 seconds each...so I guess it is not as fast as I thought, or did I do something wrong?
DBCC DROPCLEANBUFFERS --clears all data from the cache.
DBCC FREEPROCCACHE --clears the stored procedure cache.
I would imagine that you have indexing problems if this is taking roughly as long as the R-CTE with clean buffers/cache.
Can you post an actual execution plan (the xml/sqlplan type) along with the output from STATISTICS TIME and IO (to see where the pain is)
MM
Here is some comparision that I ran this morning. I also attached the execution plan.
Updated: upload 2 images, not sure if those help
Object #Records Returned CTE Identity Hack
AIN 2567 32 14
CIN 2307 34 12
SWCH 560 7 5
REALM 958 10 7
Viewing 15 posts - 1 through 15 (of 43 total)
You must be logged in to reply to this topic. Login to reply