May 27, 2011 at 8:42 pm
As a bit of a sidebar, there IS a problem with CTEs, which are really just "inline views". Sometimes there's a little "positive feedback" formed by chained CTEs and you end up with what I refer to as an "accidental Cross Join". That happens in Wayne's code and, if I convert mine to a single chain of CTEs, it also happens in my code.
An easy solution to such a problem is to "Divide'n'Conquer" by using a Temp Table to hold interim results, like I did in my code example.
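Just as a quick illustration of what I mean (a minimal sketch using the same SomeTable and DelimitedSplit8K objects that show up later in this thread; adjust the names to your own tables):

-- Step 1: Materialize the split-out name parts in a Temp Table so later
--         steps read stored rows instead of re-expanding a chained CTE.
SELECT t.SomeID,
       t.NAME_1,
       ds.ItemNumber,
       ds.Item
  INTO #SplitNames
  FROM tempdb.dbo.SomeTable t
 CROSS APPLY dbo.DelimitedSplit8K(t.NAME_1, ' ') ds
;
-- Step 2: Do the "dedupe" work against the Temp Table. Because the split
--         results are already materialized, the optimizer can't feed the
--         same derivation back into itself and form an accidental Cross Join.
SELECT SomeID,
       Item,
       ItemNumber = MIN(ItemNumber)
  FROM #SplitNames
 GROUP BY SomeID, Item
;
DROP TABLE #SplitNames
;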
--Jeff Moden
Change is inevitable... Change for the better is not.
May 27, 2011 at 8:46 pm
WayneS (5/27/2011)
Sean Lange (5/27/2011)
ohhh this is starting to sound like a show down. This is going to be a good one to watch!!! Ruban sit back and watch the masters go at it. This has the markings of some pretty cool learning for the rest of us. 😉
:blush: Master? Not me... Jeff maybe.. but thanks for the vote of confidence!
Nope... thank you both for the compliment but not me. I'm not a Master at anything. I just get good and lucky now and again.
Well, as is usually the case, the performance of Jeff's code will usually blow mine out of the park. But, I thoroughly agree with the learning part!
I think that if you tweak the accidental cross join out of it, yours will be just as fast.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2011 at 10:55 am
Jeff Moden (5/27/2011)
WayneS (5/27/2011)
Sean Lange (5/27/2011)
ohhh this is starting to sound like a show down. This is going to be a good one to watch!!! Ruban sit back and watch the masters go at it. This has the markings of some pretty cool learning for the rest of us. 😉
:blush: Master? Not me... Jeff maybe.. but thanks for the vote of confidence!
Nope... thank you both for the compliment but not me. I'm not a Master at anything. I just get good and lucky now and again.
Then I think that there is something pretty masterful about someone that gets good and lucky that frequently.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 28, 2011 at 12:07 pm
Okay, I converted my earlier code to run with the sample table that Jeff created (removed the CTE that created the RN column; renamed RN to SomeID throughout):
;
WITH cte AS
(
    -- Break apart all of the names on the space.
    -- The latest version of the Delimited Split Function is at http://www.sqlservercentral.com/articles/Tally+Table/72993/.
    SELECT *
      FROM tempdb.dbo.SomeTable t
     CROSS APPLY dbo.DelimitedSplit8K(t.NAME_1, ' ') ds
), cte3 AS
(
    -- Group the results by name,
    -- getting the lowest ItemNumber for each name part.
    SELECT NAME_1,
           SomeID,
           Item,
           ItemNumber = MIN(ItemNumber)
      FROM cte
     GROUP BY NAME_1, SomeID, Item
)
-- Put the remaining results back together.
SELECT t.NAME_1,
       ColList = STUFF((SELECT ' ' + t2.Item
                          FROM cte3 t2
                         WHERE t.SomeID = t2.SomeID
                         ORDER BY t2.SomeID, t2.ItemNumber
                           FOR XML PATH(''), TYPE).value('.','varchar(max)'), 1, 1, '')
  FROM tempdb.dbo.SomeTable t
 ORDER BY t.SomeID;
It produces the following statistics (captured with SET STATISTICS IO,TIME ON;):
Table 'SomeTable'. Scan count 1, logical reads 40090, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2106 ms, elapsed time = 2530 ms.
Wayne's test, elapsed time = 2530 ms
Then, I compared it to Jeff's code. Statistics from his run:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 5 ms.
Table 'SomeTable'. Scan count 3, logical reads 433, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 779 ms, elapsed time = 754 ms.
(46131 row(s) affected)
SQL Server parse and compile time:
CPU time = 45 ms, elapsed time = 45 ms.
Table '#MyHead'. Scan count 3, logical reads 173, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 32 ms, elapsed time = 98 ms.
SQL Server Execution Times:
CPU time = 79 ms, elapsed time = 146 ms.
Table '#MyHead'. Scan count 19972, logical reads 40191, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SomeTable'. Scan count 1, logical reads 146, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 967 ms, elapsed time = 1013 ms.
(12311 row(s) affected)
(12311 row(s) affected)
Table '#ForReview'. Scan count 1, logical reads 124, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 359 ms.
Jeff's Test, elapsed time: 2290 ms
Jeff's was 240ms faster for this test data - but had slightly higher reads. Essentially a draw.
My conclusion: on this test data, either solution will suffice; test both to see which works better for you. The main things to watch out for: with its slightly higher IO, Jeff's might start slowing down once it's scaled up to millions of rows, and mine isn't utilizing "divide and conquer" techniques, which might cause increased activity at that scale.
One other thing to note: the execution plan for Jeff's code shows parallelism going on - if you're on a server with lots of cores, this could make a drastic difference also. Something else to test. (Jeff's 9-year-old P4 wouldn't show this; my code doesn't have any parallelism going on.)
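If you want to level the playing field while testing, one way (just a sketch; the SELECT below is a stand-in, so apply the hint to the actual test query) is to pin the degree of parallelism with a query hint and time both versions serially:

-- Force a serial plan so both solutions can be compared without parallelism.
SELECT t.SomeID,
       t.NAME_1
  FROM tempdb.dbo.SomeTable t
OPTION (MAXDOP 1); -- stand-in query; add the hint to the real test code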
Please, respond back to let us know how it goes... there's some curious folks around here that want to know! (Don't forget to answer Jeff's earlier questions also - this may make a difference.)
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 28, 2011 at 12:37 pm
Your modification worked a real treat, Wayne... it used to take almost 7 minutes for your previous code to run against the test data on my box. Now, it only takes 10 seconds. Well done.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2011 at 12:57 pm
Here's the profiler run between my code and Wayne's on my old box. I guess I'm going to have to try it on a more modern machine and see what's up. I'm running 2k5 on my old box and 2k8 on the newer one. I'm not sure that's apples to apples.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2011 at 3:17 pm
Here's the profiler run on my more modern computer between my code and Jeff's code:
The things that I'm curious about:
1. Why aren't the # of reads between runs on your code consistent?
2. Why aren't the # of writes between runs on your code consistent?
About my mod speeding it up... all I did was take out the row_number() assignment in the first cte (and changed RN to SomeID throughout to be compatible with your test table). Well, hang on... I also deleted the cross apply and changed it to do the STUFF(FOR XML) concatenation in the column assignment - I bet that's where the difference is. I guess that's what you're calling the "accidental cross join"?
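For anyone following along who hasn't seen that pattern, here's a tiny self-contained example of the STUFF(... FOR XML PATH('')) concatenation I'm talking about (the #Parts table and its data are made up just to show the shape):

-- A duplicate name part ('John') split across three rows...
CREATE TABLE #Parts (SomeID INT, ItemNumber INT, Item VARCHAR(50));
INSERT INTO #Parts VALUES (1, 1, 'John');
INSERT INTO #Parts VALUES (1, 2, 'John');
INSERT INTO #Parts VALUES (1, 3, 'Smith');

-- ...rebuilt into a single string with the duplicate removed.
SELECT p.SomeID,
       Rebuilt = STUFF((SELECT ' ' + p2.Item
                          FROM #Parts p2
                         WHERE p2.SomeID = p.SomeID
                         GROUP BY p2.Item
                         ORDER BY MIN(p2.ItemNumber)
                           FOR XML PATH(''), TYPE).value('.','varchar(8000)'), 1, 1, '')
  FROM (SELECT DISTINCT SomeID FROM #Parts) p;
-- Returns: SomeID = 1, Rebuilt = 'John Smith'

DROP TABLE #Parts;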
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 28, 2011 at 9:54 pm
I don't know for sure, but I'd almost bet the number of reads and writes differs from run to run because I'm writing to disk and SQL Server decides what the best way to do that is. It won't always choose the same pages/extents.
For the accidental cross-join, I didn't actually troubleshoot what was causing it. I only know I saw it in the Execution Plan.
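For anyone who wants to spot the same thing on their own queries, one rough way (a sketch, not the only way; the SELECT is just a stand-in) is to capture the actual plan as text and look for operators returning far more rows than either input should produce:

SET STATISTICS PROFILE ON;
SELECT TOP (10) *
  FROM tempdb.dbo.SomeTable; -- stand-in for the query under test
SET STATISTICS PROFILE OFF;
-- In the Rows/Executes columns, a join step whose row count approaches
-- (rows from one input) * (rows from the other) is a good sign of an
-- accidental Cross Join.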
--Jeff Moden
Change is inevitable... Change for the better is not.