Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Can a strawberry query be done better? Expand / Collapse
Author
Message
Posted Saturday, July 6, 2013 7:47 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Monday, August 18, 2014 7:09 AM
Points: 76, Visits: 438
Thanks. I had the impression that one should use execution plan to get an idea of how well it'll perform for different applications whereas timing the result would be too specific to only one particular data set/schema.

Also, now that it's obvious that strawberry query isn't the best choice, I wonder what else good a triangular join would be for? Jeff Moden had cited them as one of "hidden RBARs" but also mentioned in pass that they can be useful in rare instances. Does anyone know of such instances?


Thanks again!
Post #1470920
Posted Saturday, July 6, 2013 9:39 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,994, Visits: 31,513
Banana-823045 (7/6/2013)
Thanks. I had the impression that one should use execution plan to get an idea of how well it'll perform for different applications whereas timing the result would be too specific to only one particular data set/schema.

Also, now that it's obvious that strawberry query isn't the best choice, I wonder what else good a triangular join would be for? Jeff Moden had cited them as one of "hidden RBARs" but also mentioned in pass that they can be useful in rare instances. Does anyone know of such instances?


Thanks again!


Like Gail said, the values in Execution Plans frequently are estimates (unless they say "actual") which means that "% of Batch" for a multi-query comparison run is almost always an estimate. "% of batch" should never be used as the definitive measure to determine which query is "best" for either performance or resource usage. In fact, it's frequently very "wrong" (estimates aren't wrong, just sometimes skewed) and can show exactly the opposite of what is true during a run.

For example, here's a classic case of why many people come to the erroneous conclusion that a recursive CTE that counts is lightning fast compared to many of the other methods which are actually much faster than the recursive method. Run the following with the actual execution plan on and note the "% of Batch" for both queries. It shows the rCTE take 0% of the time and the "Tally" method taking 100% of the time. Yet, if you look at the print outs in the Messages tab, you'll find that the opposite is true.

/****************************************************************************************
Purpose:
This code demonstrates that the estimated and actual execution plans in SQL Server can
be 100% INCORRECT and that the execution plan should only be relied on to provide hints
as to what may be wrong with a query rather than an absolute indication. This code runs
in SQL Server 2005 only.

The code creates 30 years worth of dates starting with 2000-01-01 using two different
methods. The first method uses a recursive CTE and the second method uses a "Tally"
structure. The output of each method is directed to a "throw-away" variable to take
display delays out of the picture.

Please check both the actual and estimated execution plans and compare the % of batch.
****************************************************************************************/
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO,TIME ON;

DECLARE @BitBucket DATETIME; --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME;
SET @DateVal = '2000-01-01';

WITH rCTE AS
(
SELECT @DateVal AS DateVal
UNION ALL
SELECT DateVal = DATEADD(dd,1,DateVal)
FROM rCTE
WHERE DATEADD(dd,1,DateVal) < DATEADD(yy, 30, @DateVal)
)
SELECT @BitBucket = d.DateVal
FROM rCTE d
OPTION (MAXRECURSION 0)
;
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME,IO OFF;
PRINT REPLICATE('=',90);
GO
--=======================================================================================
-- "Tally" structure method
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO,TIME ON;

DECLARE @BitBucket DATETIME; --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME;
SET @StartDate = '2000-01-01';

SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,30,@StartDate)))
@BitBucket = DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@StartDate)
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME,IO OFF;
PRINT REPLICATE('=',90);
GO


Here's what I get on my older desktop box...

========== Recursive method ==========

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 2, logical reads 65749, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 485 ms, elapsed time = 569 ms.
==========================================================================================
========== Tally table method ==========

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'syscolrdb'. Scan count 1, logical reads 98, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 26 ms.
==========================================================================================




I don't have an example for it but there are also times when some of the nodes in the execution plan will show impossibilites like 114,387% for a value.

I use the execution plan all the time to help me troubleshoot poorly performing queries. I never use the costs or "% of Batch" to determine which will actually perform better. I also don't always trust SET STATISTICS. They can outright lie depending on what is being done. Please see the following article where SET STATISTICS actually makes Scalar UDF's look much worse than they actually are (they're still much worse than other methods, but not as bad as SET STATISTICS makes them look).
http://www.sqlservercentral.com/articles/T-SQL/91724/


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1470923
Posted Saturday, July 6, 2013 9:52 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 7:53 AM
Points: 42,822, Visits: 35,952
Jeff Moden (7/6/2013)
For example, here's a classic case of why many people come to the erroneous conclusion that a recursive CTE that counts is lightning fast compared to many of the other methods which are actually much faster than the recursive method. Run the following with the actual execution plan on and note the "% of Batch" for both queries. It shows the rCTE take 0% of the time and the "Tally" method taking 100% of the time. Yet, if you look at the print outs in the Messages tab, you'll find that the opposite is true.


In this particular case it's because the optimiser costs the recursive CTE based on either just the anchor member or on the anchor member and one recursive step (can't remember which). Hence the estimates are wildly inaccurate.

Exec plans tell you how the query runs. Execution statistics tell you how the query performs.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1470924
Posted Saturday, July 6, 2013 10:57 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,994, Visits: 31,513
GilaMonster (7/6/2013)
Jeff Moden (7/6/2013)
For example, here's a classic case of why many people come to the erroneous conclusion that a recursive CTE that counts is lightning fast compared to many of the other methods which are actually much faster than the recursive method. Run the following with the actual execution plan on and note the "% of Batch" for both queries. It shows the rCTE take 0% of the time and the "Tally" method taking 100% of the time. Yet, if you look at the print outs in the Messages tab, you'll find that the opposite is true.


In this particular case it's because the optimiser costs the recursive CTE based on either just the anchor member or on the anchor member and one recursive step (can't remember which). Hence the estimates are wildly inaccurate.

Exec plans tell you how the query runs. Execution statistics tell you how the query performs.


I couldn't remember which one either and so I didn't bring it up, but you're absolutely correct.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1470928
Posted Thursday, January 9, 2014 3:11 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, January 9, 2014 3:07 AM
Points: 1, Visits: 0
If you look in the MySQL manual, you'll see that your 'strawberry' query is offered as one method for solving this kind of problem. With the advent of subqueries however, the other methods provided on the same page are inherently faster (on appropriately indexed tables). In tests, the 'uncorrelated subquery' method usually wins out.

Strawberry
Post #1529264
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse