• Banana-823045 (7/6/2013)


    Thanks. I had the impression that one should use the execution plan to get an idea of how well a query will perform for different applications, whereas timing the result would be too specific to one particular data set/schema.

    Also, now that it's obvious that the strawberry query isn't the best choice, I wonder what else a triangular join would be good for. Jeff Moden cited them as one of the "hidden RBARs" but also mentioned in passing that they can be useful in rare instances. Does anyone know of such instances?

    Thanks again!
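    (A quick illustration of what's meant by "triangular join", for anyone following along: it's a self-join on an inequality predicate, so each row joins to every row "before" it and roughly N*(N+1)/2 rows get touched in total. The classic example, and one of the rare instances where one gets used deliberately, is a set-based running total over a very small set. The following is just a minimal sketch with made-up sample data, not code from the thread.)

    -- Hypothetical sample data, for illustration only.
    DECLARE @Sample TABLE (ID INT PRIMARY KEY, Amount MONEY);
    INSERT INTO @Sample (ID, Amount)
    SELECT 1, 10.00 UNION ALL
    SELECT 2, 20.00 UNION ALL
    SELECT 3, 30.00;
    -- Running total via a triangular join: "prev" joins to every row with an
    -- ID at or below the current row's ID, so row N aggregates N rows.
    -- Harmless at 3 rows, quadratic (RBAR-like) at a million.
    SELECT  cur.ID,
            RunningTotal = SUM(prev.Amount)
      FROM @Sample cur
      JOIN @Sample prev
        ON prev.ID <= cur.ID
     GROUP BY cur.ID;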

    Like Gail said, the values in execution plans are frequently estimates (unless they're labeled "actual"), which means that the "% of Batch" figure for a multi-query comparison run is almost always an estimate. "% of Batch" should never be used as the definitive measure of which query is "best" for either performance or resource usage. In fact, it's frequently very "wrong" (estimates aren't wrong, just sometimes skewed) and can show exactly the opposite of what is true during a run.

    For example, here's a classic case of why many people come to the erroneous conclusion that a recursive CTE that counts is lightning fast compared to many of the other methods, which are actually much faster than the recursive method. Run the following with the actual execution plan turned on and note the "% of Batch" for both queries. It shows the rCTE taking 0% of the time and the "Tally" method taking 100% of the time. Yet, if you look at the printouts in the Messages tab, you'll find that the opposite is true.

    /****************************************************************************************
     Purpose:
     This code demonstrates that the estimated and actual execution plans in SQL Server can
     be 100% INCORRECT and that the execution plan should only be relied on to provide hints
     as to what may be wrong with a query rather than an absolute indication. This code runs
     in SQL Server 2005 and later.

     The code creates 30 years worth of dates starting with 2000-01-01 using two different
     methods. The first method uses a recursive CTE and the second method uses a "Tally"
     structure. The output of each method is directed to a "throw-away" variable to take
     display delays out of the picture.

     Please check both the actual and estimated execution plans and compare the % of batch.
    ****************************************************************************************/
    SET NOCOUNT ON;
    --=======================================================================================
    --      Recursive method shown by (Name with-held)
    --=======================================================================================
    PRINT '========== Recursive method ==========';
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO,TIME ON;
    DECLARE @Bitbucket DATETIME; --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @DateVal DATETIME;
        SET @DateVal = '2000-01-01';
    WITH rCTE AS
    (
     SELECT @DateVal AS DateVal
      UNION ALL
     SELECT DateVal = DATEADD(dd,1,DateVal)
       FROM rCTE
      WHERE DATEADD(dd,1,DateVal) < DATEADD(yy,30,@DateVal)
    )
    SELECT @Bitbucket = d.DateVal
      FROM rCTE d
    OPTION (MAXRECURSION 0);
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME,IO OFF;
    PRINT REPLICATE('=',90);
    GO
    --=======================================================================================
    --      "Tally" structure method
    --=======================================================================================
    PRINT '========== Tally table method ==========';
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO,TIME ON;
    DECLARE @Bitbucket DATETIME; --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @StartDate AS DATETIME;
        SET @StartDate = '2000-01-01';
    SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,30,@StartDate)))
           @Bitbucket = DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@StartDate)
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2;
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME,IO OFF;
    PRINT REPLICATE('=',90);
    GO

    Here's what I get on my older desktop box...

    ========== Recursive method ==========
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'Worktable'. Scan count 2, logical reads 65749, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 485 ms, elapsed time = 569 ms.
    ==========================================================================================
    ========== Tally table method ==========
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'syscolrdb'. Scan count 1, logical reads 98, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 15 ms, elapsed time = 26 ms.
    ==========================================================================================

    I don't have an example of it handy, but there are also times when some of the nodes in the execution plan will show impossibilities like 114,387% for a value.

    I use the execution plan all the time to help me troubleshoot poorly performing queries, but I never use the costs or "% of Batch" to determine which query will actually perform better. I also don't always trust SET STATISTICS; its measurements can outright lie depending on what is being done. Please see the following article, where SET STATISTICS actually makes scalar UDFs look much worse than they actually are (they're still much worse than other methods, but not as bad as SET STATISTICS makes them look).

    http://www.sqlservercentral.com/articles/T-SQL/91724/
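    When neither "% of Batch" nor SET STATISTICS can be trusted, simple wall-clock deltas are one fallback. Here's a minimal sketch of that kind of harness (just an illustration, not code from the article; the COUNT(*) query is a stand-in for whatever is being tested):

    --===== Minimal wall-clock timing harness (illustration only).
    --      Measures duration with GETDATE() deltas instead of SET STATISTICS TIME.
    DECLARE @StartTime DATETIME,
            @Bitbucket INT; --Throw-away variable so display times aren't measured.
     SELECT @StartTime = GETDATE();
     SELECT @Bitbucket = COUNT(*) --<<< Replace with the query under test.
       FROM sys.all_columns;
      PRINT 'Duration (ms): ' + CONVERT(VARCHAR(10),DATEDIFF(ms,@StartTime,GETDATE()));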

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)