Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Execution plan Expand / Collapse
Author
Message
Posted Monday, November 15, 2010 12:53 AM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Thursday, November 21, 2013 11:33 PM
Points: 1,481, Visits: 1,959
An article by Jeff Moden (http://www.sqlservercentral.com/articles/T-SQL/71511) got me really confused and instead of hijacking that thread i thought i would start a new one.

I had planed on asking why not using a second group by instead of the odd order by, like
SELECT [Month] = DATENAME(mm,SomeDateTime),
DatePart(mm, SomeDateTime),
Amount = SUM(SomeAmount)
FROM #MyHead
WHERE SomeDateTime >= '2010' AND SomeDateTime < '2011'
GROUP BY DATENAME(mm,SomeDateTime), DatePart(mm, SomeDateTime)
ORDER BY DatePart(mm, SomeDateTime)
;

It looked to me to be as effective or even better since i "lost" an extra "Compute Scalar". However when i removed the where clause (to force a bit bigger result) it showed that Jeffs was faster by something like 20%. That wasnt what i found strange. It was the execution plan.

Just looking at the numbers produced i would have guessed that the second one would be faster. The CPU and I/O are identical (on the matching parts). But the first query has an extra operation. So how the heck can it be faster?

Running on:
Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86) Oct 14 2005 00:33:37 Copyright (c) 1988-2005 Microsoft Corporation Developer Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

PS I boosted the nr of rows in Jeffs temptable to 10 000 000.

/T


  Post Attachments 
Plan1.sqlplan (10 views, 13.37 KB)
Plan2.sqlplan (6 views, 13.37 KB)
Post #1020580
Posted Monday, November 15, 2010 2:31 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:59 AM
Points: 11,168, Visits: 10,931
There are a number of differences, but the three most important are:

1. In the Compute Scalar immediately above the Clustered Index Scan, the plan with the extra GROUP BY calculates:

[Expr1004] = Scalar Operator(datename(month,[tempdb].[dbo].[#MyHead].[SomeDateTime]))
[Expr1005] = Scalar Operator(datepart(month,[tempdb].[dbo].[#MyHead].[SomeDateTime]))


whereas the plan with the single GROUP BY only calculates:

[Expr1004] = Scalar Operator(datename(month,[tempdb].[dbo].[#MyHead].[SomeDateTime]))


2. The extra DATEPART computation may not seem much, but it is repeated 10,000,000 times. The extra expression also increases the total amount of data flowing into the Hash Match (Aggregate) that comes next. Again, the per-row size of the DATEPART expression (integer) is only four bytes, but multiplied by ten million, it adds up. The total row size estimates show an increase from 467MB to 505MB, which is 38MB or 8% extra.

3. The Hash Match aggregate implements the GROUP BY expression(s). The double GROUP BY, as well as having an extra 38MB of input, performs a more complex hash match on [Expr1004],[Expr1005] versus [Expr1004] alone.

Of course it is pretty obvious to a human that DATEPART and DATENAME in this context will group in exactly the same way (they are based on the same date, and use the same MONTH granularity). Each possible value of DATEPART is associated with exactly one value of DATENAME. This one-to-one mapping means that we only need to group on DATEPART or DATENAME, not both.

The query optimizer does not include logic to reason that way, so it performs the literal double grouping. (Another consideration here is that DATENAME is non-deterministic).

I hope that helps.

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1020616
Posted Monday, November 15, 2010 2:40 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, April 14, 2014 7:26 AM
Points: 1,949, Visits: 8,291
I believe the difference here is in the hash match operation , in your version it is matching on 2 columns and in Jeff's 1 column.

Hash matching is a comparatively expensive operation and the pure number of scalar operations executed is a bit of a red herring.

-- Though Paul has done much better explanation




Clear Sky SQL
My Blog
Kent user group
Post #1020619
Posted Monday, November 15, 2010 2:49 AM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Thursday, November 21, 2013 11:33 PM
Points: 1,481, Visits: 1,959
Thank you both.

I did figure it would be partly because of the much larger dataset. I had just assumed that something in the execution plan regarding cost (CPU/IO/Operator) would differ more to indicate that one would be better then the other. The hashmatch differs between 66.9272 vs 66.9432 which to me feels insignificant. But i guess there is more to decifering an execution plan then meets the eye.

/T
Post #1020623
Posted Monday, November 15, 2010 2:55 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, April 14, 2014 7:26 AM
Points: 1,949, Visits: 8,291
Be careful when comparing costs within the plan , they are ALL estimated costs. The actual costs could differ dramatically.



Clear Sky SQL
My Blog
Kent user group
Post #1020628
Posted Monday, November 15, 2010 3:07 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:59 AM
Points: 11,168, Visits: 10,931
tommyh (11/15/2010)
I did figure it would be partly because of the much larger dataset. I had just assumed that something in the execution plan regarding cost (CPU/IO/Operator) would differ more to indicate that one would be better then the other.

The extra cost is primarily in the extra 10 million DATEPART calculations in the Compute Scalar. I listed the other effects because I thought you were after a comprehensive explanation. The exact reasons for this are complex, and to do with the way Compute Scalars are implemented, and the fact that the query plan contains a scalar hash aggregate - as opposed to a hash join.

The reason you don't see much difference in the costs of the Compute Scalars in the two plans is because the cost model doesn't really do much costing with Compute Scalars. See http://www.sqlskills.com/BLOGS/CONOR/post/SQL-Server-Scalar-Operator-Costing-aka-umm-what-costing-.aspx.

The hashmatch differs between 66.9272 vs 66.9432 which to me feels insignificant. But i guess there is more to decifering an execution plan then meets the eye.

In this case, the estimated costs are not too far off - the extra hash effort is pretty small. By the way, are you able to run this same query on a machine with many cores? The difference parallelism makes would be interesting - that's my bet anyway, but I am often wrong




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1020633
Posted Monday, November 15, 2010 3:11 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:59 AM
Points: 11,168, Visits: 10,931
Dave Ballantyne (11/15/2010)
I believe the difference here is in the hash match operation , in your version it is matching on 2 columns and in Jeff's 1 column. Hash matching is a comparatively expensive operation and the pure number of scalar operations executed is a bit of a red herring.

Sorry Dave I hadn't read this one when I posted my last - apologies if it sounds 'off' - just a timing thing, I promise. Though I do disagree with your assessment




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1020634
Posted Friday, February 08, 2013 6:47 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Saturday, September 14, 2013 1:58 AM
Points: 4, Visits: 51
Hi Paul

I am trying to learn the execution plan and aims to become a DBA.

In the above thread i did tried to understand the two execution plans and did get some success but failed to understand how to know which 'compute scalar' computes what expression.

and why the sorting operator has different sequence in both the plans.

Grateful if you could throw some light on the above issues.

Thanks
Deepak !
Post #1417673
Posted Friday, February 08, 2013 6:49 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Saturday, September 14, 2013 1:58 AM
Points: 4, Visits: 51
Hi Paul

I am trying to learn the execution plan and aims to become a DBA.

In the above thread i did tried to understand the two execution plans and did get some success but failed to understand how to know which 'compute scalar' computes what expression.

and why the sorting operator has different sequence in both the plans.

Grateful if you could throw some light on the above issues.

Thanks
Deepak !
Post #1417674
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse