CPU time drops, but elapsed time increases

  • I'm trying to tune a big query.

    The starting point is a query with a query cost of 4.2, a CPU time of 4 seconds, and an execution time of 6 seconds. After tuning, the query cost dropped to 2.4 and the CPU time to 3 seconds, BUT the execution time increased to 20 seconds!

    How is this possible? The only main difference I can see in the IO statistics is the increase in worktable reads. The only thing I can come up with is that my improved query uses tempdb much more, which is probably on a slow disk. Is this right? What other reasons are there for such a big difference between CPU time and execution time?

    (PS: I am not allowed to give you any queries, so it's just a discussion about the behavior of SQL Server.)

    Without an execution plan this is just idle speculation, but it does sound like you've already hit the nail on the head. Increasing I/O while decreasing CPU is a good way to decrease performance. It's better to be bound on CPU than on I/O, because you can always throw more and bigger CPUs at a problem, while I/O issues are pernicious.

    It sounds like your tuned version of the query isn't.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    I think it's a tough one and a toss-up. You might not be able to get more/better CPUs any more easily than disk in an enterprise box. It could easily be maxed out on sockets, and faster/newer CPUs could require a new motherboard, which means a new server in the enterprise.

    On the other hand, you could add more LUNs/drives and play some I/O games. Even adding a larger cache to a controller can be easier. You could move indexes to a new file on other disks, or try some other placement games. Be sure you also check partition alignment.

    Grant is right in that you've identified the issue and moved to an I/O-bound solution. You either need to move back, or fix the I/O.

  • Sander A. (2/20/2010)


    I'm trying to tune a big query.

    The starting point is a query with a query cost of 4.2, a CPU time of 4 seconds, and an execution time of 6 seconds. After tuning, the query cost dropped to 2.4 and the CPU time to 3 seconds, BUT the execution time increased to 20 seconds!

    How is this possible? The only main difference I can see in the IO statistics is the increase in worktable reads. The only thing I can come up with is that my improved query uses tempdb much more, which is probably on a slow disk. Is this right? What other reasons are there for such a big difference between CPU time and execution time?

    Yes, if CPU time drops, but Elapsed time increases, this indicates that you (and your CPU) are waiting for something else. For SQL Server, this "something else" is almost always going to be Disk waits. In essence, you lowered your CPU by increasing your Disk IOs, which was a poor choice, performance-wise.
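
    As a quick way to check where that elapsed time is going, you can look at the wait information for the session while the query runs. A minimal sketch, assuming you run it from a second connection and substitute your own session_id (the 53 below is purely illustrative):

    -- Sketch: inspect a running request's CPU, elapsed time and current wait
    -- (run from another connection; 53 is a hypothetical session_id)
    SELECT r.session_id,
           r.status,
           r.cpu_time,            -- ms of CPU consumed so far
           r.total_elapsed_time,  -- ms of wall-clock time so far
           r.wait_type,           -- current wait, e.g. PAGEIOLATCH_SH for disk reads
           r.wait_time,
           r.last_wait_type
    FROM   sys.dm_exec_requests AS r
    WHERE  r.session_id = 53;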

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • None of the replies so far explain the increase in worktable reads.

    Personally, I doubt the query is disk-bound; it is more likely that the rewrite introduced a spool of some kind, or is using a kind of join which requires a worktable, and the plan causes the spool to be accessed a large number of times. A so-called 'perf spool' associated with an 'optimized' loop join is an example of this kind of behaviour.

    I appreciate that it is not possible to post sample data or the query, but a graphical plan obtained from an actual execution would be extremely helpful - even if it is just an image.

    The behaviour of SQL Server's optimizer is extremely complex, so it is not really possible to state with any authority what the cause is without seeing something of the plan.
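
    If you can run the query once more, one standard way to capture an actual plan without posting the results is shown below (a minimal sketch; the XML returned can be saved as a .sqlplan file and opened graphically in Management Studio):

    -- Sketch: capture the actual execution plan as XML for one run
    SET STATISTICS XML ON;
    -- ... run the query here ...
    SET STATISTICS XML OFF;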

    Paul

    Okay, I will post the queries and the statistics output; maybe you can tell me something more about it. As you can see, the query cost of the first query is 334.6 and that of the second query is 2.56, but the execution time of the first query is slightly less. (PS: these are different numbers from the ones in my first post, because I couldn't reproduce that query after changing some indexes. The behavior is the same with this query, though.)

    The queries:

    --Query 1, high query cost, but the fastest of the 2
    ;WITH statCTE (IDKeyword, impr, clicks, pos, costs) AS (
        SELECT stat.IDKeyword
            , ISNULL(SUM(CAST(stat.Impressions AS bigint)), 0) impr
            , ISNULL(SUM(CAST(stat.Clicks AS bigint)), 0) clicks
            , ISNULL(SUM(ISNULL(stat.AveragePosition, 0) * ISNULL(CAST(stat.Impressions AS bigint), 0)) / NULLIF(SUM(CAST(stat.Impressions AS bigint)), 0), 0) pos
            , ISNULL(SUM(stat.TotalCostsEuro), 0.0) costs
        FROM SAS_KeywordStatistics stat WITH (NOLOCK)
        WHERE IDAccount = 1937
            AND Date BETWEEN '01/01/2009' AND '12/31/2009'
        GROUP BY IDKeyword
    )
    SELECT DENSE_RANK() OVER (ORDER BY kwd.Term DESC, kwd.id) RowNr
        , kwd.id id
        , act.SyncToAdMedium AS isSamActive
        , act.IDAccountType
        , kwd.HasHistory
        , act.UserDescription actName
        , act.ID IDAccount
        , cpn.Name cpnName
        , cpn.ID IDCampaign
        , grp.Name grpName
        , grp.ID IDGroup
        , kwd.Term kwdName
        , StatCTE.impr impr
        , StatCTE.clicks clicks
        , StatCTE.pos pos
        , CAST(StatCTE.clicks AS decimal(19,2)) / NULLIF(CAST(StatCTE.impr AS bigint), 0) * 100 ctr
        , StatCTE.costs / NULLIF(StatCTE.clicks, 0) cpc
        , ISNULL(NULLIF(kwd.MaxCPCEuro, 0), -grp.MaxCPCEuro) maxcpc
        , StatCTE.costs costs
        , kwd.MinCPCEuro mincpc
        , kwd.Negative kwdnegative
    FROM SAM_Keywords kwd WITH (NOLOCK)
    JOIN SAM_Groups grp WITH (NOLOCK) ON kwd.idgroup = grp.id
    JOIN SAM_Campaigns cpn WITH (NOLOCK) ON kwd.IDCampaign = cpn.ID
    JOIN SAM_Accounts act WITH (NOLOCK) ON kwd.IDAccount = act.ID
    JOIN CIS_Projects prj WITH (NOLOCK) ON kwd.IDProject = prj.ID
    LEFT JOIN StatCTE ON StatCTE.IDKeyword = kwd.ID
    WHERE kwd.IDAccount = 1937
        AND (kwd.idStatus IN (56, 176, 365)
        AND grp.idStatus IN (50, 51)
        AND cpn.idStatus IN (27, 29, 30, 31, 48, 79)
        AND prj.idStatus IN (6, 88, 5, 7, 8))

    /*
    (133396 row(s) affected)

    Table 'SAS_KeywordStatistics'. Scan count 5, logical reads 18088, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Accounts'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Campaigns'. Scan count 20, logical reads 612, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'CIS_Projects'. Scan count 5, logical reads 154, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Keywords'. Scan count 10, logical reads 6090, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Groups'. Scan count 5, logical reads 91753, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    (1 row(s) affected)

    SQL Server Execution Times:
    CPU time = 8424 ms, elapsed time = 5953 ms.
    */

    -------------------------------------------------------------------------------------------

    --Query 2, better execution plan, better use of the indexes, but slower...
    ;WITH statCTE (IDKeyword, impr, clicks, pos, costs) AS (
        SELECT stat.IDKeyword
            , ISNULL(SUM(CAST(stat.Impressions AS bigint)), 0) impr
            , ISNULL(SUM(CAST(stat.Clicks AS bigint)), 0) clicks
            , ISNULL(SUM(ISNULL(stat.AveragePosition, 0) * ISNULL(CAST(stat.Impressions AS bigint), 0)) / NULLIF(SUM(CAST(stat.Impressions AS bigint)), 0), 0) pos
            , ISNULL(SUM(stat.TotalCostsEuro), 0.0) costs
        FROM SAS_KeywordStatistics stat WITH (NOLOCK)
        WHERE IDAccount = 1937
            AND Date BETWEEN '01/01/2009' AND '12/31/2009'
        GROUP BY IDKeyword
    )
    SELECT DENSE_RANK() OVER (ORDER BY kwd.Term DESC, kwd.id) RowNr
        , kwd.id id
        , act.SyncToAdMedium AS isSamActive
        , act.IDAccountType
        , kwd.HasHistory
        , act.UserDescription actName
        , act.ID IDAccount
        , cpn.Name cpnName
        , cpn.ID IDCampaign
        , grp.Name grpName
        , grp.ID IDGroup
        , kwd.Term kwdName
        , StatCTE.impr impr
        , StatCTE.clicks clicks
        , StatCTE.pos pos
        , CAST(StatCTE.clicks AS decimal(19,2)) / NULLIF(CAST(StatCTE.impr AS bigint), 0) * 100 ctr
        , StatCTE.costs / NULLIF(StatCTE.clicks, 0) cpc
        , ISNULL(NULLIF(kwd.MaxCPCEuro, 0), -grp.MaxCPCEuro) maxcpc
        , StatCTE.costs costs
        , kwd.MinCPCEuro mincpc
        , kwd.Negative kwdnegative
    FROM SAM_Keywords kwd WITH (NOLOCK)
    JOIN SAM_Groups grp WITH (NOLOCK) ON kwd.idgroup = grp.id
    JOIN SAM_Campaigns cpn WITH (NOLOCK) ON kwd.IDCampaign = cpn.ID
    JOIN SAM_Accounts act WITH (NOLOCK) ON kwd.IDAccount = act.ID
    JOIN CIS_Projects prj WITH (NOLOCK) ON kwd.IDProject = prj.ID
    LEFT JOIN StatCTE ON StatCTE.IDKeyword = kwd.ID
    WHERE kwd.IDAccount = 1937
        AND grp.IDAccount = 1937
        AND cpn.IDAccount = 1937
        AND act.ID = 1937
        AND (kwd.idStatus IN (56, 176, 365)
        AND grp.idStatus IN (50, 51)
        AND cpn.idStatus IN (27, 29, 30, 31, 48, 79)
        AND prj.idStatus IN (6, 88, 5, 7, 8))

    /*
    (133396 row(s) affected)

    Table 'SAS_KeywordStatistics'. Scan count 133396, logical reads 580986, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'CIS_Projects'. Scan count 0, logical reads 275138, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Keywords'. Scan count 3, logical reads 5399, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Groups'. Scan count 2, logical reads 74, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Campaigns'. Scan count 6, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'SAM_Accounts'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:
    CPU time = 5765 ms, elapsed time = 8548 ms.
    */

  • I would need to sit down with you for several hours to give that query and plan the discussion it deserves 😉

    Nevertheless:

    1. In your original post, you said the main difference was in the worktable statistics. There is zero worktable activity for either query!

    2. The 'costs' given are estimated costs. These can be highly inaccurate, especially for more complex queries (as this is).

    3. SQL Server never provides an 'actual execution cost'. The cost you see was the cost estimate used by the query optimizer. If the optimizer has poor information (for whatever reason), it may produce a poor plan, with an unrealistic cost.

    4. Query 1 uses a parallel plan, query 2 runs a serial plan. This is a very significant difference.

    5. CIS_Projects has 154 reads in query 1, over 275,000 in query 2 - another huge difference

    6. SAS_KeywordStatistics has 18,088 reads (scan count 5) in query 1, versus over 580,000 reads (scan count 133,396) in query 2 - another huge difference

    7. Query 2 uses a number of loop joins, query 1 uses hash joins exclusively*

    I could go on, but essentially the two plans are entirely different. Query 2 is a bad plan, probably caused by bad cardinality estimation for the joins involving CIS_Projects, and in particular, SAS_KeywordStatistics. There are many possible reasons for this sort of poor plan, but you can help the optimizer by ensuring it has access to good, up-to-date statistics, and appropriate indexes.
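
    For example, a minimal sketch of refreshing the statistics on the two tables with the worst estimates (assuming the dbo schema; FULLSCAN gives the best quality but can be expensive on large tables):

    -- Sketch: rebuild statistics on the tables that appear to be mis-estimated
    UPDATE STATISTICS dbo.SAS_KeywordStatistics WITH FULLSCAN;
    UPDATE STATISTICS dbo.CIS_Projects WITH FULLSCAN;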

    This type of query is probably best run in parallel, using hash joins. As I hinted before, this query has a lot of scope for improvement, but it would have to be an on-site thing, with a professional query tuner.

    It would be interesting to run the second query forcing hash or merge joins only using an OPTION (HASH JOIN, MERGE JOIN) query hint. That may well produce a plan very similar to query 1.
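
    Concretely, that means re-running query 2 unchanged except for the hint appended at the very end of the statement. A sketch of just the tail of query 2 as posted above:

    -- Sketch: the end of query 2, with the query-level join hint appended;
    -- everything before the final predicate stays exactly as posted
        AND prj.idStatus IN (6, 88, 5, 7, 8))
    OPTION (HASH JOIN, MERGE JOIN);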

    By the way, I have definitely seen this query, or one very like it, posted here before. My honest recommendation is that you seek the services of a good consultant at some stage to help you.

    Paul

    * the one loop join used is a special case: it is used only to implement broadcast partitioning for parallelism

    Paul White (2/21/2010)


    My honest recommendation is that you seek the services of a good consultant at some stage to help you.

    There is a big chance we will do that in the near future. My boss wants every query to run within several seconds... and I have no idea how to pull this off 😛

    Thanks for your advice; I will play with this query some more tomorrow.

    PS: your point 1 about the worktable: you are right. As I said before, I couldn't reproduce the query I mentioned in the opening post, so I used this query, which has somewhat the same behaviour (except that the problem in this case isn't the worktable). Maybe I can reproduce that query tomorrow.

  • Sander A. (2/21/2010)


    There is a big chance we will do that in the near future. My boss wants every query to run within several seconds... and I have no idea how to pull this off

    Possible architectural solutions include : (1) extracting the data to a data warehouse and building an SSAS cube; and (2) using indexed views to pre-aggregate for common requirements. Both are advanced topics.
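
    For what it's worth, here is a very rough sketch of option (2), using column names taken from the posted query. Indexed views carry many restrictions (SCHEMABINDING, a COUNT_BIG(*) column, no outer joins, deterministic and non-nullable aggregate inputs, specific SET options), so treat this purely as a starting point:

    -- Sketch: pre-aggregate keyword statistics per account/keyword/day so the
    -- yearly roll-up reads far fewer rows (names are from the posted query)
    CREATE VIEW dbo.vw_KeywordStatsDaily
    WITH SCHEMABINDING
    AS
    SELECT  IDAccount,
            IDKeyword,
            [Date],
            SUM(ISNULL(CAST(Impressions AS bigint), 0)) AS Impr,
            SUM(ISNULL(CAST(Clicks AS bigint), 0))      AS Clicks,
            SUM(ISNULL(TotalCostsEuro, 0.0))            AS Costs,
            COUNT_BIG(*)                                AS RowCnt  -- required in an indexed view
    FROM    dbo.SAS_KeywordStatistics
    GROUP BY IDAccount, IDKeyword, [Date];
    GO
    CREATE UNIQUE CLUSTERED INDEX IX_vw_KeywordStatsDaily
        ON dbo.vw_KeywordStatsDaily (IDAccount, IDKeyword, [Date]);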

    Sander A. (2/21/2010)


    Your point 1 about the worktable: you are right. As I said before, I couldn't reproduce the query I mentioned in the opening post, so I used this query, which has somewhat the same behaviour (except that the problem in this case isn't the worktable). Maybe I can reproduce that query tomorrow.

    Ok, sorry I missed that - but I now feel quite confident it involved loop joins and spools as I speculated before.

    Don't worry about reproducing the exact query.

    I do think that query 1 is probably reasonably close to the fastest you can get it to go, simply by tweaking the query.

    Further improvements are likely to need a deeper review, with expert help on-site.

    Paul

  • Paul White (2/21/2010)


    ...

    Personally, I doubt the query is disk-bound; it is more likely that the rewrite introduced a spool of some kind, or is using a kind of join which requires a worktable, and the plan causes the spool to be accessed a large number of times. A so-called 'perf spool' associated with an 'optimized' loop join is an example of this kind of behaviour...

    This explanation is not exclusive of being disk-bound, Paul; in fact, they are likely both true. And I would give a 90%+ likelihood to it being disk-bound based on the numbers given when I first replied (hmm, for some reason I never got notice of the later replies...).

    What I can say with 100% certainty is that:

    TrueElapsedTime(*) - (CPUtime / Parallelism) = OtherWaitTime

    *(SQL Server incorrectly records/reports "elapsed time" under parallelism)

    And, for the above (assuming insignificant overlapping(##)):

    OtherWaitTime = DiskWaitTime
                  + NetworkWaitTime
                  + RemoteServiceWaitTime
                  + PresentationWaitTime(*)
                  + UserWaitTime(*)
                  + StupidWaitTime(**)

    *(should not happen for this case)

    **(includes things like code that waits for a specific TimeOfDay: i.e., Stupid)

    And if you go through the list, you can see that the only one that is likely is DiskWaitTime, which would make it IO-bound. This is all based on 40-year-old Systems Performance Analysis theory, which says (among other things) that if you are waiting (i.e., elapsed time), then you must be waiting for something (i.e., component physical wait times).

    ##(RE: overlapping, two things: 1) if it is happening, the formula correction is to divide by an Overlap Factor (similar to CPU parallelism), and 2) overlapping makes OtherWaitTime (and thus, in all probability, DiskWaitTime) a more significant problem, not less.)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Barry,

    I'm not sure about that equation! Where do I place waits due to a lack of an available scheduler? Where do waits on memory access appear?

    For my money, both queries were probably CPU-bound.

    The serial query certainly performed a lot of I/O, but I sincerely doubt any of it was disk access since worktables (for example a spool) typically remain in memory (buffer pool). Performing a lot of memory I/O is generally enough to max out a single processing unit, so I don't think one could fairly say that such access represented being I/O bound.

    In any case, debating whether the bottleneck was CPU or memory is really a bit beside the point. The determining factor here is the quality of the plan. Not surprisingly, a good plan often out-performs a bad plan. The second, serial, plan was a poor choice by the optimizer - for whatever reason.

    Paul

  • RBarryYoung (2/21/2010)


    This explanation is not exclusive of being disk-bound Paul, in fact they are likely both true. And I would give a 90%+ likelihood to it being disk-bound based on the numbers given when I first replied

    This statement bugs me enough to warrant a separate reply.

    You'll see that I said "I doubt the query is disk-bound" - which hardly excludes the possibility of it being so. I just doubt it. A lot.

    I don't agree that the presence of a performance spool in a plan implies high disk activity at all; in fact it usually means high levels of memory access as the spooled pages are repeatedly referenced from the buffer pool. Unless you were confusing disk activity with data page access from the buffer pool, I don't see how we can reach agreement here.

    The whole point of a spool is to minimize physical I/O after all, at the cost of an increase in memory usage, and CPU time.

    The only reference to IO in the original post was "the only main difference i can see in the IO statistics is the increase in worktable reads". How that translates to a 90%+ likelihood of being disk-bound baffles me.

    Paul

  • Paul White (2/21/2010)


    I'm not sure about that equation!

    It's right. Trust me, I've been doing this for a looong time across many different environments and OSs, 20 years at least. The equation traditionally used for breakdown of physical component waits is shorter (CPU, Disk, Network), but I threw in the periphery stuff just for completeness.

    Where do I place waits due to a lack of an available scheduler?

    Depends on the context and why you're waiting for a scheduler. Usually that would imply multi-process contention for resources, which is not what we were talking about in this case; however, yes, it is covered by the equation, it's just not as useful a perspective here. This kind of thing is called a "Software Server" or a "local (in-box) server", and technically it is a virtual or software component resource, not a physical one. That means that its contribution to the total elapsed time is contention only, not consumption.

    This is clearer when we look at the Wait Time equation:

    WaitTime(resource) = TimeInQueue(resource) + ServiceTime(resource)

    This means that a process's total wait time for a resource is equal to the total amount of time spent in that resource's queue (which is contention: time spent waiting to use a resource because someone else is using it and in your way), plus the service time (which is consumption: the time that the physical resource was working for you). Each of the WaitTime components in my equation therefore includes both a service-time subcomponent and a queue-time subcomponent. Thus "waits due to a lack of an available scheduler" are added into the queue times of the physical resources.

    Now admittedly, if Software Server waiting (queueing) becomes significant, it can be a data collection and integration headache to get it mathematically correct, so sometimes it is modeled as a separate pseudo-physical resource with a somewhat simpler correction factor (still hard).

    Bringing it back to the Schedulers, what this all means, in terms of physical wait-time decomposition is that when you are waiting for a scheduler, you are physically waiting (in-queue), for whatever physical resource the scheduler is waiting for (whether in-queue or in-service).

    So, to make a long answer short (too late?), "software servers" like waiting for a scheduler aren't visible in the equation because it's an apples-and-oranges situation: they are a different type of resource and do not show up as a discrete entity in physical wait-time decomposition, but are spread out over the physical resources.

    Where do waits on memory access appear?

    That's the question most people ask, and it's a bit simpler to explain. Yes, memory is a physical resource, but performance-wise it's a different type of physical resource. CPU, disk throughput, etc. are all what we call rate-based or temporal consumption resources: their primary dimension is time, and consuming them takes time. Memory, on the other hand, is an allocation or "space"-based resource; you don't "consume" memory over time, you "occupy" it over time.

    The upshot here is that using memory doesn't itself take time, but waiting for memory (both physical and virtual) does take time, and only because that creates page faults that require other resources to resolve. Specifically, soft faults require CPU, and hard faults require disk IO. So "memory waits" are really a combination of waiting for CPU and waiting for disk IO.

    For my money, both queries were probably CPU-bound.

    The longer one could not have been, unless the system itself was CPU-saturated by other processes. However, if the environmental/system conditions were the same or similar for both queries, then the longer one spent most of its time waiting for something other than CPU. Those missing 17 seconds have to be accounted for somehow, and it can't be CPU, because we already know the CPU time (3 seconds). So unless SQL Server is reporting completely wrong numbers (possible?), there are still 17 non-CPU seconds to account for.

    The serial query certainly performed a lot of I/O, but I sincerely doubt any of it was disk access since worktables (for example a spool) typically remain in memory (buffer pool). Performing a lot of memory I/O is generally enough to max out a single processing unit, so I don't think one could fairly say that such access represented being I/O bound.

    Hmm, OK, you could have a point here, if SQL Server is not correctly accounting this CPU time back to the process that caused it, but that would be a pretty serious performance-reporting flaw. So serious that we would have to regard the SQL Server performance statistics as almost completely bogus, unless this was a very rare occurrence.

    In any case, debating whether the bottleneck was CPU or memory is really a bit beside the point. The determining factor here is the quality of the plan. Not surprisingly, a good plan often out-performs a bad plan. The second, serial, plan was a poor choice by the optimizer - for whatever reason.

    Agreed and agreed. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Paul White (2/21/2010)


    RBarryYoung (2/21/2010)


    Paul White (2/21/2010)


    ...

    Personally, I doubt the query is disk-bound; it is more likely that the rewrite introduced a spool of some kind, or is using a kind of join which requires a worktable, and the plan causes the spool to be accessed a large number of times. A so-called 'perf spool' associated with an 'optimized' loop join is an example of this kind of behaviour...

    This explanation is not exclusive of being disk-bound Paul, in fact they are likely both true. And I would give a 90%+ likelihood to it being disk-bound based on the numbers given when I first replied

    This statement bugs me enough to warrant a separate reply.

    You'll see that I said "I doubt the query is disk-bound" - which hardly excludes the possibility of it being so. I just doubt it. A lot.

    I wasn't saying that, Paul. I was replying to the apparent dichotomy presented in your first sentence. It says, in essence, "I doubt X; it is more likely Y." I said, "But Y does not exclude X, in fact it's likely that both X and Y are true." What am I missing here?

    I don't agree that the presence of a performance spool in a plan implies high disk activity at all; in fact it usually means high levels of memory access as the spooled pages are repeatedly referenced from the buffer pool.

    Referring back to my quantifications above, I didn't say that "Y implies X", I said "X and Y [in this case]". The "in this case" was implicit, though I thought that clear, my apologies if it was not.

    Unless you were confusing disk activity with data page access from the buffer pool, I don't see how we can reach agreement here.

    I don't see how I could have been confusing anything with "data page access from the buffer pool", because I was neither looking at it nor talking about it. I was looking at the elapsed time, the CPU time, and the possible physical resources, because that's how physical wait-time decomposition works. You partition the elapsed time (which can be called response time in this case) among the time-consuming physical resources, which tells you where you are spending your time. It's a rock-solid performance analysis methodology and is as reliable as physics (in fact it's the most reliable general-purpose performance analysis technique that I know of, though doing it correctly past the initial breakdown can be tricky).

    As for us coming to agreement, I have to go back to the apples-and-oranges analogy in my previous post: I think that we are just talking about different ways to break down, or analyze, the same thing. I'm doing physical decomposition; I think that you are doing something more akin to functional decomposition. Trying to mix them together can cause a lot of confusion.

    The whole point of a spool is to minimize physical I/O after all, at the cost of an increase in memory usage, and CPU time.

    Agreed. Therefore I would be forced to conclude that, in this case, it is either not effective or not significant.

    The only reference to IO in the original post was "the only main difference i can see in the IO statistics is the increase in worktable reads". How that translates to a 90%+ likelihood of being disk-bound baffles me.

    Physical wait-time decomposition is a relatively simple yet extremely powerful technique, derived from the discipline of Queuing Network Analysis(*), that allows a relative outsider to come to seemingly inscrutable, "Sherlock Holmes"-type deductions that can baffle even the smartest and most adept internal experts. I am familiar with it because for five years I was a regional team leader for DEC's Performance & Capacity Planning team, and I used it to the same effect on a weekly basis. Yet in hundreds of applications of the technique to performance problems at customer sites, I can fairly say that I can count on one hand the number of times I got a wrong answer from it (and those were probably all my fault, caused by me using data that I shouldn't have).

    I don't say this to boast, but rather by way of explaining that your reaction was not an uncommon one back when I was doing this full time. I have more than once seen world-wide experts in their operating system drop their jaws at apparently "impossible" conclusions that we reached in less than 10 minutes, conclusions that then took us two or more days to prove to their satisfaction. But as Sherlock Holmes said, "whenever the impossible is eliminated, whatever remains, however improbable, must be the truth". This technique is just very good at determining that certain simple things, from a process or systemic perspective, either must or must not be true (it has a lot more trouble with low-level details).

    *(Lazowska et al. is the practitioner's bible for QNA.)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • And now, having said all this and gone all this way, I notice something that could make it all irrelevant. *sigh*

    Sander: you said that the slower query had an "execution time" of 20 seconds? How did you measure this? I know that may seem like a silly question; however, because of the amount of parallelism in that query, it's actually very important.
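
    (For reference, the standard way to get both numbers from a single run is SET STATISTICS TIME, which reports CPU time and elapsed time separately; under a parallel plan, CPU time can legitimately exceed elapsed time, exactly as it does for query 1 above. A minimal sketch:)

    -- Sketch: report CPU and elapsed time separately for one execution
    SET STATISTICS TIME ON;
    -- ... run the query being measured ...
    SET STATISTICS TIME OFF;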

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]
