October 9, 2009 at 3:13 pm
I have a simple table with 5 columns (4 ints and a float) and 18 million rows, and I run this query:
select sum(weight),count(*) as theCount, hit,cell,MetricID,adServerCampaignId
from adServerRespondentReport (NOLOCK)
group by hit,cell,MetricID,adServerCampaignId
It uses this non-clustered index for an Index Scan:
IX5 (hit, Cell, MetricID, AdServerCampaignID, Weight)
It starts to return results immediately, but it takes 30+ seconds to return all 20,000 rows. (I know it's slow because if I simply select top 20000 * from this table, it returns all 20,000 rows in a second or two).
Am I done? Can I do better than an index seek on this query? Do I need to start looking at OLAP or some other solution? I've seen others report tables with 500 million rows returning similar queries in a couple of seconds. Why is mine so slow? It's running on a pretty beefy box.
Any help is appreciated!
October 10, 2009 at 7:51 am
First, test performance by writing to a temp table, not displaying on a screen. Screen display of thousands of rows adds overhead that has nothing to do with the query performance.
Also, performance of a query changes based on the environment. Are all the rows on pages already residing in memory, or is disk I/O required? Is the query being run for the first time, or has the execution plan been cached?
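If you want to rule the caching question out, you can force a cold-cache timing before each run. This is a sketch for a test server only (never run these on production, since they evict everyone's cached pages and plans):

```sql
-- Test-server only: clear cached data and plans so every run starts cold
CHECKPOINT;             -- flush dirty pages so the buffer drop is complete
DBCC DROPCLEANBUFFERS;  -- empty the buffer pool (forces physical I/O on next run)
DBCC FREEPROCCACHE;     -- discard cached execution plans (forces recompile)

SET STATISTICS IO ON;   -- report logical vs. physical reads per table
SET STATISTICS TIME ON; -- report parse/compile and execution times
```

If the cold-cache and warm-cache timings are close, the query is CPU-bound on the aggregation rather than waiting on disk.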
What's a "beefy" machine? If you are I/O constrained, it doesn't matter how many processors you have or how fast they are.
Based on what you are displaying, that nonclustered index looks like a perfect covering index for your query. Unless you are adding where conditions, a summary query like that has to scan the entire table (or covering index.)
But basically, I don't really see anything wrong with the query, or the index being used to satisfy it.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 10, 2009 at 8:01 am
Verified:
select sum(weight) as w,count(*) as theCount, hit,cell,MetricID,adServerCampaignId
into #tmp
from adServerRespondentReport (NOLOCK)
group by hit,cell,MetricID,adServerCampaignId
(21349 row(s) affected)
also takes close to 30 seconds.
So there's no way to improve the performance of this query? Every time I run it, it takes just as long, even though the data is static (updated quarterly). Is there any way to cache results or tell the database to pre-aggregate values like this? The rub is that I also need to apply a where clause to limit by MetricID, cell, or campaignID, so I need to be able to slice it N different ways (which runs slightly faster depending on how many rows I select), but it's still slow and I'd like to improve the base case of selecting all rows.
What about other solutions like OLAP or some other tools that might create lookup or summarize data offline for fast querying online? Anything??
October 10, 2009 at 9:11 am
Please post execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
There's no way you'll get an index seek with this query. There's no filter condition. A scan is about the best you'll get, and the index looks optimal. I'd like to see the exec plan to be completely sure.
There's a major difference between an aggregated query that returns 20,000 rows and a SELECT TOP 20000. The first has to read the entire index; the second doesn't. Try comparing the speed with this query:
select weight, hit,cell,MetricID,adServerCampaignId
from adServerRespondentReport (NOLOCK)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 10, 2009 at 11:25 am
Here's the query plan. I don't think it adds much to what I've indicated, but you're the experts. See anything here? Thanks again
October 10, 2009 at 12:02 pm
If the data is static, just take the results of the query and store them in a table, then query THAT table.
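A minimal sketch of that approach, using the table and column names from the thread (the summary table name here is hypothetical):

```sql
-- Build a persisted summary table once per quarterly load
-- (adServerRespondentReportSummary is a hypothetical name)
SELECT SUM(weight) AS TotalWeight,
       COUNT(*)    AS theCount,
       hit, cell, MetricID, adServerCampaignId
INTO   adServerRespondentReportSummary
FROM   adServerRespondentReport
GROUP BY hit, cell, MetricID, adServerCampaignId;

-- Index the summary table to support filtered (sliced) queries
CREATE CLUSTERED INDEX IX_Summary
    ON adServerRespondentReportSummary (hit, cell, MetricID, adServerCampaignId);

-- Subsequent queries read ~20,000 pre-aggregated rows instead of 18 million
SELECT TotalWeight, theCount, hit, cell, MetricID, adServerCampaignId
FROM   adServerRespondentReportSummary;
```

The trade-off is staleness: the summary table reflects the base table only as of the last rebuild, which is fine for data that changes quarterly.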
October 10, 2009 at 12:09 pm
That's what I would do, and this data is static, but there are a couple of other columns and I need to be able to slice it by those, so basically this same query plus
"where Gender=0 and Age=1 and Site=123, etc".
and then the same query, grouped by all those variables, but now on just a subset of rows. I just wanted to see if I could make the base case of all rows go any faster, but it's sounding like that's not possible in T-SQL.
There are only so many ways to slice the data, though. Is it possible, using some other tools or add-ons, to pre-cache those results and store them for later? And, ideally, recalculate those values automatically when the source data in this table changes? That would be ideal.
October 10, 2009 at 12:49 pm
Jody F Powlette (10/10/2009)
There are only so many ways to slice the data, though. Is it possible, using some other tools or add-ons, to pre-cache those results and store them for later? And, ideally, recalculate those values automatically when the source data in this table changes? That would be ideal.
SQL Server Analysis Services. What you're describing is an OLAP cube.
Do some reading up on Analysis Services; it's separate from SQL Server, is queried in different ways, and is managed in different ways.
Gail Shaw
October 10, 2009 at 6:17 pm
Gail is right. However, neither cubes nor summary tables in SQL Server get automatically updated when values change; you have to rebuild them. Typically, jobs are scheduled to rebuild them periodically.
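As one example, the rebuild could be wrapped in a stored procedure and run from a SQL Agent job after each quarterly load. The procedure and table names below are hypothetical:

```sql
-- Hypothetical rebuild procedure, scheduled after each quarterly data load
CREATE PROCEDURE dbo.RebuildRespondentSummary
AS
BEGIN
    SET NOCOUNT ON;

    -- Clear out the previous quarter's aggregates
    TRUNCATE TABLE dbo.adServerRespondentReportSummary;

    -- Re-aggregate from the freshly loaded base table
    INSERT INTO dbo.adServerRespondentReportSummary
        (TotalWeight, theCount, hit, cell, MetricID, adServerCampaignId)
    SELECT SUM(weight), COUNT(*), hit, cell, MetricID, adServerCampaignId
    FROM   dbo.adServerRespondentReport
    GROUP BY hit, cell, MetricID, adServerCampaignId;
END;
```

Running the rebuild right after the load means readers only ever pay the 30-second aggregation cost once per quarter instead of once per query.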