Slow running query

Question

Slow running query

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Alexander Suprun SSCertifiable Points: 6233 More actions · Answer 1

It's actually quite strange why optimizer decides to scan the whole index instead of performing few hundred seeks of the same index. May be statistics is not updated or the quality of it is very bad.

Suggested index won't eliminate the scan of a lot of records, and even though it will appear as "index seek" in the execution plan, the seek most likely will be done by this condition only - "fkEventType=26". So basically "fkEventType=26" will migrate from "Predicate" to "Seek Predicate" of the index operation, it will reduce IO cost but it still may return 90 million of records (if estimate is correct), then try to aggregate them and hash match, which is still bad for performance because of high CPU usage.

I would suggest you to try this rewritten query first, which should use the existing index but in smarter way. I don't have SQL2000 so I could not test it. I hope it will work:

SELECT sample.accn_# + sample.samplenumber as 'Specimen Number',

min(tbleventSample.eventdate) as 'Contained Date',

WorkstationLocation.LocationCode as 'Location Code',

WorkstationLocation.LocationName as 'Location Name',

Mac.WorkstationID as 'Workstation Code',

Mac.Machine_Name as 'WorkStation Name',pksample

FROM tbl_container

INNER join tbl_containeritems

on tbl_containeritems.fkContainer= tbl_container.pkcontainer

INNER join tbleventSample

on tbleventSample.pkeventsample = ( SELECT TOP 1 pkeventsample

FROM tbleventsample

WHERE fkeventtype = 26 and fksample = tbl_containeritems.parentpkvalue

ORDER BY pkeventsample DESC)

INNER JOIN Mac

on tbleventSample.fkWorkStationid = Mac.WorkstationID

INNER join WorkstationLocation

ON WorkstationLocation.LocationID = Mac.LocationID

INNER join sample

on sample.pksample = tbl_containeritems.parentpkvalue

WHERE tbl_container.ContainerNo = 'S0000830615' And tbl_container.Active = 1 and tbl_containeritems.active = 1

GROUP BY sample.accn_# + sample.samplenumber,WorkstationLocation.LocationCode,

WorkstationLocation.LocationName,Mac.WorkstationID,Mac.Machine_Name,pksample

Alex Suprun

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

Alexander Suprun (1/20/2016)
It's actually quite strange why optimizer decides to scan the whole index instead of performing few hundred seeks of the same index.

The keys are in the wrong order. You want filters first, to reduce the rows that need to be read, and the join second (because only nested loop will seek as part of a join). The existing index is the other way around, hence a scan because the join is a hash join and that won't do index seeks to the inner table on the join columns. Should also allow for a stream aggregate, since once the filter is done, the rows are sorted by the group by column.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Alexander Suprun SSCertifiable Points: 6233 More actions · Answer 3

GilaMonster (1/21/2016)
The keys are in the wrong order. You want filters first, to reduce the rows that need to be read, and the join second (because only nested loop will seek as part of a join). The existing index is the other way around, hence a scan because the join is a hash join and that won't do index seeks to the inner table on the join columns. Should also allow for a stream aggregate, since once the filter is done, the rows are sorted by the group by column.

Gail, with all due respect you are mistaken on that one.

Basically there are 2 ways to execute this query. I'm talking now only about aggregating tbleventsample and joining it with tbl_containeritems, because this is the most expensive part in the plan. So options are:

[h5]Option 1. Scan/seek tbleventsample, filter "fkEventType = 26" (89 million records), aggregate by fksample, hash join with tbl_containeritems.[/h5]The existing index is on (fkSample, fkEventType) and it's already doing a stream aggregate to group by the records because the records are actually sorted by fkSample. And even if you need to eliminate "fkEventType <> 26" your intermediate result-set still will be sorted by fkSample, so you can do aggregate on existing index.

Please see the example:

If we create another index (fkEventType, fkSample), then the number of records we physically have to read will drop down from 487 million to 89 million. It's a good improvement, but we still need to "scan" about 18% of the table, 89 million is still a huge number of records that later have to be processed (aggregate and hash join). Plus it will cost us an extra index an a big table.

[h5]Option 2. Get tbl_containeritems records (133 according to estimate), nested loop join with tbleventsample using fkEventType and fksample, and aggregate by fksample.[/h5]For this approach there is no difference in order of the key columns in the index. Does it matter that one value is a constant and second is from another table? Absolutely not. (fkEventType, fkSample) or (fkSample, fkEventType) server can still perform an index seek. And the difference between using these 2 indexes should be negligible. The only thing I can think of is the would be the higher probability of hitting the same page twice in the index where 1st column is less selective (fkEventType). This will reduce the number of physical reads slightly but it will make no significant difference when doing only 133 seeks.

So we don't have to create a new index because SQL Server should run an index seek on existing one. Why it's not doing this? Honestly, I have no idea because I have not been working with SQL2000 for a long time. May be optimizer was different in that one. That's why I suggested to re-write the query in order to force the optimizer to use nested loop + index seek.

Alex Suprun