Clustered Index Scan

Question

Clustered Index Scan

bebe022476

SSC Enthusiast

Points: 146
More actions
January 20, 2016 at 2:30 pm

#307904

How to optimise the following query, it's doing a Clustered Index Scan (Clustered).
select document.DocumentID as 'document.document.id',
document.DocFolderID as 'document.document.folderID',
document.FileName as 'document.document.description',
document.FileSize as 'document.document.fileSize',
document.ModifiedDate as 'document.document.modifiedDate',
document.TableName as 'document.document.tableName',
document.EntityCode as 'document.document.entityCode'
from Document where (DocFolderID = 1) and (EntityCode is null OR (TableName='Employee' AND EntityCode='xxxxxx')) AND PublishToWeb='1'
(12 row(s) affected)
Table 'Document'. Scan count 1, logical reads 2842, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

bebe022476 (1/20/2016)
How to optimise the following query, it's doing a Clustered Index Scan (Clustered).
select document.DocumentID as 'document.document.id',
document.DocFolderID as 'document.document.folderID',
document.FileName as 'document.document.description',
document.FileSize as 'document.document.fileSize',
document.ModifiedDate as 'document.document.modifiedDate',
document.TableName as 'document.document.tableName',
document.EntityCode as 'document.document.entityCode'
from Document where (DocFolderID = 1) and (EntityCode is null OR (TableName='Employee' AND EntityCode='xxxxxx')) AND PublishToWeb='1'
(12 row(s) affected)
Table 'Document'. Scan count 1, logical reads 2842, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

1) ORs are very difficult for the optimizer to get right. Consider a compound statistic on the columns involved.

2) Much more importantly, why do you think the optimizer should NOT be doing a table scan? That can very well be the most efficient way to access a given table.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 2

Table definition, index definitions and execution plan please.

But first, is the query really a problem?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 3

bebe022476 (1/20/2016)
How to optimise the following query, it's doing a Clustered Index Scan (Clustered).
select document.DocumentID as 'document.document.id',
document.DocFolderID as 'document.document.folderID',
document.FileName as 'document.document.description',
document.FileSize as 'document.document.fileSize',
document.ModifiedDate as 'document.document.modifiedDate',
document.TableName as 'document.document.tableName',
document.EntityCode as 'document.document.entityCode'
from Document where (DocFolderID = 1) and (EntityCode is null OR (TableName='Employee' AND EntityCode='xxxxxx')) AND PublishToWeb='1'
(12 row(s) affected)
Table 'Document'. Scan count 1, logical reads 2842, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

What is the problem?

😎

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 4

GilaMonster (1/20/2016)
Table definition, index definitions and execution plan please.
But first, is the query really a problem?

Sorry Gail, didn't see your post, makes mine look kind of daft:-P

😎

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 5

Here's my standard pattern for optimizing a query that needs OR. It might give you two index seeks instead of an index scan, but only if there are usable indexes. And then it might still not be faster - sometimes a scan is the fastest way to get what you need. And less than 3000 logical reads is usually not a big deal.

SELECT bla bla

FROM Document

WHERE DocFolderID = 1

AND PublishToWeb='1'

AND EntityCode IS NULL

UNION ALL

SELECT bla bla

FROM Document

WHERE DocFolderID = 1

AND PublishToWeb='1'

AND TableName='Employee'

AND EntityCode='xxxxxx'

-- AND EntityCode IS NOT NULL -- Not needed in this case because of EntityCode = 'xxxxx', but required if conditions are not mutually exclusive

;

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

bebe022476 SSC Enthusiast Points: 146 More actions · Answer 6

HI Gail,

Here it is:

DocumentID(PK,int,not null)

DocFolderID(int,null)

PublishToWeb(bit,null)

Filename(varchar(100),null)

Filesize(int,null)

ModifiedDate(datetime,null)

TableName(varchar(40),null)

EntityCode(varchar(12),null)

I'm not sure how to attached the jpg screenshot here but the execution plan only has 100% Clustered Index Scan(Clustered) on [document],[pk_document] .

The job normally runs for 2 hours and since looking at optimising the job, this is only one of the expensive queries in the job.

Thanks for your help

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

You are doing < 3000 page reads with this query. You could do that at floppy-disk speed (which many of my clients actually have for their IO performance!!) and STILL finish the query in maybe a few tens of seconds. This job of yours may have lots of problems but I cannot see how this one could be one of them - unless it is being blocked I suppose.

Use sp_whoisactive to see real-time performance and issues such as long-duration blocking.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 8

bebe022476 (1/20/2016)
HI Gail,
Here it is:
DocumentID(PK,int,not null)
DocFolderID(int,null)
PublishToWeb(bit,null)
Filename(varchar(100),null)
Filesize(int,null)
ModifiedDate(datetime,null)
TableName(varchar(40),null)
EntityCode(varchar(12),null)
I'm not sure how to attached the jpg screenshot here but the execution plan only has 100% Clustered Index Scan(Clustered) on [document],[pk_document] .
The job normally runs for 2 hours and since looking at optimising the job, this is only one of the expensive queries in the job.
Thanks for your help

Don't bother attaching the screen capture. That's a waste of time. Instead, right click on the plan and select "Save As." You can then save the plan as a .sqlplan file. That can be attached and uploaded here. When working with execution plans, the picture is nice, but the interesting pieces of information are in the properties of all the operators. That's why the plan is more important than the picture.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 9

bebe022476 (1/20/2016)
HI Gail,
Here it is:
DocumentID(PK,int,not null)
DocFolderID(int,null)
PublishToWeb(bit,null)
Filename(varchar(100),null)
Filesize(int,null)
ModifiedDate(datetime,null)
TableName(varchar(40),null)
EntityCode(varchar(12),null)
I'm not sure how to attached the jpg screenshot here but the execution plan only has 100% Clustered Index Scan(Clustered) on [document],[pk_document] .
The job normally runs for 2 hours and since looking at optimising the job, this is only one of the expensive queries in the job.
Thanks for your help

Grant already replied with instructions on how to attach the execution plan. If possible, use an actual execution plan as this contains more useful information than an estimated execution plan.

Have you ever tried just executing this query manually from a query window to see how long it runs? Like Kevin, I have a hard time imagining how a query that process just a few thousand pages would run for hours.

For optimization purposes, please provide more information about the table - you posted a column list, but we definitely need to know all the indexes that are defined on the table as well. If you have any CHECK-, UNIQUE-, or FOREIGN KEY-constraints, share them too.

You mention that the Clustered Index Scan is 100% - I assume you are talking about the percentages displayed in the execution plan? Please stop trusting them. They are at best a loose indication based on abstractions and assumptions that are outdated by two decades. Plus, those numbers are based on estimated row counts (even in an actual execution plan), so if the estimates are wrong they mean even less than they already do.

The same goes for the percentage-cost of this plan in relation to the entire batch, if you looked at that (which I think you did, based on your statement that this is one of the expensive queries). This is also based on estimated costs, so goes back to the same outdated costing model and the same caveats about incorrect estimated row counts.

The best way to really find out which query or queries in a batch or job are causing slowness are to have an extended events session capture relevant events while the job runs. Second best is to manually execute the statements one by one and track how long each one takes, but that is not always possible (and sometimes differences between the settings of SSMS vs SQL Agent or differences in concurrent workload while the queries are running make the results of such a test completely different from the actual run.

Yet another way could be to create a logging table and add INSERT statements that add a row in that table to track progress - make sure to include a datestamp column so that you can later verify how long each step took. This will not work if the job fails and has to roll back, because then the inserts into the logging table will also be rolled back.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/