Why does this query cause index scan when the index is covering?

Question

Why does this query cause index scan when the index is covering?

programmersunlimited

SSC Rookie

Points: 46
More actions
June 3, 2011 at 1:50 pm

#251118

SELECT COUNT(*)
FROM Table1 WHERE Col1 = 0
vs
SELECT COUNT(*)
FROM Table1 WHERE Col1 = Table2.ID
The first query does a seek on Table1.Index1 but the second query does a scan on Table1.Index1.
Why? This query is part of an outer UPDATE statement for Table2.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

Scan vs seek has nothing to do with it being a covering index. Covering indexes prevent bookmark lookups.

The plan probably changes because of the stats on the join. Scans aren't always a bad thing. Sometimes, they're the best option, depending on the data and what you're doing with it.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 2

SELECT COUNT(*)

FROM Table1 WHERE Col1 = Table2.ID

That can't be seeked. There's no single value to be looked up. It could do one seek for each value of Table2.ID (that's what a nested loop join does), but SQL's probably decided it's too expensive and a single scan is faster.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

programmersunlimited SSC Rookie Points: 46 More actions · Answer 3

In this case the scan is taking the most time in this query which is taking the most time out of the entire group. Should I be looking else where instead of the index? Because there is a query exactly like this one just based on another column and it does a seek. I duplicated that index but using the desired column and it still does a scan. The two columns are the same so why would one do a scan but not the other?

UPDATE Table1

SET

Count1 =

(

SELECT CASE WHEN COUNT(*) = 0 THEN NULL ELSE COUNT(*) END

FROM Table2 (nolock)

WHERE Col1 = Table1.ID

)

,

Count2 =

(

SELECT CASE WHEN COUNT(*) = 0 THEN NULL ELSE COUNT(*) END

FROM Table2 with (nolock)

WHERE Col2 = Table1.ID

)

WHERE

timestamp BETWEEN @FromDate AND @ToDate

AND (Table1.ClientID = @ClientID)

Count1 query = seek on Table2.Index2

Count2 query = scan on Table2.Index1

Index1 and Index2 are the same except for the column used which is the same type

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

SELECT COUNT(*)
FROM Table1 WHERE Col1 = 0
vs
SELECT COUNT(*)
FROM Table1 WHERE Col1 = Table2.ID

Query 1:

Please count how many people there are in the telephone directory with a surname 'Van Westhuyzen'

Query 2:

Please take the list of several thousand people in a particular city and count how many people in the telephone directory have the same surname.

Very different requests. The first is easy to do with a seek. It's a SARGable expression (column = value)

The second is not easy to do and cannot be done with a single seek. It could be done with one seek for each and value in the one table, but it doesn't take much for that to be highly inefficient.

Edit: The initial query and the second query you posted are totally different with different predicates. Which is it?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

programmersunlimited SSC Rookie Points: 46 More actions · Answer 5

the second one is the entire query. What you say about the first post makes sense so what is going on with my second (entire) query?

Note: when I add a predicate to check for nulls I get a seek instead of a scan

WHERE Col2 is not null and Col2 = Table1.ID

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 6

Execution plan?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 7

Stats on how many NULLS are in Col2 and in Table1.ID?

In deciding to use an index, the optimizer is informed by both the sequence and the distribution of values in the index. If a significant percentage of the values are NULL, (and your query expressly excludes NULLS) the optimizer knows it can disregard those, and will do a seek to get to the beginning of the non-null values.... then "scan" from there. The "seek" is simply the positioning of the starting point at the first non-null value.

IX1

--------------------------

NULL

...

NULL

{-------- IS NOT NULL? Skip to here, and read the rest of the rows for matching values.

1

2

3

4

5

6

The same thing could happen if you had an index over (non-null) alphabetical values (ascending) and your query requested all matches where col1 > 'D'

Col1

---------------

A

...

B

...

C

...

D

....

<---------------------- Seek to here then scan forward for matches.

E

...

F

...

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

programmersunlimited SSC Rookie Points: 46 More actions · Answer 8

programmersunlimited

SSC Rookie

Points: 46

June 3, 2011 at 2:43 pm

#1333941

here ya go.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 9

The one that's seeking is not seeking for the join. It's just seeking for the where clause predicates

timestamp BETWEEN @FromDate AND @ToDate

AND (Table1.ClientID = @ClientID)

The join is done later as a hash join.

So what's the definition of the two indexes. I'll bet that Col2_Idx_Table2 does not have timestamp and ClientID as leading columns, hence that can't be done as a seek.

The joins, as I explained, aren't efficient as seeks. It's using a hash join which doesn't seek on the join columns (only efficient if there's a very small number in one resultset)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

programmersunlimited SSC Rookie Points: 46 More actions · Answer 10

programmersunlimited

SSC Rookie

Points: 46

June 3, 2011 at 3:30 pm

#1333958

re: The Dixie Flatline

2% are null.

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 11

WHERE Col2 is not null and Col2 = Table1.ID

Your join requires that every row of Table2 be tested. Your addition of NOT NULL restricts the set of rows that need to be retrieved from Table2 for testing. The query plan shows the scan of Col2_IX_Table2 produces 139,788,000 rows. If the query plan can use an effective index to skip over the nulls, then 2,795,760 rows (2%) don't have to be retrieved and thrown into the hash join. Without an applicable index, it has to scan all the rows and filter out nulls as it finds them.

Have a good weekend, y'all.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills