Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««123»»

Wrong Index Being Used Expand / Collapse
Author
Message
Posted Friday, February 8, 2013 12:21 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 9:02 PM
Points: 20,734, Visits: 32,515
Brandie Tarvin (2/8/2013)
Lynn Pettis (2/8/2013)
Roger Sabin (2/8/2013)
I understand what's been said.

But I still don't see why the first index is more "correct" when it requires trolling through thousands of records to find just the few for an account when the second index goes directly to the few records for the account.

Doesn't the equality comparison on the AccountNumber column make the second index a better choice than the first index?


You are doing an equality comparision on AccountNumber on both indexes. SQL Server is doing a seek on both indexes. For your second query, if SQL Server used the nonclustered index, it would then have to do a bookmark lookup to the table (in this case the clustered index) to get the rest of the data to satisfy the query. Instead it chose to do an index seek on the clustered index as this satisfies the query.


Lynn, for my own edification... This is because Name isn't included in either index, right? So the Clustered Index is easier to use to find name as associated with the rest of the data that actually is in the index?


Think of it, in this case, as a covering index since all columns not part of the index are also included by definition. Yes, if Name had been a part of the nonclustered index, perhaps as suggested an included column, then SQL Server would have used that index instead. It is much smalleer than the clustered index (fewer columns, more "data" per page, etc).



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1417870
Posted Friday, February 8, 2013 12:22 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, September 30, 2014 7:42 AM
Points: 3,688, Visits: 72,435
Brandie Tarvin (2/8/2013)


Lynn, for my own edification... This is because Name isn't included in either index, right? So the Clustered Index is easier to use to find name as associated with the rest of the data that actually is in the index?


All fields are always part of the clustered index. If you don't specify a field as part of the clustering key, it's still included in the clustered index because the clustered index is the table itself.




--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link
For tips on how to post your problems
Post #1417871
Posted Friday, February 8, 2013 12:37 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: 2 days ago @ 11:40 AM
Points: 34, Visits: 734
So, you are saying that SQL Server determined --

the cost of finding a few records (using second index) + lookup for other fields

was greater than ---

the cost of searching through thousands of records (using first index) to find the few matching that where clause but without a need to do a lookup for the additional fields needed

and that is why the first index was chosen.

I always thought that the index with the greatest number of equality matches (in the order of the columns in the index without any gaps) was always selected. Only when there were range matches or two or more indexes could satisfy the equality matches would it take into account things like lookups.
Post #1417877
Posted Friday, February 8, 2013 12:46 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, October 21, 2014 7:28 AM
Points: 5,584, Visits: 6,380
Roger,

If I understand things correctly, it isn't about equality of the Account Number match. It's about you asking the engine for more information than the nonclustered index could easily supply. The engine had a choice, do 2 searches with the nonclustered index or do 1 search with the clustered index.

Since the clustered index is physically sorted by the order of the index-specified columns, the engine knew where to find all relevant information tied to the account number. It didn't have to grab a pointer and then search all the relevant levels of the index for the additional information.

You are correct that the nonclustered index is faster when no other information (save the columns listed in the index) has been requested.



Brandie Tarvin, MCITP Database Administrator

Webpage: http://www.BrandieTarvin.net
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.

Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
Post #1417879
Posted Friday, February 8, 2013 12:48 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, September 30, 2014 7:42 AM
Points: 3,688, Visits: 72,435
Roger Sabin (2/8/2013)
So, you are saying that SQL Server determined --

the cost of finding a few records (using second index) + lookup for other fields

was greater than ---

the cost of searching through thousands of records (using first index) to find the few matching that where clause but without a need to do a lookup for the additional fields needed

and that is why the first index was chosen.

I always thought that the index with the greatest number of equality matches (in the order of the columns in the index without any gaps) was always selected. Only when there were range matches or two or more indexes could satisfy the equality matches would it take into account things like lookups.


It didn't search through thousands of records, the clustered index was selective enough that it performed a Clustered Index Seek, not a Clustered Index Scan and found the records and didn't then have to perform a second Clustered index seek and then combine the results in order to get the full data. If you want it to use the non-clustered index, make the non-clustered index include the columns, otherwise the clustered index seek operation is more efficient.




--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link
For tips on how to post your problems
Post #1417880
Posted Friday, February 8, 2013 3:37 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 9:02 PM
Points: 20,734, Visits: 32,515
Roger Sabin (2/8/2013)
So, you are saying that SQL Server determined --

the cost of finding a few records (using second index) + lookup for other fields

was greater than ---

the cost of searching through thousands of records (using first index) to find the few matching that where clause but without a need to do a lookup for the additional fields needed

and that is why the first index was chosen.

I always thought that the index with the greatest number of equality matches (in the order of the columns in the index without any gaps) was always selected. Only when there were range matches or two or more indexes could satisfy the equality matches would it take into account things like lookups.



Look at your execution plans, SQL Server performed an INDEX SEEK on both of them to satisify the respective queries. SQL Server did not have to scan the clustered index to find the data to satisify the query that also returned the value from the CompanyName column (if I remembered the column correctly).




Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1417933
Posted Friday, February 8, 2013 4:53 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: 2 days ago @ 11:40 AM
Points: 34, Visits: 734
I understand that using the first index (the clustered one) allows SQL Server to retrieve any column in the record once it has found the record it wants (using the where clause). Whereas, the second index requires a separate lookup to get any column not in the index which does take more time.

I am having a hard time understanding why the first index would be used when it requires tremediously more IO and time than the second index to return the same records. Why isn't it obvious to the query optimizer that the first index costs more than the second index?

For example, executing the following queries (the first one uses the clustered index, the second one uses the second index, the third one is being forced to use the second index though it has a non-indexed column):

set statistics io on
set statistics time on

select ProcessYear,
ProcessMonth,
VisionAccountNumber,
CustomerName
from UN_MonthlyCMS
where ProcessYear = 2013
and ProcessMonth = 1
and VisionAccountNumber = '1234'

select ProcessYear,
ProcessMonth,
VisionAccountNumber
from UN_MonthlyCMS
where ProcessYear = 2013
and ProcessMonth = 1
and VisionAccountNumber = '1234'

select ProcessYear,
ProcessMonth,
VisionAccountNumber,
CustomerName
from UN_MonthlyCMS with (index (IDX_MonthlyCMS_ProcessYear_ProcessMonth_VisionAccountNumber_CreditPlanNumber))
where ProcessYear = 2013
and ProcessMonth = 1
and VisionAccountNumber = '1234'

set statistics time off
set statistics io off

you get this:

SQL Server parse and compile time: 
CPU time = 0 ms, elapsed time = 1 ms.

(0 row(s) affected)
Table 'UN_MonthlyCMS'. Scan count 1, logical reads 355528, physical reads 7, read-ahead reads 216495, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 2312 ms, elapsed time = 147795 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

(0 row(s) affected)
Table 'UN_MonthlyCMS'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 35 ms.

(0 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UN_MonthlyCMS'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

Though these queries are using a bogus account number so that no actual records are found, when I ran them with a real account number I got similar IO usage and times.
Post #1417955
Posted Friday, February 8, 2013 6:29 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Friday, October 17, 2014 8:13 AM
Points: 9,926, Visits: 11,188
Roger Sabin (2/8/2013)
I am having a hard time understanding why the first index would be used when it requires tremendously more IO and time than the second index to return the same records. Why isn't it obvious to the query optimizer that the first index costs more than the second index?

There are a number of reasons.

First up, where there is an obvious covering nonclustered index for the query, the optimizer picks it straight away at the trivial plan stage. The plan is chosen before the optimizer starts exploring the space of possible plans and choosing one based on lowest estimated cost. You can see this in the SSMS execution plan view by clicking on the SELECT icon and looking at the Optimization Level property in the Properties window - it shows TRIVIAL for the CorrectIndex.sqlplan file.

For the second query, there is no longer an obviously best choice, and although the trivial plan logic is still run, it does not find a plan. So, the optimizer starts generating plan alternatives and costing them. One alternative is to seek the clustered index:

ProcessYear = 2013
ProcessMonth = 1

...and then apply a residual filter to rows that match:

VisionAccountNumber = '1234'

The alternative is to seek the nonclustered index:

ProcessYear = 2013
ProcessMonth = 1
VisionAccountNumber = '1234'

...and then use the clustered index keys stored in the leaf level of the nonclustered index to locate the base row in the clustered index and return the value it finds for CustomerName. This process repeats for each row found by the nonclustered index seek.

These two alternatives are assigned an estimated cost using the query optimizer's cost model, and this is where the problem occurs. The model is just a model, of course, one which happens to work pretty well in practice over the broad range of queries, database designs, and hardware in use across the world. It does not look to make a precise estimate of the cost of the query in your particular circumstances, and the cost it produces is also dependant on the quality of statistical information it has to work with.

Now, without scripted statistics for the table in question I cannot work through the numbers and show you precisely why the clustered index seek + residual predicate is chosen over the nonclustered seek + base table lookups, but if you compare the estimated plan costs for both (using an index hint as necessary) you will see the clustered access path appears to be cheaper.

One reason for this is that the optimizer assumes each query starts with a cold data cache, so all I/Os will at least initially be physical. The model also assumes sequential I/O (scanning a range of the clustered index) will be much faster than the random I/O expected to result from a given number of lookups from the nonclustered index to the base table.

The main reason though is that the optimizer is grossly underestimating the number of rows returned by the range seek of the clustered index which will need to have the residual filter applied. This may be due to a lack of good statistics on the column combination concerned, or it may be a consequence of the way predicates are modelled. You can check this by issuing a query like SELECT COUNT_BIG(*) FROM UNMonthly_CMS WHERE ProcessYear = 2012 AND ProcessMonth = 1 OPTION (RECOMPILE); the estimated number of rows ought to be quite close to the actual rows.

If you fancy scripting out the indexes and statistics using SSMS, I can go into more detail.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1417964
Posted Friday, February 8, 2013 6:46 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 9:02 PM
Points: 20,734, Visits: 32,515
At this point, I will bow out and defer to Paul. He has spent much more time than I digging into these things than I and has a much better understanding of indexing and the optimizer.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1417968
Posted Friday, February 8, 2013 6:56 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Friday, October 17, 2014 8:13 AM
Points: 9,926, Visits: 11,188
Lynn Pettis (2/8/2013)
At this point, I will bow out and defer to Paul. He has spent much more time than I digging into these things than I and has a much better understanding of indexing and the optimizer.

I feel I should stress that pretty much everything you guys have said on this thread so far has been accurate; the problem we are dealing with here is the optimizer making what looks like the right choice from its point of view, but which turns out to be highly suboptimal because it doesn't have good information to base its choices on, and the model it uses is a very general-purpose affair. The obvious solution has already been suggested - provide an obvious best-choice (covering) index. The alternative is to create better statistics, or to accept the out-of-model condition and use an index hint. So, overall, I'm just adding extra analysis around the points already made in case it is of interest to Roger.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1417970
« Prev Topic | Next Topic »

Add to briefcase ««123»»

Permissions Expand / Collapse