RE: SQL Refusing to use an index

SSCarpal Tunnel

Points: 4337

March 20, 2013 at 8:42 am

Yup, all tables had their statistics updated with FULLSCAN after migration, and the migration actually occurred 4 years ago and statistics have been updated frequently since then to ensure that they are as up to date as possible. I just reran it with FULLUPDATE as well just in case.

The query actually isn't that much more complex, it is literally selecting 5 fields from one table and 3 from the other using an INNER JOIN on the primary clustered key for the second table. Quite literally the query reads:

SELECT event.eventid, event.eventdate, event.chargediv, booking.bookingid, booking.type, booking.fee, booking.issubscription, booking.iscancelled

FROM event

INNER JOIN booking

ON event.eventid = booking.eventid

WHERE event.eventdate >= '2013-01-01'

event.eventid is a primary clustered key, as is booking.bookingid

an index exists on the booking table on the fields eventid, eventdate, bookingid, type, fee

another index exists on the booking table on the field eventid, eventdate

and one last one on event as the eventdate

By removing the issubscription and iscancelled fields the plan the server comes up with is an index scan on the first index, runs in 2 seconds and has 5700 logical reads and 5700 read aheads.

The plan is an index scan on event followed by a computer scaler, and index seek on booking, joined via an inner join nested loop. Both pipes are carrying around 3k-4k records, cpu and i/o costs are minimal.

Add the extra fields and rather than using the index on eventid and eventdate, it uses the index on eventid, eventdate, it scans the bookingid primary key (approximately 1, 800, 000 records), logical reads is not 64600 and read aheads is 64600 (approx), taking just over 20 minutes.

The plan now doesn't change with regards to the event, but the booking changes to a clustered index scan carrying all 1.8m records, joined via a hash match inner join. The I/O for the clustered scan and the cpu for the scan and hash match are very high.

The total number of rows returned is just under 4000.

Adding an hint to tell it to use the index drops the reads back and reduces the time back to 2 seconds.

As daft as it sounds, it is almost as though the optimiser sees the longer index as the best match until it spots the additional field requested, at which point it doesn't look for the shorter index, but instead falls back to it's scan, and we can replicate it time and again by adding and removing the field from the select. I've even dropped all indexes, recreated the ones needed from scratch, calculated the spread of data in the indexes and the statistics, and everything points to the right index - except the optimiser!

I'll keep digging, thanks for the thoughts though.

---------------------------------------
It is by caffeine alone I set my mind in motion.
It is by the Beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.