• manole_i (2/12/2016)


    You are right, the index is not covered and it does a nested loop join with a key lookup.

    With forceseek I got this:

    Scan count 1, logical reads 6046852, physical reads 0, read-ahead reads 570, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    without:

    Scan count 1, logical reads 2319106, physical reads 694, read-ahead reads 2315361, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    THere are twi interesting things I see in those results.

    First - the number of logical reads for the query with forceseek is 2.5 times the number of logical reads for the unhinted query. So I understand why the optimizer makes its choice: if the estimated rowcounts are correct, then the QO expected those IO amounts. Also, because the scan is sequential read-ahead optimization can be used (and is used, as shown in the results).

    Second - the forceseek plan does logical IO only and the unhinted plan does a lot of physical IO. This is not taken into account for the plan choice (because the QO does not know what is and what isn't in the buffer pool, and because this could be different when the plan is later reused), but it could be an explanation of why the cheaper plan ended up taking more time than the more expensive plan.

    If you want to dig deeper into this, then you will have to look at what other activities in your workload cause this difference. What surprises me most is that there are no phyiscal reads at all for the forceseek plan - an obvious explanation would be when the nonclustered index is in the buffer pool and the clustered index is not, but then the lookups would incur physical reads. Now it looks as if exactly those pages it needs from the clustered index are in the buffer pool, but the rest of the clustered index is not in the buffer pool.

    Did you perhaps run the forceseek query once without measuring, then run it again with the measurement and then run the unhinted version?

    For a fair comparison, always either forcefully flush the buffer pool before running each query (by using DBCC DROPCLEANBUFFERS - but do not do that on a production server!!), or ensure that each query can employ the buffer pool fully (by doing two executions of the same query in a row and measuring only the second),

    Oh, of course - if there is something going on as a regular process in your workload that causes these weird differences in buffer pool usage, then you may end up deciding that you prefer the more expensive query. But for now, I assume that the way you ran your tests has influenced the results.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/