Index Behaviour

  • This was removed by the editor as SPAM

  • Koen Verbeeck (11/21/2014)


    Nice question, not so great explanation.

    Still wonder why number 1 is not a clustered index scan.

    Does SQL Server always prefer a covering nonclustered index over a clustered index?

    (still got it right though, by process of elimination)

    Agreed on the explanation.

    The NCI is chosen in this case because it is a smaller index and the record count has a bit to contribute as well. A CI could have been chosen had the number of columns been greater, the number of records been greater.

    Take the following rewrite with a completely fabricated table but still more realistic than the table in the question.

    Use Sandbox2

    go

    if object_id(N'Emp') is not null

    begin

    drop table Emp

    end

    go

    --create table Emp(Empid int identity(1,1),Fname char(100),Lname varchar(100))

    DECLARE @BeginDate DATE = '2014-10-01'

    ,@EndDate DATE = '2014-10-31'

    SELECT TOP 1000000

    EmpID = IDENTITY(INT,1,1),

    Blinky= 'GHOST ITEM ' + CONVERT(VARCHAR(20),ISNULL('', 0)),

    CandyMan= ABS(CHECKSUM(NEWID()))%50000+1,

    FName= CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    LName= CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeDate= DATEADD(DAY,RAND(CHECKSUM(NEWID())) * ( 1 + DATEDIFF(DAY, @EndDate,@BeginDate) ), @EndDate),

    BadData= RIGHT(NEWID(),12),

    Amount= CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)

    INTO dbo.Emp

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 ;

    create clustered index pk_Emp_Empid on Emp(Empid)

    create nonclustered index IXNC_Emp_Fname_Lname on Emp(FName,LName)

    go

    In this case, we will see that the first and third queries resort to a CI scan instead of the index scan. More data, different data, more columns all contribute to the selection of the CI instead of the NCI. For a query with better data distribution the QO thinks it is less costly to do a CI scan so it will use that. In the original question, the cost of the NCI scan won out because the table had a single record and very few columns so the cost was tiny.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Just understood:

    a nonclustered index may be the same size but never larger than a clustered index, so, if it covers the data the nonclustered index will be first choice.

  • + Good question. Seemingly very simple, but makes you think.

    - I too wish the poster had provided a good explanation directly rather than pointing to another article that does not explain it at all.

    Hakim Ali
    www.sqlzen.com

  • Stewart "Arturius" Campbell (11/21/2014)


    Sean Lange (11/21/2014)


    Great question but the explanation and the linked article were terribly disappointing. The article says to use the table designer for creating indexes after it explains what a table is in sql server. Even worse, that article doesn't even pretend to go into the differences between index scans and seeks.

    If the explanation had any substance and the linked article was relevant to the question this would have been spectacular.

    Methinks Gail's series of articles on Intoduction to Indexes[/url] should have been consulted...

    +1

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Stewart "Arturius" Campbell (11/21/2014)


    Sean Lange (11/21/2014)


    Great question but the explanation and the linked article were terribly disappointing. The article says to use the table designer for creating indexes after it explains what a table is in sql server. Even worse, that article doesn't even pretend to go into the differences between index scans and seeks.

    If the explanation had any substance and the linked article was relevant to the question this would have been spectacular.

    Methinks Gail's series of articles on Intoduction to Indexes[/url] should have been consulted...

    Yes indeed!!! +1

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Stewart "Arturius" Campbell (11/21/2014)


    Methinks Gail's series of articles on Intoduction to Indexes[/url] should have been consulted...

    Very helpful, thanks for the pointer!

  • As the question is now all five use the nonclustered index. If the table had more data (lots more) then I could see the answer changing.

  • For those looking for more of an explanation:

    While poorly framed, it appears the intent of the question is to determine which queries will perform a SEEK versus a SCAN. And while this is a complex subject, in this simplified example, the answer is it will do a SEEK when the WHERE clause uses at least the first column of the index.

    The index is on (Fname, Lname) and the order the index is declared in is critical. A WHERE clause with only Lname can't use this index, however, a WHERE clause with only Fname, or a WHERE clause with both Fname and Lname can. (Note that the order in which the columns are listed in the WHERE clause does NOT matter, making queries 4 and 5 equivalent.)

    Seemingly accidentally, the question also brings up the question of when the optimizer will use a NONCLUSTERED versus a CLUSTERED index. Also a complex subject, but in this case, because the NONCLUSTERED index is a covering index for all queries, the optimizer always chooses it.

    A CLUSTERED index always includes all columns of a table, while the NONCLUSTERED index only includes its indexed columns (Fname, Lname), any INCLUDEd columns (none in this case) and the clustered index key (Empid), but not any remaining columns making it potentially smaller. In this case, there are no other columns, so the two indexes are equivalent and both contain the full table data. In general, a NONCLUSTERED index will be <= the CLUSTERED index in size, and so as long as it's a covering index (meaning it contains all data needed for the query, avoiding lookups into the clustered index) the optimizer will choose it over the clustered index as it should give equal-or-faster performance. (If the nonclustered index does not cover all fields needed for the query, this becomes far more complex as lookups to retrieve the additional columns are slow.)

  • NBSteve (11/21/2014)


    For those looking for more of an explanation:

    While poorly framed, it appears the intent of the question is to determine which queries will perform a SEEK versus a SCAN. And while this is a complex subject, in this simplified example, the answer is it will do a SEEK when the WHERE clause uses at least the first column of the index.

    The index is on (Fname, Lname) and the order the index is declared in is critical. A WHERE clause with only Lname can't use this index, however, a WHERE clause with only Fname, or a WHERE clause with both Fname and Lname can. (Note that the order in which the columns are listed in the WHERE clause does NOT matter, making queries 4 and 5 equivalent.)

    Seemingly accidentally, the question also brings up the question of when the optimizer will use a NONCLUSTERED versus a CLUSTERED index. Also a complex subject, but in this case, because the NONCLUSTERED index is a covering index for all queries, the optimizer always chooses it.

    A CLUSTERED index always includes all columns of a table, while the NONCLUSTERED index only includes its indexed columns (Fname, Lname), any INCLUDEd columns (none in this case) and the clustered index key (Empid), but not any remaining columns making it potentially smaller. In this case, there are no other columns, so the two indexes are equivalent and both contain the full table data. In general, a NONCLUSTERED index will be <= the CLUSTERED index in size, and so as long as it's a covering index (meaning it contains all data needed for the query, avoiding lookups into the clustered index) the optimizer will choose it over the clustered index as it should give equal-or-faster performance. (If the nonclustered index does not cover all fields needed for the query, this becomes far more complex as lookups to retrieve the additional columns are slow.)

    Good question, that forced me to re-think some "common" things. Thanks NBSteve for the clear explanation.

  • Stewart "Arturius" Campbell (11/21/2014)


    Sean Lange (11/21/2014)


    Great question but the explanation and the linked article were terribly disappointing. The article says to use the table designer for creating indexes after it explains what a table is in sql server. Even worse, that article doesn't even pretend to go into the differences between index scans and seeks.

    If the explanation had any substance and the linked article was relevant to the question this would have been spectacular.

    Methinks Gail's series of articles on Intoduction to Indexes[/url] should have been consulted...

    +1

    Tom

  • h.tobisch (11/21/2014)


    Just understood:

    a nonclustered index may be the same size but never larger than a clustered index, so, if it covers the data the nonclustered index will be first choice.

    I don't believe that that is correct. The non-clustered index can be larger (require more pages of storage) than the clustered index.

    In a case like the index here, the leaf level of the non-clustered index is the same size as the leaf level of the non-clustered index. If there are enough rows the non-leaf levels of the non-clustered index will be bigger than the non-leaf levels of the clustered index simply because the cluster key is smaller than the other index's key, so that the total size of the non-clustered index will be greater than the total size of the clustered index.

    Tom

  • Hi,

    I think the non clustred index it used in all the five queries.

    I have tested the example on SQL Server 2008 R2 and I have display the execution plan, that show sql server uses the non clustred index for the five queries.

    Thanks

  • Koen Verbeeck (11/21/2014)


    Nice question, not so great explanation.

    Still wonder why number 1 is not a clustered index scan.

    Does SQL Server always prefer a covering nonclustered index over a clustered index?

    (still got it right though, by process of elimination)

    + 1

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

  • Toreador (11/21/2014)


    It was possible to work this out by elimination - either both or neither of 4 and 5 had to be there, which ruled out all but 1 and 1,3. Perhaps coincidentally, all the other options would fail with errors on my case sensitive database.

    If 1 was there then 3 had to be as well. If there'd been an answer of "none of them" then I may well have got it wrong.

    +1 Same here 🙂

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply