Order of Indexed columns - does it matter

  • CREATE TABLE dbo.blah (
    CustomID INT IDENTITY(1,1)  NOT NULL,
    CustomName NVARCHAR(50) NOT NULL,
    GID INT NOT NULL,
    PID INT NOT NULL,
    IsActive BIT NOT NULL,
    CONTSTRAINT PK_CustomID PRIMARY KEY CLUSTERED (CustomID ASC)
    )

    CREATE NONCLUSTERED INDEX IX_Blah_GID_PID ON dbo.blah (GID ASC, PID ASC)

    Is it correct to think that the following SELECT will use the  nonclustered index? (I think it will):

    SELECT *
    FROM dbo.blah
    WHERE GID = 1 AND PID = 1

    Is it correct to think that the following SELECT will use the non clustered index? (I am not sure it will due to the columns in the WHERE clause not being in the same order as the non clustered index):

    SELECT *
    FROM dbo.blah
    WHERE PID = 1 AND GID = 1

    Is is correct to think that the following SELECT will use the non clustered index?  (I am not sure it will due to the columns in the WHERE clause not being in the same order as the non clustered index and adding another column to the WHERE):

    SELECT *
    FROM dbo.blah
    WHERE PID = 1 AND GID = 1 AND IsActive = 1

    I have been Googling this and I believe that SELECTs 2 and 3 will not use the index.  Then I read something that said if the WHERE clause has all of indexed fields in it, no matter the order, it will be used.

    Thank you

  • My advice is not to trust what others tell you. Just do a simple test and review the behavior using the execution plans.
    First, let's create some sample data:

    DECLARE @words TABLE (
      word   nvarchar(100) NOT NULL,
      PRIMARY KEY CLUSTERED (word)
    );

    WITH cteStrings AS (
      SELECT TOP (500) message_id, [text]
      FROM sys.messages
      WHERE language_id=1033
      ORDER BY NEWID()
    )
    --PatternSplitCM code can be found here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/
    INSERT INTO @words
    SELECT DISTINCT LOWER(s.Item)
    FROM cteStrings w
    CROSS APPLY dbo.PatternSplitCM( w.[text], N'%[A-Za-z]%') s
    WHERE LEN(s.Item)>=3
    AND s.Matched = 1;

    INSERT INTO dbo.blah(
      CustomID,
      GID,
      PID,
      IsActive
    )
    SELECT TOP (1000)
      UPPER(LEFT(a.word, 1))+SUBSTRING(a.word, 2, LEN(a.word))+
      ISNULL(N' '+b.word, '') AS CustomName,
      ABS(CHECKSUM(NEWID()))%100 AS GID,
      ABS(CHECKSUM(NEWID()))%100 AS PID,
      ABS(CHECKSUM(NEWID()))%3 AS IsActive
    --- First word:
    FROM @words AS a
    --- Second word:
    OUTER APPLY (
      SELECT TOP (1)
        UPPER(LEFT(x.word, 1))+
        SUBSTRING(x.word, 2, LEN(x.word)) AS word
      FROM @words AS x
      WHERE a.word <> x.word
      ORDER BY NEWID()
      ) AS b

    ORDER BY NEWID();

    Now, just run the queries that you posted with the option "Include Actual Execution Plan". If you do that, you'll see how the index is used.

    To know more about execution plans, there's a nice book on execution plans in this site if you need help understanding them. http://www.sqlservercentral.com/articles/books/94937/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks Luis!

    All three of the queries I provided Perform an Index Seek on the non clustered index and a key lookup, so they do use the Index and perform a seek operation.

    Running two more queries:

    SELECT *
    FROM dbo.blah
    WHERE GID = 1

    SELECT *
    FROM dbo.blah
    WHERE PID = 1

    They both perform a Clustered Index Scan on the clustered index.  For some reason I was thinking that the first query (WHERE GID = 1) would use the non clustered index as the WHERE clause is using the first column of the index, GID.

    Can you enlighten me as to why this would be?  Is because the order of the columns in the index does not matter?  If so, when googling this, why is there so many articles that make it seem the order does matter.

    Thank You!!

  • GBeezy - Wednesday, October 11, 2017 11:08 AM

    Thanks Luis!

    All three of the queries I provided Perform an Index Seek on the non clustered index and a key lookup, so they do use the Index and perform a seek operation.

    Running two more queries:

    SELECT *
    FROM dbo.blah
    WHERE GID = 1

    SELECT *
    FROM dbo.blah
    WHERE PID = 1

    They both perform a Clustered Index Scan on the clustered index.  For some reason I was thinking that the first query (WHERE GID = 1) would use the non clustered index as the WHERE clause is using the first column of the index, GID.

    Can you enlighten me as to why this would be?  Is because the order of the columns in the index does not matter?  If so, when googling this, why is there so many articles that make it seem the order does matter.

    Thank You!!

    Actually, it's not guaranteed that it will use one or the other. Based on the index statistics, it will determine a cost for the use of either index. When the filter is not selective enough, it might choose to read the whole clustered index. When it's selective enough, it might read the nonclustered index first and then go directly to the rows on the clustered index to obtain the information that is not included in the nonclustered index. You can compare costs if you force SQL Server to include the index and you could "convince" SQL Server to scan the nonclustered index if all the columns are contained in it and it's smaller than the clustered index (as it usually happens).
    Compare Estimated Subtree Cost on the SELECT operation for the execution plans of these queries:

    SELECT *
    FROM dbo.blah
    WHERE GID = 1;

    SELECT *
    FROM dbo.blah WITH(INDEX= IX_Blah_GID_PID)
    WHERE GID = 1;

    Now check how the plans change when you use less columns:

    SELECT CustomID,
      GID,
      PID
    FROM dbo.blah
    WHERE GID = 1;

    SELECT CustomID,
      GID,
      PID
    FROM dbo.blah
    WHERE PID = 1;

    Now let's suppose that you actually need CustomID and CustomName columns most of the time.

    SELECT CustomID,
      CustomName
    FROM dbo.blah
    WHERE GID = 1;

    This query will get back to the clustered index scan. But if we recreate the index to include the missing column, we can see the plan going back to an index seek.

    DROP INDEX IX_Blah_GID_PID ON dbo.blah ;
    CREATE NONCLUSTERED INDEX IX_Blah_GID_PID ON dbo.blah (GID ASC, PID ASC) INCLUDE (CustomName);

    SELECT CustomID,
      CustomName
    FROM dbo.blah
    WHERE GID = 1;

    Hopefully, this will give you a better understanding on what's happening with the indexes when querying. Come back if you still have questions.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • You'll likely also see it go back to an index seek on the non clustered index if you increase the uniqueness of GID.  Trying change the values for GID and PID to %10000 instead of %100 on the insert that Luis gave you and see what happens.

  • This has been great Luis!!  I have some more research to do on my end, I really appreciate the insight!!

  • GBeezy - Wednesday, October 11, 2017 11:08 AM

    SELECT *
    FROM dbo.blah
    WHERE GID = 1

    SELECT *
    FROM dbo.blah
    WHERE PID = 1

    They both perform a Clustered Index Scan on the clustered index.  For some reason I was thinking that the first query (WHERE GID = 1) would use the non clustered index as the WHERE clause is using the first column of the index, GID.

    Can you enlighten me as to why this would be?  Is because the order of the columns in the index does not matter?

    The order of columns in an index certainly does matter for how it can be used to resolve a query.  The index you specified can never be used to resolve your second query here since PID is the second column in the index.  For the first query, the problem may be the SELECT * part of it.  If the first query were:
    SELECT GID FROM dbo.blah WHERE GID = 1
    it would more easily be able to use the index, but as written, the first query would require the engine to find all the records GID=1 in the index, then do a LOOKUP operation back to the HEAP to get the rest of the columns of the table.

    If you want to learn more about how all this works, check out an interesting and educational free series of videos from Brent Ozar called "How to think like the SQL Server engine"
    https://www.brentozar.com/training/think-like-sql-server-engine/

  • Also, check the execution count of the INDEX SEEKS in the execution plan.  If there's one for every row, that's a very bad thing performance wise.

    Last but not least, always to a performance check by measuring the performance of the code in terms of CPU, Duration, Reads, Writes, and compile time.  While execution plans are incredibly helpful in seeing what is being used, they're rather poor for numerically determining which code is actually the fastest because even actual execution plans are full of estimates.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply