Index Column Order – Be Happy!

  • Mike Byrd

    Ten Centuries

    Points: 1202

    Comments posted to this topic are about the item Index Column Order – Be Happy!

    Mike Byrd

  • Thomas Franz

    Hall of Fame

    Points: 3565

    I miss one important version:
    If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:

    CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5
         ON Sales.SalesOrderHeaderBig (OrderDate)
    INCLUDE (SalesPersonId, SubTotal, Status) -- Status has to been included, even if it is the filter column (otherwise you will have key lookups)
      WHERE Status = 5
       WITH (DATA_COMPRESSION = PAGE) -- Since there are usually many Orders at the same day, PAGE compression would be very effective

    This way  you would  decrease the index size (depending on the distribution of the stati) and have much better statistics (since they are created always on the first column in the index)

    God is real, unless declared integer.

  • Matt Wright

    SSC Enthusiast

    Points: 162

    What is the table definition? I was at a Brent Ozar session at PASS 2018 and they said the column order in index recommendations is based on the order they exist in the table and nothing else. So in your case, I'd have to assume that the index recommendation was just luckily in the right order.

    I'd be interested to see other examples where the index recommendation is actually correct on purpose as a learning exercise.



    I am Melvis.

  • Mike Byrd

    Ten Centuries

    Points: 1202

    Table definition is
    CREATE TABLE Sales.SalesOrderHeaderBig(
     SalesOrderID int IDENTITY(1,1) NOT NULL,
     RevisionNumber tinyint NOT NULL,
     OrderDate datetime NOT NULL,
     DueDate datetime NOT NULL,
     ShipDate datetime NULL,
     [Status] tinyint NOT NULL,
     OnlineOrderFlag dbo.Flag NOT NULL,
     SalesOrderNumber  AS (isnull(N'SO'+CONVERT(nvarchar(23),SalesOrderID,(0)),N'*** ERROR ***')),
     PurchaseOrderNumber dbo.OrderNumber NULL,
     AccountNumber dbo.AccountNumber NULL,
     CustomerID int NOT NULL,
     SalesPersonID int NULL,
     TerritoryID int NULL,
     BillToAddressID int NOT NULL,
     ShipToAddressID int NOT NULL,
     ShipMethodID int NOT NULL,
     CreditCardID int NULL,
     CreditCardApprovalCode varchar(15) NULL,
     CurrencyRateID int NULL,
     SubTotal money NOT NULL,
     TaxAmt money NOT NULL,
     Freight money NOT NULL,
     TotalDue  AS (isnull((SubTotal+TaxAmt)+Freight,(0))),
     Comment nvarchar(128) NULL,
     rowguid uniqueidentifier ROWGUIDCOL  NOT NULL,
     ModifiedDate datetime NOT NULL,
     CONSTRAINT PK_SalesOrderHeaderBig_SalesOrderID PRIMARY KEY NONCLUSTERED
    ( SalesOrderID ASC))

    In this particular instance it appears that OrderDate is defined before Status.  IMHO and as much as I respect Ben Ozark, I still think the optimizer picked the best column order for this particular query.  This is not to say the optimizer always picks the best order, but it did in this case.  However, my point is (as you will see in Part 2) that each index is just part of the bigger picture and that query combinations and index definitions may require some testing depending on the data and usage.  Cheers!

    Mike Byrd

  • Mike Byrd

    Ten Centuries

    Points: 1202

    thomas_franz - Thursday, February 28, 2019 5:45 AM

    I miss one important version:
    If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:

    CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5
         ON Sales.SalesOrderHeaderBig (OrderDate)
    INCLUDE (SalesPersonId, SubTotal, Status) -- Status has to been included, even if it is the filter column (otherwise you will have key lookups)
      WHERE Status = 5
       WITH (DATA_COMPRESSION = PAGE) -- Since there are usually many Orders at the same day, PAGE compression would be very effective

    This way  you would  decrease the index size (depending on the distribution of the stati) and have much better statistics (since they are created always on the first column in the index)

    I am always careful with filtered indexes; sometimes (especially in the newer editions) the optimizer parameterizes the WHERE clause and the filtered index is not used.  You are correct if Status always is 5.  I'm also very careful with DATA_COMPRESSION = PAGE.  If the newer data is undergoing many updates then there may be a performance hit.  I  always use DATA_COMPRESSION = ROW since there is almost no performance hit and logical reads are usually greatly reduced.  PAGE does offer possible better performance but not if there are many updates to the data.  Usually I look at PAGE Compression when I have a table partitioned and make the older data partitions PAGE and the newer ones ROW.

    Mike Byrd

  • Bryan Rebok

    Grasshopper

    Points: 23

    Mike, when producing its missing index recommendations, SQL Server groups the columns into 2 buckets: all columns involved in equality predicates come first and then all columns involved in inequality predicates (a date range in this case) come second. Within each of these 2 buckets, the column order is based on the ordinal position of the columns within the actual table definition. In other words, the order of columns in the missing index recommendations has nothing to do with perceived selectivity or cardinality.

    In your scenario, the Status column came first, because it was involved in an equality predicate. The OrderDate column came in second, because it was involved in an inequality predicate.

    More details of the algorithm used and repro scripts can be found here: https://dba.stackexchange.com/questions/208947/how-does-sql-server-determine-key-column-order-in-missing-index-requests

  • Bryan Rebok

    Grasshopper

    Points: 23

    Also, be aware that the query uses local variables for the date range. This behavior is completely different than if you used a stored procedure with 2 date parameters or sp_executesql with 2 date parameters. Using local variables causes SQL Server to use the density vector in the statistics to produce the cardinality estimates during query optimization, because the values of local variables are not known until runtime. Using a stored procedure or sp_executesql causes SQL Server to sniff the parameter values during initial query optimization and use those parameter values to produce cardinality estimates from the statistics.

    See https://www.brentozar.com/archive/2018/03/troubleshooting-parameter-sniffing-issues-the-right-way-part-3/ for more information.

  • Matt Wright

    SSC Enthusiast

    Points: 162

    Is that Bryan Nike? My old cube neighbor?



    I am Melvis.

  • Bryan Rebok

    Grasshopper

    Points: 23

    Yep.  It's me, Matt.  I chuckled to myself when I saw that you had responded in this forum a few days ago.  It's a small world after all...

  • Matt Wright

    SSC Enthusiast

    Points: 162

    Small world indeed...



    I am Melvis.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply