Index Column Order – Be Happy!

  • Comments posted to this topic are about the item Index Column Order – Be Happy!

    Mike Byrd

  • I miss one important version:
    If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:

    CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5
         ON Sales.SalesOrderHeaderBig (OrderDate)
    INCLUDE (SalesPersonID, SubTotal, [Status]) -- Status has to be included even though it is the filter column (otherwise you will get key lookups)
      WHERE [Status] = 5
       WITH (DATA_COMPRESSION = PAGE) -- since there are usually many orders on the same day, PAGE compression would be very effective

    This way you would decrease the index size (depending on the distribution of the Status values) and get much better statistics (since statistics are always built on the first column of the index).

    God is real, unless declared integer.

    What is the table definition? I was at a Brent Ozar session at PASS 2018 where they said the column order in missing index recommendations is based on the order the columns exist in the table and nothing else. So in your case, I'd have to assume that the index recommendation was just luckily in the right order.

    I'd be interested to see other examples where the index recommendation is actually correct on purpose as a learning exercise.



    I am Melvis.

  • Table definition is
    CREATE TABLE Sales.SalesOrderHeaderBig(
     SalesOrderID int IDENTITY(1,1) NOT NULL,
     RevisionNumber tinyint NOT NULL,
     OrderDate datetime NOT NULL,
     DueDate datetime NOT NULL,
     ShipDate datetime NULL,
     [Status] tinyint NOT NULL,
     OnlineOrderFlag dbo.Flag NOT NULL,
     SalesOrderNumber  AS (isnull(N'SO'+CONVERT(nvarchar(23),SalesOrderID,(0)),N'*** ERROR ***')),
     PurchaseOrderNumber dbo.OrderNumber NULL,
     AccountNumber dbo.AccountNumber NULL,
     CustomerID int NOT NULL,
     SalesPersonID int NULL,
     TerritoryID int NULL,
     BillToAddressID int NOT NULL,
     ShipToAddressID int NOT NULL,
     ShipMethodID int NOT NULL,
     CreditCardID int NULL,
     CreditCardApprovalCode varchar(15) NULL,
     CurrencyRateID int NULL,
     SubTotal money NOT NULL,
     TaxAmt money NOT NULL,
     Freight money NOT NULL,
     TotalDue  AS (isnull((SubTotal+TaxAmt)+Freight,(0))),
     Comment nvarchar(128) NULL,
     rowguid uniqueidentifier ROWGUIDCOL  NOT NULL,
     ModifiedDate datetime NOT NULL,
     CONSTRAINT PK_SalesOrderHeaderBig_SalesOrderID PRIMARY KEY NONCLUSTERED
    ( SalesOrderID ASC))

    In this particular instance it appears that OrderDate is defined before Status. IMHO, and as much as I respect Brent Ozar, I still think the optimizer picked the best column order for this particular query. This is not to say the optimizer always picks the best order, but it did in this case. However, my point is (as you will see in Part 2) that each index is just part of the bigger picture, and that query combinations and index definitions may require some testing depending on the data and usage. Cheers!

    Mike Byrd

  • thomas_franz - Thursday, February 28, 2019 5:45 AM

    I miss one important version:
    If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:

    CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5
         ON Sales.SalesOrderHeaderBig (OrderDate)
    INCLUDE (SalesPersonID, SubTotal, [Status]) -- Status has to be included even though it is the filter column (otherwise you will get key lookups)
      WHERE [Status] = 5
       WITH (DATA_COMPRESSION = PAGE) -- since there are usually many orders on the same day, PAGE compression would be very effective

    This way you would decrease the index size (depending on the distribution of the Status values) and get much better statistics (since statistics are always built on the first column of the index).

    I am always careful with filtered indexes; sometimes (especially in the newer versions) the optimizer parameterizes the WHERE clause and the filtered index is not used. You are correct if Status is always 5. I'm also very careful with DATA_COMPRESSION = PAGE. If the newer data is undergoing many updates, then there may be a performance hit. I always use DATA_COMPRESSION = ROW, since there is almost no performance hit and logical reads are usually greatly reduced. PAGE does offer possibly better performance, but not if there are many updates to the data. Usually I look at PAGE compression when I have a table partitioned, and make the older data partitions PAGE and the newer ones ROW.
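    A minimal sketch of that partition-level split, assuming (the layout here is my assumption, not part of the article's demo) the table were partitioned by OrderDate into four partitions with the newest data in partition 4:

    ALTER TABLE Sales.SalesOrderHeaderBig
        REBUILD PARTITION = ALL
        WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3), -- older, mostly static partitions
              DATA_COMPRESSION = ROW  ON PARTITIONS (4));     -- current, heavily updated partition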

    Mike Byrd

  • Mike, when producing its missing index recommendations, SQL Server groups the columns into 2 buckets: all columns involved in equality predicates come first and then all columns involved in inequality predicates (a date range in this case) come second. Within each of these 2 buckets, the column order is based on the ordinal position of the columns within the actual table definition. In other words, the order of columns in the missing index recommendations has nothing to do with perceived selectivity or cardinality.

    In your scenario, the Status column came first, because it was involved in an equality predicate. The OrderDate column came in second, because it was involved in an inequality predicate.

    More details of the algorithm used and repro scripts can be found here: https://dba.stackexchange.com/questions/208947/how-does-sql-server-determine-key-column-order-in-missing-index-requests
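    If you want to see those buckets directly, the missing index DMVs report them as separate columns, so the grouping is easy to verify:

    -- equality_columns and inequality_columns mirror the two buckets
    -- described above; included_columns is the third piece.
    SELECT mid.statement AS table_name,
           mid.equality_columns,
           mid.inequality_columns,
           mid.included_columns
    FROM sys.dm_db_missing_index_details AS mid
    JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle;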

  • Also, be aware that the query uses local variables for the date range. This behavior is completely different than if you used a stored procedure with 2 date parameters or sp_executesql with 2 date parameters. Using local variables causes SQL Server to use the density vector in the statistics to produce the cardinality estimates during query optimization, because the values of local variables are not known until runtime. Using a stored procedure or sp_executesql causes SQL Server to sniff the parameter values during initial query optimization and use those parameter values to produce cardinality estimates from the statistics.

    See https://www.brentozar.com/archive/2018/03/troubleshooting-parameter-sniffing-issues-the-right-way-part-3/ for more information.
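    A minimal sketch of the difference, reusing the query shape from this thread (the date values and column list are placeholders):

    -- Local variables: values are unknown at optimization time, so the
    -- optimizer falls back to the density vector for its estimates.
    DECLARE @Start datetime = '20130101', @End datetime = '20130131';
    SELECT SalesPersonID, SubTotal
    FROM Sales.SalesOrderHeaderBig
    WHERE [Status] = 5
      AND OrderDate BETWEEN @Start AND @End;

    -- Parameters: sp_executesql lets the optimizer sniff the values and
    -- use the statistics histogram for its estimates instead.
    EXEC sys.sp_executesql
        N'SELECT SalesPersonID, SubTotal
          FROM Sales.SalesOrderHeaderBig
          WHERE [Status] = 5
            AND OrderDate BETWEEN @Start AND @End;',
        N'@Start datetime, @End datetime',
        @Start = '20130101', @End = '20130131';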

  • Is that Bryan Nike? My old cube neighbor?



    I am Melvis.

  • Yep.  It's me, Matt.  I chuckled to myself when I saw that you had responded in this forum a few days ago.  It's a small world after all...

  • Small world indeed...



    I am Melvis.

  • The link for Jonathan's script, once you find it on SQLSkills, is broken.  Do you have another source for the script that makes the two sales tables larger?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    A covering index with the Status column first, followed by OrderDate, is inevitably subject to severe page splits on ongoing INSERTs. Very soon during normal operation the index will become badly fragmented, causing performance degradation and requiring "beloved" index maintenance.

    An index with OrderDate in first place, on the other hand, will be populated gracefully, keeping the rows in the index in the right order naturally, with no extra effort required.

    Minor gains in logical reads on static data would be more than offset by extra reads from the fragmented index, ineffective use of memory, and the huge overhead of rewriting the index during maintenance exercises.

    So, I would not say the suggestion from Microsoft is the best one in this case, or in pretty much any other case.
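    One way to check this claim on your own copy of the table is to measure fragmentation after a batch of inserts; a minimal sketch using the standard DMF:

    -- Compare avg_fragmentation_in_percent for the Status-first index
    -- against a date-leading index after the same insert workload.
    SELECT i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'Sales.SalesOrderHeaderBig'),
             NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id = ips.index_id;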

    _____________
    Code for TallyGenerator

  • Jeff Moden wrote:

    The link for Jonathan's script, once you find it on SQLSkills, is broken.  Do you have another source for the script that makes the two sales tables larger?

    And, I just got word from Jonathan that the link has been repaired; I've tested it and it now works. It seems they're moving from Windows to Linux, and things like spaces in links and case sensitivity are breaking this kind of thing.

    --Jeff Moden



  • Sergiy wrote:

    A covering index with the Status column first, followed by OrderDate, is inevitably subject to severe page splits on ongoing INSERTs.

    I can definitely vouch for that... in my earlier days, I made such an index on a table and it brought Expedia.com to its knees for about 5 minutes, for the very reason you cite above.

    --Jeff Moden



  • where the Primary Key/Clustered Index is based on SalesOrderID (identity column)

    And that's the issue.

    If you will often query by OrderDate BETWEEN ..., then cluster the table on:

    ( OrderDate, SalesOrderID )

    in that order.  The PK can stay the same, it will just be NONCLUSTERED.
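    A sketch of that change, reusing the PK name from the table script above (the clustered index name is my own; note that in the script as posted the PK is already NONCLUSTERED, so only the CREATE INDEX step would be needed):

    -- If the PK is currently the clustered index, drop it first.
    ALTER TABLE Sales.SalesOrderHeaderBig
        DROP CONSTRAINT PK_SalesOrderHeaderBig_SalesOrderID;

    -- Cluster on (OrderDate, SalesOrderID) so date-range queries read
    -- a contiguous range of pages.
    CREATE UNIQUE CLUSTERED INDEX CIX_SalesOrderHeaderBig
        ON Sales.SalesOrderHeaderBig (OrderDate, SalesOrderID);

    -- Re-add the PK as nonclustered, as described above.
    ALTER TABLE Sales.SalesOrderHeaderBig
        ADD CONSTRAINT PK_SalesOrderHeaderBig_SalesOrderID
            PRIMARY KEY NONCLUSTERED (SalesOrderID);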

    Page compression is often extremely valuable on large tables. If page compression shows as useful -- based on EXEC sys.sp_estimate_data_compression_savings and your own results afterward -- I'd use it, especially with so few varchar and so many fixed-width columns. You'd have to do an extraordinarily large number of UPDATEs to offset it.
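    For reference, the estimate mentioned above can be run like this (NULL for index and partition means "all"):

    EXEC sys.sp_estimate_data_compression_savings
        @schema_name      = N'Sales',
        @object_name      = N'SalesOrderHeaderBig',
        @index_id         = NULL,   -- all indexes
        @partition_number = NULL,   -- all partitions
        @data_compression = N'PAGE';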

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
