Index Column Order – Be Happy!

Question

Index Column Order – Be Happy!

Mike Byrd

Default port

Points: 1458
More actions
February 27, 2019 at 10:07 pm

#379534

Comments posted to this topic are about the item Index Column Order – Be Happy!
Mike Byrd

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Thomas Franz Hall of Fame Points: 3990 More actions · Answer 1

I miss one important version:
If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:
CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5 ON Sales.SalesOrderHeaderBig (OrderDate) INCLUDE (SalesPersonId, SubTotal, Status) -- Status has to been included, even if it is the filter column (otherwise you will have key lookups) WHERE Status = 5 WITH (DATA_COMPRESSION = PAGE) -- Since there are usually many Orders at the same day, PAGE compression would be very effective

This way you would decrease the index size (depending on the distribution of the stati) and have much better statistics (since they are created always on the first column in the index)

God is real, unless declared integer.

Matt Wright SSC Enthusiast Points: 163 More actions · Answer 2

What is the table definition? I was at a Brent Ozar session at PASS 2018 and they said the column order in index recommendations is based on the order they exist in the table and nothing else. So in your case, I'd have to assume that the index recommendation was just luckily in the right order.

I'd be interested to see other examples where the index recommendation is actually correct on purpose as a learning exercise.

I am Melvis.

Mike Byrd Default port Points: 1458 More actions · Answer 3

Table definition is
CREATE TABLE Sales.SalesOrderHeaderBig(
SalesOrderID int IDENTITY(1,1) NOT NULL,
RevisionNumber tinyint NOT NULL,
OrderDate datetime NOT NULL,
DueDate datetime NOT NULL,
ShipDate datetime NULL,
[Status] tinyint NOT NULL,
OnlineOrderFlag dbo.Flag NOT NULL,
SalesOrderNumber AS (isnull(N'SO'+CONVERT(nvarchar(23),SalesOrderID,(0)),N'*** ERROR ***')),
PurchaseOrderNumber dbo.OrderNumber NULL,
AccountNumber dbo.AccountNumber NULL,
CustomerID int NOT NULL,
SalesPersonID int NULL,
TerritoryID int NULL,
BillToAddressID int NOT NULL,
ShipToAddressID int NOT NULL,
ShipMethodID int NOT NULL,
CreditCardID int NULL,
CreditCardApprovalCode varchar(15) NULL,
CurrencyRateID int NULL,
SubTotal money NOT NULL,
TaxAmt money NOT NULL,
Freight money NOT NULL,
TotalDue AS (isnull((SubTotal+TaxAmt)+Freight,(0))),
Comment nvarchar(128) NULL,
rowguid uniqueidentifier ROWGUIDCOL NOT NULL,
ModifiedDate datetime NOT NULL,
CONSTRAINT PK_SalesOrderHeaderBig_SalesOrderID PRIMARY KEY NONCLUSTERED
( SalesOrderID ASC))

In this particular instance it appears that OrderDate is defined before Status. IMHO and as much as I respect Ben Ozark, I still think the optimizer picked the best column order for this particular query. This is not to say the optimizer always picks the best order, but it did in this case. However, my point is (as you will see in Part 2) that each index is just part of the bigger picture and that query combinations and index definitions may require some testing depending on the data and usage. Cheers!

Mike Byrd

Mike Byrd Default port Points: 1458 More actions · Answer 4

thomas_franz - Thursday, February 28, 2019 5:45 AM
I miss one important version:
If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:
CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5 ON Sales.SalesOrderHeaderBig (OrderDate) INCLUDE (SalesPersonId, SubTotal, Status) -- Status has to been included, even if it is the filter column (otherwise you will have key lookups) WHERE Status = 5 WITH (DATA_COMPRESSION = PAGE) -- Since there are usually many Orders at the same day, PAGE compression would be very effective
This way you would decrease the index size (depending on the distribution of the stati) and have much better statistics (since they are created always on the first column in the index)

I am always careful with filtered indexes; sometimes (especially in the newer editions) the optimizer parameterizes the WHERE clause and the filtered index is not used. You are correct if Status always is 5. I'm also very careful with DATA_COMPRESSION = PAGE. If the newer data is undergoing many updates then there may be a performance hit. I always use DATA_COMPRESSION = ROW since there is almost no performance hit and logical reads are usually greatly reduced. PAGE does offer possible better performance but not if there are many updates to the data. Usually I look at PAGE Compression when I have a table partitioned and make the older data partitions PAGE and the newer ones ROW.

Mike Byrd

Bryan Rebok Grasshopper Points: 23 More actions · Answer 5

Mike, when producing its missing index recommendations, SQL Server groups the columns into 2 buckets: all columns involved in equality predicates come first and then all columns involved in inequality predicates (a date range in this case) come second. Within each of these 2 buckets, the column order is based on the ordinal position of the columns within the actual table definition. In other words, the order of columns in the missing index recommendations has nothing to do with perceived selectivity or cardinality.

In your scenario, the Status column came first, because it was involved in an equality predicate. The OrderDate column came in second, because it was involved in an inequality predicate.

More details of the algorithm used and repro scripts can be found here: https://dba.stackexchange.com/questions/208947/how-does-sql-server-determine-key-column-order-in-missing-index-requests

Bryan Rebok Grasshopper Points: 23 More actions · Answer 6

Also, be aware that the query uses local variables for the date range. This behavior is completely different than if you used a stored procedure with 2 date parameters or sp_executesql with 2 date parameters. Using local variables causes SQL Server to use the density vector in the statistics to produce the cardinality estimates during query optimization, because the values of local variables are not known until runtime. Using a stored procedure or sp_executesql causes SQL Server to sniff the parameter values during initial query optimization and use those parameter values to produce cardinality estimates from the statistics.

See https://www.brentozar.com/archive/2018/03/troubleshooting-parameter-sniffing-issues-the-right-way-part-3/ for more information.

Matt Wright SSC Enthusiast Points: 163 More actions · Answer 7

Is that Bryan Nike? My old cube neighbor?

I am Melvis.

Bryan Rebok Grasshopper Points: 23 More actions · Answer 8

Yep. It's me, Matt. I chuckled to myself when I saw that you had responded in this forum a few days ago. It's a small world after all...

Matt Wright SSC Enthusiast Points: 163 More actions · Answer 9

Matt Wright

SSC Enthusiast

Points: 163

March 4, 2019 at 8:30 am

#2023439

Small world indeed...

I am Melvis.

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 10

The link for Jonathan's script, once you find it on SQLSkills, is broken. Do you have another source for the script that makes the two sales tables larger?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110209 More actions · Answer 11

A covering index with Status column first followed by OrderDate is inevitably a subject of severe page splits on every ongoing INSERT's. Very soon during normal operation the index will become badly fragmented, causing performance degradation and requiring "beloved" index maintenance.

When an index with OrderStatus at first place will be populated gracefully, keeping the right order of the rows in the index naturally, with no extra effort required.

minor gains in logical reads on a static data would be overcompensated by extra reads from fragmented index, ineffective use of memory and huge overhead of re-writing the index during maintenance excersises.

So, I would not say the suggestion from Microsoft is the best one in this case. As well, as pretty much any other case.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 12

Jeff Moden wrote:

The link for Jonathan's script, once you find it on SQLSkills, is broken. Do you have another source for the script that makes the two sales tables larger?

And, I just got word from Jonathan that the link has been repaired and I tested that as correct. It seems they're moving from Windows to Linux and things like spaces in links and case sensitivity are breaking this kind of thing.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 13

Sergiy wrote:

A covering index with Status column first followed by OrderDate is inevitably a subject of severe page splits on every ongoing INSERT's.

I can definitely vouch for that... in my earlier days, I made such an index on a table that brought Expedia.com to it's knees for about 5 minutes by the very reason you cite above.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101244 More actions · Answer 14

where the Primary Key/Clustered Index is based on SalesOrderID (identity column)

And that's the issue.

If you will often query by OrderDate BETWEEN ..., then cluster the table on:

( OrderDate, SalesOrderID )

in that order. The PK can stay the same, it will just be NONCLUSTERED.

Page compression is often extremely valuable on large tables. If page compression shows as useful -- based on EXEC sys.sp_estimate_data_compression_savings and your own results afterward -- I'd use it, esp. with so few varchar and so many fixed columns. You'd have to do an extraordinarily large number of UPDATEs to offset it.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".