Welcome back to day 2 of Advanced Indexing. Today we’re going to look at a feature that was added in SQL Server 2008 – filtered indexes.
In versions previous, indexes were always on the entire table. An index would always have the same numb of rows as the table it was built on did (which is why COUNT(*) can just scan the smallest index on the table)
With filtered indexes, it’s possible to have an index that’s built on a subset of the rows in the table. The definition for a filtered index contains a WHERE clause predicate that determines if a row in the table will be in the index or not.
This can be a major advantage on really large tables where most queries are only interested in a small fraction of the table. A normal index would be based on the entire table regardless of the fact that most of the table is of no interest, meaning the index would be larger than necessary, deeper than necessary and take up more space than would be ideal. With a filtered index on just the interesting portion of the table, The index size is kept to a minimum, meaning it’s shallower than an index on the entire table and hence more efficient.
A simple example of a filtered index would be
CREATE NONCLUSTERED INDEX idx_Example ON Account (AccountNumber) WHERE Active = 1;
There are two main uses of a filtered index:
1) Enforcing moderately complex uniqueness requirements
2) Supporting queries that filter on common subsets of a table
Filtered indexes and unique columns
One very interesting use of filtered indexes is in enforcing uniqueness over portions of a table. One requirement that come up again and again is to have a nullable column that must have unique entries in it, but whose entries are optional. Basically, the column must be unique or null. Sounds easy, but the problem is that a unique index allows only one null. So much for nulls not being equal to anything including other nulls.
Prior to SQL 2008 implementing such a constraint required computed columns, indexed views or triggers. With SQL 2008′s filtered indexes, it’s trivial.
CREATE UNIQUE NONCLUSTERED INDEX idx_SomeTable_SomeColumn ON SomeTable (SomeColumn) WHERE SomeColumn IS NOT NULL;
It has to be a unique index not a unique constraint as indexes can be filtered, constraints cannot.
This can be extended to various forms of moderately complex unique requirements and is certainly an improvement over using indexed views or complex calculated columns (or just trusting the application to do things right).
The really interesting use of filtered indexes though is for supporting queries. Here filtered indexes are very useful in cases where a queries against a table frequently include a specific filter.
A couple common cases of this are table that flag rows as active or inactive and most queries are interested in only the active rows, or in a database design where deletes are logical (an IsDeleted column) and almost every query filters for rows not marked as deleted.
Let’s have a look at a couple examples here. I’m not using AdventureWorks because the database design doesn’t include these kinds of patterns. The table design is given at the end of the post and a SQLDataGenerator project file is attached.
First let’s look at a simple example. This table stores support calls, and this query is looking for all recent open calls logged to one of the support people.
SELECT CallID, LogDate, AssignedTo FROM dbo.CallLog AS cl WHERE CallStatus = 'Open' AND AssignedTo = 42 AND LogDate > DATEADD(ww,-1,GETDATE());
Now, we could create a normal nonclustered index with CallStatus, AssignedTo and LogDate in the key, but let’s say that while the AssignedTo and LogDate filters change for this query, the filter is always, always, always for CallStatus = ‘Open’. This table has around 200 open calls and 100000 closed calls. Creating an index with the closed calls as well is just wasting space and time, no one’s interested. So, what I can do is this:
CREATE NONCLUSTERED INDEX idx_CallLog_AssignedToLogDate ON dbo.CallLog (AssignedTo, LogDate) WHERE CallStatus = 'Open';
One thing to note here is that if the query filter exactly matches the index filter, the index doesn’t need to have that column as either a key or include column. It’s not being selected and the filter is entirely taken care of with the indexes filter.
It’s worth noting that there’s bug relating to this, specifically around filtered indexes with an IS NULL filter. See http://connect.microsoft.com/SQLServer/feedback/details/454744/filtered-index-not-used-and-key-lookup-with-no-output/ The bug is still unfixed in the latest CTP of SQL Server 2012.
Let’s have a look at a second example, where the query’s filter is a subset of the index’s filter.
Let’s say that a very frequent query is for the urgent or urgent and high priority calls (severity 1 and 2). So these are two common queries:
SELECT CallID, LogDate, CallStatus, Severity FROM dbo.CallLog AS cl WHERE Severity < 3 -- urgent and high AND AssignedTo = 1; SELECT CallID, LogDate, CallStatus, Severity FROM dbo.CallLog AS cl WHERE Severity = 1 -- urgent AND AssignedTo = 1;
In this case the filtered index can be used, but the column that’s being filtered on must also be in the index key, because the second query is filtering for a subset of the rows that the index contains.
An examination of the properties of the index seeks shows that for the first query there’s only one seek predicate – AssignedTo, whereas for the second query there are two seek predicates – AssignedTo and Severity
There are some limitations around filtered indexes and the matching of the filters. There are cases where a query and a filtered index have predicates that are logically equivalent, but where the filtered index can’t be used.
An example of this is not hard to generate. Let’s try a table that has an IsDeleted bit column (defined as not nullable)
CREATE TABLE Users ( UserID INT IDENTITY PRIMARY KEY, UserName VARCHAR(50), DepartmentID INT NOT NULL, IsDeleted BIT NOT NULL DEFAULT 0 ); CREATE NONCLUSTERED INDEX idx_Users_DepartmentID ON dbo.Users (DepartmentID) INCLUDE (UserName) WHERE IsDeleted = 0;
The IsDeleted column is a non-nullable bit column. Hence it can only have two possible values, 0 and 1. Hence, these two queries are completely equivalent in their results
SELECT UserName, DepartmentID FROM dbo.Users WHERE IsDeleted = 0 AND DepartmentID = 3; SELECT UserName, DepartmentID FROM dbo.Users WHERE IsDeleted != 1 AND DepartmentID = 3;
The first one uses the filtered index, the second does not. The second one scans the clustered index because despite the filter essentially being the same as the index filter, it’s not the same and the query hence cannot use the filtered index.
Another limitation has to do with parametrisation. If the query is passed in a parametrised form, or is subject to simple or forced parametrisation, then by the time that the optimiser gets the query there may not be sufficient information to tell that a filtered index is usable or not.
If we imagine the case of the table with the IsDeleted column again, a query that has a filter IsDeleted = 0 is definitely capable of using a filtered index that has the predicate IsDeleted = 0, but if the query gets parametrised and arrives at the optimiser in the form IsDeleted = @p1, there’s no way that query can match the filtered index because the value of @p1 on future executions could be 0, 1, NULL or 42, and in any case other than 0, if the cached plan used the filtered index it would produce incorrect results.
We can see this by setting the database parametrisation to forced and re-running an earlier example.
With parametrisation simple and the index with a filter on severity < 3, these two queries produce different execution plans
SELECT CallID, LogDate, CallStatus, Severity FROM dbo.CallLog AS cl WHERE Severity = 1 AND AssignedTo = 1; SELECT CallID, LogDate, CallStatus, Severity FROM dbo.CallLog AS cl WHERE Severity = 4 AND AssignedTo = 1;
However if the database is set for forced parameterisation, then the query is only seen in its parametrised form, and both queries have the same plan, one that does not use the filtered index
ALTER DATABASE Testing SET PARAMETERIZATION FORCED; GO SELECT CallID, LogDate, CallStatus, Severity FROM dbo.CallLog AS cl WHERE Severity = 1 AND AssignedTo = 1; SELECT CallID, LogDate, CallStatus, Severity FROM dbo.CallLog AS cl WHERE Severity = 4 AND AssignedTo = 1;
Right, so, homework for today. Given the table design below and the following queries, design one filtered index that both queries can use effectively (a filtered index that doesn’t filter out any rows is not an acceptable answer). Assume those are fixed queries that are frequently run with exactly that structure and exactly those values.
CREATE TABLE CallLog ( CallID INT IDENTITY PRIMARY KEY, CallStatus CHAR(6) NOT NULL, LogDate DATETIME NOT NULL, LastUpdateDate DATETIME, Title VARCHAR(500), Severity TINYINT, AssignedTo INT, UserID INT ); SELECT CallID, CallStatus, AssignedTo FROM dbo.CallLog WHERE CallStatus = 'Open' AND LastUpdateDate IS NOT NULL AND Severity = 1; SELECT CallID, LogDate, LastUpdateDate FROM dbo.CallLog WHERE LastUpdateDate < DATEADD(dd,7,GETDATE()) AND Severity IN (1,2);
Answers for Monday’s homework:
1) Yes, this can use an index for both filter and group by. Index key columns would be (TransactionType, ReferenceOrderID, ProductID), in that order, and index include columns would be (Quantity, ActualCost)
2) No, because of the inequality we can use an index to support filtering or aggregating but not both. So there will either be a plan with an index seek and a hash aggregate (or sort and stream aggregate) or a plan with an index scan and a stream aggregate, but there’s no way to get a seek and stream aggregate without a sort