Blog Post

SQL Server : Part 8 : Explaining The Covering Index or Included Columns

,

In our earlier discussion about non clustered index ,we have seen that, the leaf level of a non clustered index contain only the non clustered index key column and clustered index key (if the table is a clustered table). To fetch the remaining column from the clustered index structure or heap structure, SQL server has to do a bookmark/key look up operation.Many time the bookmark or key look up operation might be costly affair. Let us see an example.

USE mydb
GO
DROP TABLE dbo.SalesOrderDetail                               
GO                               
SELECT * INTO dbo.SalesOrderDetail FROM AdventureWorks2008.Sales.SalesOrderDetail
GO
CREATE UNIQUE CLUSTERED INDEX ix_SalesOrderDetail ON dbo.SalesOrderDetail(SalesOrderDetailID)
GO
CREATE UNIQUE NONCLUSTERED INDEX ix_Productid ON dbo.SalesOrderDetail(ProductId,SalesOrderId)
GO 
SET STATISTICS IO ON
GO
SELECT SalesOrderDetailid,productid,salesorderid,orderqty,unitprice FROM SalesOrderDetail WHERE productid=707 AND 
SalesOrderID=43680

The execution plan and the out put of IO statistics of the select statement are given below.














In the execution plan, you can see that ,50 percent of the query cost is contributed by the Key Lookup operation.In the output of the IO statistics , it clearly says SQL server performed 5 IO operation to fetch the single record. 

Note that, the existing non clustered index have 229 pages and depth is 2( levels in the b tree structure).Seek operation on this index need to perform only 2 IO operation to complete the task.You can verify this using the  DBCC IND  command or refer the earlier post.


Let us assume that, this query(with different parameters ) is used very frequently from the application and you need to optimize it further.How we can do that ? The only way that we can optimize this query is by avoiding the Key lookup operation. For that we can modify our non clustered index and add the remaining two column (OrderQty and UnitPrice) which are not part of clustered index key or non clustered index key. 

DROP INDEX ix_Productid ON dbo.SalesOrderDetail

GO
CREATE UNIQUE NONCLUSTERED INDEX ix_Productid ON dbo.SalesOrderDetail(ProductId,SalesOrderId,OrderQty ,UnitPrice)
GO
SELECT SalesOrderDetailid,productid,salesorderid,orderqty,unitprice FROM SalesOrderDetail WHERE productid=707 AND SalesOrderID=43680













Now we were  able to get rid of the Key lookup operation from the execution plan and to reduce the IO from 5 to 3.But if we  look into the out put of DBCC IND of the modified non clustered index, we can see that , depth of the b tree is increased by one due to this change. As the index level is increased , the non clustered index has to to perform 3 IO to complete the operation. This will be worst, if we have more column in the select list and we added all those columns into the non clustered index key to avoid the key lookup operation.

Here comes the covering index to help us.Covering index help us to add non key column to leaf level of the non clustered index with very minimal possibility of increasing the depth of the b-tree structure. This can be achieved by adding include column in the CREATE INDEX statement.
An index that contains all information required to resolve the query is known as a Covering Index.When we create a nonclustered index to cover a query, we can include nonkey columns in the index definition to cover the columns in the query that are not used as primary search columns. Performance gains are achieved because the query optimizer can locate all the required column data within the index; the table or clustered index is not accessed.

DROP INDEX ix_Productid ON dbo.SalesOrderDetail
GO
CREATE UNIQUE NONCLUSTERED INDEX ix_Productid ON dbo.SalesOrderDetail(ProductId,SalesOrderId
include(OrderQty ,UnitPrice)
GO
SELECT SalesOrderDetailid,productid,salesorderid,orderqty,unitprice FROM SalesOrderDetail 
WHERE productid=707 AND SalesOrderID=43680













With this also, we were able to get rid of the key lookup operation and to reduce the IO operation to 2. The IO operation clearly says the the depth of the clustered index is two.
Let us see the output of the DBCC IND

SELECT index_id FROM sys.indexes WHERE name='ix_Productid' AND OBJECT_ID= OBJECT_ID('SalesOrderDetail')
GO
DBCC ind('mydb','SalesOrderDetail',2)


This returns 378 records and the root page is 7456 (Value of pagepid column of the record having max value for indexlevel column)

Let us see the root page and one leaf level page

DBCC traceon(3604)

GODBCC page ('mydb',1,7456,3)GODBCC page ('mydb',1,7328,3)


























From the output we can see that, columns mentioned in the include clause are added into the leaf level pages with out making any changes in the non leaf level pages.

Include column are useful because we can refer the column that has a data type which can not be used in the index key.More over include columns are not counted in the 900 bytes or 16 key column limitation of index keys.We can include with any data types except text,ntext and image.Included column also support the computed column.

If you liked this post, do like my page on FaceBook 

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating