This blog was originally posted on December 17, 2009. I’ve pulled it out of the closet for T-SQL Tuesday #10, with apologies for being too gosh-darn busy to write something new.
This is a companion blog to “Get Index Included Column Info“.
What are indexes for?
We need to talk indexes first, before we get to INCLUDE. When you index, you’re creating a reference that your query can look at to find the WHERE and JOIN terms quickly. So for example, for this query:
SELECT col1 , col2
WHERE col3 = 6
You’d ideally want an index like this: CREATE INDEX IX_BORK_111 ON dbo.BORK (col3) This index will help the query find the rows that it needs, but a nonclustered index like this is a separate structure from the table itself. Each time the SQL engine finds a “6″ in the index, it then has to hop over to the row in the table itself to get the values for col1 and co2…that’s called a “bookmark lookup”. Bookmark lookups are bad because they involve extra overhead – you have to read from the index AND the table, instead of just the index.
To fix this, you can create a covering index – an index that covers all the columns you’ll need for your query: CREATE INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2) Notice that I kept col3 as the lefthand index column, because it’s our search column. Covering indexes are wonderful, but they can start to take up a lot of room. Imagine if the select statement had been SELECT col1, col2, col3, col4 …. col10. Covering a query like that will make a rather wide index. INCLUDE can mitigate some of those space concerns.
Index structure made easy (I hope)
To understand INCLUDE, you first must understand (a little bit) the structure of an index. An index is organized in a b-tree hierarchy – each node of data (in the case of our index IX_BORK_111, col3) has two nodes beneath it – the left node higher in the sort range, and the right node lower, like this:
3 and 7 are the child nodes of 5. 1 and 4 are the child nodes of 3. The lowest level nodes are called leaf nodes, so 1,4,6,and 9 are the leaf nodes. The SQL engine walks through the tree to find the values it needs…
If we create our covering index like this: CREATE INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2) then each node in the tree will contain values for col3, col1, and col2. That can take up a lot of space. But in our query, we don’t need to search based on col1 and col2….we only search based on col3. The other two columns are just there to be returned in the SELECT statement.
INCLUDE lets us make the index smaller, while still supplying our SELECTed columns. First, here’s the index declaration: CREATE INDEX IX_BORK_333 ON dbo.BORK (col3) INCLUDE (col1, col2)
Each node in the tree will contain the value for col3. But only the leaf level nodes will hold the values for col1 and col2. That’s how INCLUDE makes the index smaller – the rest of the tree doesn’t contain extra, unused information. The SQL engine performs its search based on col3, finds the leaf level node(s) that match, and there at the leaf level is the rest of the information we need.
Key Ideas Review
Key ideas mentioned:
Indexes let queries find data faster.
Nonclustered indexes are separate objects from tables.
A covering index is ideal – it gives the query everything it needs, without having to touch the table itself (that’s called a bookmark lookup).
Nonclustered indexes take up space, so you don’t necessarily want a lot of really wide covering indexes. Use with discretion.
One final point: Remember that indexes must be updated every time data is updated or inserted into the table. The more indexes, the longer your inserts and updates will take. Indexing is a balance between supplying enough to support your read operations, and keeping insert/update overhead low.
As always, please email or comment if you have questions or corrections. Happy days!