Earlier today, I presented a session for Pragmatic Works Training on the T’s titled The Flavors of Non-Clustered Indexes. In the session, I discussed non-clustered indexes and some of the options available for building them. It’s a topic I talk about often but is always an important one.
The abstract for the sessions is:
A non-clustered index is just another way to get to data. Add in a couple columns and off you go. Not too much to think about, just fire and forget. But it’s not really that easy. Non-clustered indexes have a number of various ways in which they can be built and designed that can have significant impacts on performance. In this session we’ll look at what can go into a non-clustered index. We will explore what you can put into a non-clustered index and what you can expect back out from performance.
Webcast Recording & Materials
If you missed the webcast, you’ll can watch the recording here. For those that want to download the slide deck and the scripts, you can click the following link – The-Flavors-Non-Clustered-Indexes.zip.
Also, if you just want to flip through the slide deck again, here you go.
For a one time conversion, with large amount of data, might it make sense to include a lot of columns in a covering index?
Yes, it can absolutely make sense. If you can create an index that improves the performance of the query where it is faster to build the index and run the query than it is to run without the index, then do that. This is also a situation where it is good to consider building filtered indexes that align to the filters in the query.
Does every non-clustered index (unique and non-unique) includes clustered index key?
Yes, the clustered index keys are always present. They are there to provide a pointer back to the clustered index so that the non-clustered index can access additional data for the row, if necessary. For heaps, substitute in the row ID instead of the clustered index key.
Why not a compound index of first name and last name?
This was asked while explaining index intersection. Multiple column indexes are also great options when both columns will be used for filtering.
How do you evaluate impact on upsert of added indices?
You will want to look at the dynamic management views sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats. These provide the rates of use, updates, and blocking. That can help identify upsert contention.
I see [multiple] index[es], how would I know if [query] optimizer choose [one index over another]?
Check the dynamic management view sys.dm_db_index_usage_stats for a list of indexes that are being used by execution plans.
Index sorted from left to right, what [does] that means? Can you please elaborate?
The leading edge of indexes, which is the left-most column, is the value first sorted in an index. Any other columns in the index are sorted as a child to the left column. So, in effect, if you sort last names and then first names in an index. All of the last names will be sorted together. With first names, though, they will be sorted in sub-groups based on last name, then first name. Let me know if this needs more clarity and I’ll expand on a full blog post.
Logical reads happens after initially physical reads have taken place?
Sort of, think of these as separate operations. Logical reads is the reads that occur for the query from memory. Physical reads are the reads that occur placing the data from storage into memory.
If the query has multiple columns. to create index has include columns. What it is the rule on which columns should be selected as included columns?
Let me answer a bit in reverse. For the key columns of the index, choose the columns that provide the filters for the data. These should be the most selective columns in the query. They should be from the WHERE, JOIN, and GROUP BY clauses. Columns with low selectivity and are part of the SELECT clause only should be added as INCLUDED.
Compare multi-column index with index which has Included columns. which way is perform better?
This answer is similar to the answer on the previous question. The only additional comment I would make is that by including multiple low selectivity columns as included columns in an index, you can make indexes multi-purpose. What I mean here is that an index on A,B and A,C and A,D can be rewritten as A with B,C,D included. The high selectivity on A gets you mostly of the performance gain and SQL Server can just check for the filters on B,C,D based on need.
In a combined key, does first column need always be present in the where clause?
The first column doesn’t need to be present to use the index, but it has to be part of the predicates on the query in order to leverage the index sort.
I am looking for a good book to teach me how to write better SQL. In fact, I would like to find a book that has all SQL commands (for SQL Server) and I could study it to determine how to write better SQL. Any suggestions?
For multi column indexes, what do you mean you cannot skip columns to leverage the sort?
Since the sort is ordered from the left to the right on columns in the index, the second, third, etc. columns are not independently sorted. In order to leverage the sort for the second column, the sort from the first column must also be included.
What is the difference between multi-columns index and included column index?
In multi-column index, all of the columns are key columns and part of the overall sort of the index. Also, the columns are part of the index at all levels of the b-tree. For included columns, these are not sorted and the columns are only available within the index at the data pages.
Thanks for Attending
Thanks to those that attended the webcast. If you have any questions on the webcast or suggestions for improvements, please leave a comment on this post.