Earlier this month, I presented a session for Pragmatic Works Training on the T’s on the selecting the clustered index for a table in a SQL Server database. In the session, titled How Do Non-Clustered Indexes Improve Performance?, in this session I discussed the patterns around non-clustered indexes and why you might choose which patterns for different needs.
The abstract for the sessions is:
We’ve all heard the mantra about the importance of clustered indexes, but what about non-clustered indexes? Do you really need non-clustered indexes or are they just something that consultants recommend to increase their billable hours? In this session, we’ll investigate the various flavors of non-clustered indexes and how they can be used to provide significant performance improvements.
If you missed the webcast, you can access the recording from the Pragmatic Works Learning Center. A recording of the session is available here. The presentation deck and demo code can be download here (How Do Non-Clustered Indexes Improve Performance?), or you can flip through the deck below. If you are interested in having this presentation, or others, delivered to your user group or company, feel free to reach out to me in the comments or through twitter.
Session Questions and Answers
The questions below were asked during the session. I’ll keep the answers fairly brief, so if I miss a point and need to explain more, let me know.
Q: How can we quick know which is better either to create a clustered index or a non-clustered index in the table?
You will want both a clustered index and non-clustered indexes on your tables. The clustered should be used for the primary method in which users will access rows. The non-clustered indexes will provide the alternate paths that are needed to select rows from the tables. Neither is better, they are both needed.
Q: Can we create primary key columns with non clustered index? What will be performance improvement over clustered from non-clustered?
Non-clustered indexes can be created on same columns that are used for the clustered indexes and primary keys. This can be beneficial in some cases, especially on wide tables. For example, if only 2-3 rows fit on any page on the clustered index, the table will be quite large as the number of rows increases. Queries that leverage just the clustered index key columns will need to traverse the whole table to look scan for all values. Alternatively, a non-clustered index on the same key columns would require far fewer pages and the would require less effort to scan all key columns.
Q: Can you review again why the column order matters on multi-column indexes?
The column order on non-clustered indexes matters because the left most column is the one that is first sorted in the index. If the an index is first name and last name, the first name column is the primary sort of the index. All of the last names are then sorted after their associated first name. Due to this, a search on first name can go directly to all of the first names that it needs. For the last name, there is no information stored on the sorting or occurrences of the last name. In order to find any of the last names, all rows in the index need to be checked.
Q: Should we use Estimate or Actual Execution plan to see [performance]?
Either can be used. It really depends on the information that you are trying to get about the execution of the query. The estimated plan is the plan that SQL Server will used to execute the query. The actual plan will be the same plan as the estimated plan, except it contains information on how the plan performed while it executed and may provide insight into issues with the chosen plan.
Q: How many indexes can be added together for index intersection? Could 5 indexes be joined?
In my experience, I’ve only seen two indexes leverage index intersection. It’s not necessarily a goal of indexing to aim for intersection, but it can help minimize indexes on heavily used tables.
Q: If you had a clustered PK on a table, would you create a non-clustered index on the PK column but include common used fields in an include index?
Typically, I would avoid doing this. Unless the table is very wide, or has many columns. In those cases, in lew of redesigning the tables and database, a non-clustered index can help improve performance. But, in most cases, this is not something that should be done.
Q: In the query does not SQL Server read the predicate and decide what is the best ie. for the Catherine then Cox would it not rewrite it to Cox then Catherine and seek Cox first then Catherine?
For the first execution of the query, the statistics associated with the values will be used to determine which execution plan will be ideal for the query. On the second and subsequent executions, te
Q: Is a single non-clustered index on three columns same as creating three non-clustered indexes on three columns separately?
No, an index on the non-leading columns of an index allow ordered searches on the second and third columns; without the use of the first column.
Q: Another question…What are the best indexes for data warehousing for joining the fact tables to dimension tables in SQL Server?
For data warehousing, there are two types of indexes that you’ll want to leverage. First, during the ETL process, there may be some indexes related to lookups and data validation that will prove beneficial. Second, look at your common queries, for these you want to create multi-column indexes that will cover the filters, joins, and columns that are part of the queries.
Q: Is there a trade off of having too many indexes and slowing down saves and updates?
There is definitely a trade off. With too many indexes, locking and blocking can begin to become a problem. Also, updates will be more and more complicated as the values in the non-clustered indexes are updated.
Q: Will this session be recorded and will the scripts from the presentation available somewhere as well?
Yes, the recording and links for downloads are listed above.
Q: Too many indexes will reduce performance right?
Yes, this is due to locking, blocking, and the time to maintain the values in the index.
Q: How to find out how many indexes have to be used to improve performance?
When looking at the indexes on a table, you want to consider many factors. How often the index is used. Where the index is used. What is the update rate for the index. What types of operations occur on the index. These all contribute to the performance and value of the index. One method that can be used to anaylze the indexes on a table is through the sp_indexanalysis stored procedure on this blog.
Q: What are negative impacts of a non-clustered index without a clustered index on the table? Is there any scenario where this kind of non-clustered indexes helpful?
If a table lacks a clustered index, then it is considered to be a heap. When queries on heaps leverage non-clustered indexes, the behavior is quite similar. The chief difference is that a heap does not allow for a seek on the table for discrete values. If the non-clustered index needs a column that isn’t in the non-clustered index, then (similar to a key lookup) the query will leverage a RID Lookup to obtain the information. While it may appear this difference is trivial, the key is that since the table is leveraging a heap, even the most frequently searched columns must always use non-clustered indexes. No ordered seeks for values can occur on the table, only scans.
Q: What is the concept of Uniquifier?
The uniquifier provides an element of uniquenes to clustered indexes that are not set as being unique. In order to all non-clustered indexes to access specific rows in a table, there must be a unique set of values on each row to identify a row. For clustered indexes with non-unique columns, the uniquifier allows the each row to be unique.
Q: What is the downside to using data compression in indexes?
The downside to data compression on indexes is that the algorithm for compression requires additional CPU to complete. On systems that already have high CPU utilization, using compression may require more resources that are available; which can lead to performance degradation.
Q: What will be the impact on performance if I use LIKE statement on Non-Clustered index?
The good news is that there really isn’t anything conceptually wrong with the LIKE statement. In most cases, it will performance in the same manner as an equality comparison (=). The issue to be concerned with is when a wildcard on the LIKE comparison value appears at the start of the search value. For instance, if the value searched for is “ACB%” then the sort of the index can be used and seeks can occur. Conversely, if the search value is “%ABC”, then SQL Server doesn’t know where to find those value from a sorted perspective and needs to check every row to see if the values match.
Q: Would 2 indices — One on LName, and one on FNameLname be optimal? Selective LName, lest sselective FNameLName interseciton, or FNameLName when want that particular person?
Sadly, the answer is it depends. Unless the distribution of all values is even, then there will be cases where it is better to search for first name over last name, or vice versa. The key is to determine which pattern provides the best overall performance and guide SQL Server and your indexing down that path.
Thanks for Attending
Thanks to those that attended the webcast. The session went pretty much up to the end of the hour so the questions that were asked are listed above. If you had a question that didn’t get answered or asked during the webcast, feel free to leave the question in the comments below.