Index Creation Guidelines

Often times index placement is a confusing task for many SQL Server developers. We all have been told that indexes do boost performance. The problem is how to select the ideal type of index, the number of columns to index, and which columns should be indexed, and how to answer all these questions without using query analysis tools.

Not many people recognize that the fact that indexes can cause a certain degree of performance degradation. Yes, indexes indeed can contribute to decreased database responsiveness if inappropriately selected. There are a lot of articles that tell developers how to analyze queries, how to use query execution plans, the SQL Server profiler etc. at the time when the query is being defined. But what do you need to do as developer if it is not clear what kind of queries the application(s) will be using or is already using? What if there are a number of various applications accessing the database?

Is there any reasonable rule of thumb, some general index creation guidelines that will allow to come up with indexes disposition simply by looking at the database layout? And what about those numerous cases when a developer is not designing a database but adding a few new tables and/or a few new columns to existing tables? Let’s try to look at some examples and establish basic rules where indexes are useful and where they are useless. I will try to summarize some ideas from various sources and add my own experience from the development, architecture, and DBA prospective. And lets remember that these rules are approximate and may not work properly in all cases but, based on my experience and my students experience, they point to the right direction and work in 80-90% of all cases. (Besides working as an Architect and DBA, I am teaching courses in ‘Database design and development’ and ‘Application design and development’). Remember as well that these guidelines are geared toward developers when they have limited knowledge about potential queries and applications.

There are 2 types of indexes: clustered and non-clustered. Without going in-depth discussion about differences between indexes, lets just note that a clustered index is the actual data. Rows are stored in index order. This means you can have only one clustered index per table, because rows can have only one physical order. Non-clustered indexes are pointer to data: the physical order of rows bears no relationship to a non-clustered index.

The Main Rule: every table should have a clustered index! It will make table maintenance easier and keep data stored in index order. Plus a clustered index is the faster than a non-clustered. The primary key for a table should not always be a clustered index. If you create the primary key and don't specify otherwise, then SQL Server automatically makes the primary key a clustered index.

In most cases a unique column or a set of columns are good candidates for the clustered index. Do not try to place a clustered index on a column that can be updated or in the table were the INSERT is a frequently issued command and clustered index column is randomly defined. It will force the engine to physically rearrange pages allocated for the table, which will slow the system down. In a lot of cases identity column can be subject for clustered index given the lack of a better candidate.

For example:

create table test (callerid char(5), name varchar(50)
    , address varchar(50), recordnum int identity(1,1)
)

Column recordnum is a good choice for the clustered index. If the business rule for the callerid values is the sequence of characters following a pattern: ‘AAAA[A-Z]’, then callerid can be a candidate for the clustered index. If callerid can be any combination of characters and the table can grow rapidly (for telemarketing company, for example, 10,000 to 50,000 callers a day is a normal growth rate) then the field should not be clustered index.

Where indexes should be used ?

Primary
keys. As mentioned above, if you create the primary key and don't specify
otherwise, then SQL Server automatically makes the primary key a clustered
index. It is true in most cases. What are situations when it may not be
true?

For example, the database has the following tables:

Create table notes ( noteid int not null PRIMARY KEY
, userid varchar(50), notedt datetime, siteno char(5) 
)

Create table note_text (noteid int not null
, seqno int not null, notetext varchar(8000)
)

Primary key is the combination of noteid and seqno. What is the most likely way for the note’s text to be retrieved or the table note_text to be joined with the table notes? By noteid column, right?

select notetext FROM note_text
 WHERE noteid = 10
 ORDER BY noteid, seqno

So, indexing of both columns is not necessary and (even it will be created as a part of the primary key constraint) a developer can redefine the index only to use noteid column in table note_text.

Foreign keys defined by a foreign key constraint or logical foreign keys that may not be defined by constraint but will be used in table join.

For example:

Create table address (address_id int Primary Key, address_1 varchar(50), state char(2), zip char(5))

Create table orders (ordered int primary key, main_address_id int, location int)

main_address_id may be setup with foreign key constraint to address table and location is not. But location, according to business rules, logically refers to an address. Then location will be used in the table join to get an address and is the subject for an index creation.

Index
columns often used for a search query, grouping or sort purposes.

How one does identifies such columns if there are no queries or applications yet? Lets examine some examples of such prediction. Check out the previous example for note tables. It is easy to predict that people may often want to look up notes by a specific site number and not so often by userid. By knowing the business of your company, most search fields can be predicted easily. If a table keeps some personal information, the prediction process can tell us that social security number or phone number will be a searchable field in a lot of situations (phone number field(s) design can vary and it will impact your index decision). In a lot of cases it is very easy make a reasonable prediction about 85-90% of searchable criteria.

Now lets determine cases where indexes most likely should not be created. And this is the most interesting part.

Where indexes should not be used

If a column in a table is not at least 80 - 85% unique values, then most likely the SQL Server Query Optimizer will not use a non-clustered index based on that column.

For example: index on column gender with values ‘M’, ‘F’, and NULL is not a subject for the index creation. The same rule can be applied to the flag and status columns with 5-6 different values.

Do not create index on free form text columns.

Columns such comment, name, address, legal name, and so on are considered free form columns. In general, almost any column with data type varchar to text can be considered as a free form text column. I have seen developers making flag columns with varchar(1) data type or social security number as varchar(50) but we are not talking about poor database design.

Small tables (30 – 300 rows or about 10K of data)

In most cases such tables require 1 index for the primary key (most likely clustered) and most lookup tables like state, status_code_description are falling into this category.

Do not create more indexes then is required.

Some developers think that the more indexes are created the better performance. It doesn't work that way. An index can speed up data access, but it can also degrade performance for INSERT, UPDATE, or DELETE because SQL Server must maintain indexes every time. And if you determine that five or more indexes are required then it may be time to stop and revisit your indexes design. Ideally, especially initially, you want to define as few indexes as possible. It is often a balancing act to select the ideal number of indexes for a table in order to find optimal performance. And don't automatically add indexes to a table because it seems like the right thing to do. Try to make an educated guess but remember that it is quite easy to make a guess, create indexes, and then later find out that guesses were wrong.

Composite index is only useful for a query when the WHERE clause of the query matches the column(s) that are leftmost in the index. If the index has been created for the columns (address_id, address_type) but the query is using only address_type the index will not be used by Query Optimizer. Basically saying, it is very difficult to guess properly for the composite index without seeing the actual queries.

Do not create duplicate indexes.

For example: index created for the columns noteid , seqno (see table note_text above). And then another index on the column noteid is created. Second index is the duplicate for the first one because Query Optimizer will use noteid from the first index when only one is created. So, the only one more appropriate index should be created. And, as it was discussed above, it may be the index on single column noteid.

In most cases an index for datetime column is not required (and may be a bad idea as well). So do not create it until you know and test your query. If the search will use a datetime function in ‘where’ clause or some conversion mechanism, most likely the index will not be used by the Query Optimizer but performance will suffer.

Basically saying, the part 'Where indexes should not be used ' may help you not only with index design, but with existing index analysis to get rid of some unnecessary indexes. When I was doing the index analysis for the one of my clients' database had about 15G of space occupied by 247 indexes. After preliminary analysis, without even looking at the queries used by applications in database, I was able to drop more than 70 indexes, increase performance, and release 6G of wasted hard drive space.

Conclusion

Indexing can be quite a challenging task if you are not familiar with your databases, the relationships between tables etc.. You must know the type of queries that will be run against your data, and then these need to be analyzed to determine the most appropriate indexes. In some cases a DBA should determine if fillfactor for an index should be setup differently from the default value, and some other factors could be examined when necessary. Only after all those steps the indexes must be created and tested to see if they any help.

This article, while not pretending to be 100% accurate for the final index definition in all situations, provides basics for the index definition while designing the database tables without resorting to some deep query analysis tools.

Conclusion

Rate

Share

Categories

Share

Rate

Index Creation Guidelines

Conclusion

Rate

Share

Categories

Share

Rate

Related content

Index on multiple columns for performance

Cluster That Index!

Analyzing and Optimizing T-SQL Query Performance on SQL Server

Review of SQL Server 2000 Performance Tuning

Getting The Best From The Index Tuning Wizard

Cookies on SQLServerCentral