Indexes - Best practices?

  • Good day everyone,

    I'm sure this is very basic question but it's one that I don't know, so I'm going to ask.

    I'm having a difficult time determining which columns should make up the clustered index of some tables in my DB.  If I have a customer table which contains a column for lastName, it makes sense to me to create the clustered index off of (at least) that column as it is pretty likely that queries will be written which call on that column, correct?

    My current struggle has to do with a DB which has been normalized (a fair amount) and you have some join tables which only have a couple of columns.  As an example, assume I have an Address table and and State table.

    ADDRESS (AddressID, AddLine1, AddLine2, AddLine3, StateID, etc)

    STATE (StateID, StateName)

    My confusion comes when trying to determin how (or if I even should) create an index for the STATE table.  I have seen books that say the clustered index will probably be the same as the primary key for the table but in this case, I wouldn't do that (not really knowing yet if this is accurate).  In this table, I would define the StateID as the primary key but I don't think I would create a clustered index on that field.  It is more likely that you would want to query agains a particular state so it would make sense to me to create the clustered index on that column.

    To take this one step further, assume I have a customer table with a CustID column and I have an FK of CustID in the ADDRESS table.  In this situation I may have another join table which joins the CUSTOMER table and the ADDRESS table called CUST_ADDRESS_JOIN and contains the columns CustID and AddressID.

    Given that you probably won't query directly for CustID or for AddressID, how do you know when to create the index on (clustered) for this table?

    I'm not sure if I'm putting too much emphasis on "what you query" and "what to index" so any pointers would be greatly appreciated.  If I'm over thinking this, please let me know.  I do understand there are other types of indexes that probably play in to all this as well (non-clustered / covering)

    Just trying to really understand this stuff.

    Thanks all!

    New(B)ob

  • Your state table would have stateID as the pkey. Since I doubt you would be listing the states from it ordered by stateName very often, and would instead be joining to it using stateID most of the time, use stateID as the clustered index. Presuming you are in USA with 50 states the table's not very big so an index won't make a huge difference but you should always have a clustered index anyway.

    The relationship between clustered and non-clustered indices is that each non-clustered index stores its row pointer as the values present in the clustered index. In other words, the table is stored in the order of the clustered index. The non-clustered indices point to entries in table using the values of the clustered index columns.

    When you query for values in the table, SQL Server will determine the best index to use. If a non-clustered index is used, and you retrieve column values from the table that are not in the non-clustered index, then SQL Server must then perform what is called a "bookmark lookup" to goto the record in the table to get the extra column values. You can avoid this by adding the likely extra columns to the non-clustered index. Bookmark lookups can be quite expensive in terms of disk activity, etc.

    In SQL 2000 you need to add the covering columns to the index - but an index can only be 900 bytes wide.

    In SQL 2005 you can specify the index columns and separately add columns to avoid bookmark lookups without using the columns for index ordering.

    There are several articles on this website about the different types of indices and how they interact. If you like I could look some up and recommend them. The key thing to keep in mind is that you should create your clustered index to reflect the order in which you'll typically want to see your results as the clustered index is what determines the "physical" ordering of rows in the table.

  • That was a great answer and I want to add to it that the clustered index also has a big impact on inserts and updates.  Remember that if you update a field in your clustered index, SQL is going to move your data row.  In addition to that, when you add a new row to your table, it is inserted into your clustered index in the appropriate place. 

    So, using fields that do not get updated frequently will help your update performance.  Also remember that if you frequently add rows that fall in the middle of your clustered index, you will want to adjust your fill factor to reduce page splitting and creating a fragmented index.

  • Great Topic -Answer of course is "It Depends".

    But, for general best practices, check out http://www.itsshowtime.com for a webcast fro Kimberly Tripp on Best Practices for Indexes.

    There are so many factors that go into making a "right" decision.  Kimberly does a great job of doing that here.  Plus, I could not do it justice in such a short post.

    Enjoy!

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply