Indexing in SQL Server 2005

  • Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/aingold/2770.asp

  • One book to read is Relational Database Index Design and the Optimizers by Tapio Lahdenmaki and Mike Leach.

    The book covers the basics and then goes a "little" further.

    Regards

    JT

    a DBA

    The only comment I'd make is that the order of columns in a covering index "just depends". Generally the most restrictive (selective) column is best placed first, but not always, so it's worth making a few tests, including putting the columns in reverse order. I tend to create a number of variations for a particular query and see which the optimiser finds best (not for simple queries, or for all queries, obviously); there's a sketch of that approach after this post. Also worth noting that changing data volumes can change the index selection process.

    Nice article.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
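
    A minimal sketch of that test-the-variations approach. The dbo.Orders table and its columns are hypothetical, standing in for whatever table the real query hits:

    -- Two orderings of the same composite index (hypothetical dbo.Orders table)
    CREATE INDEX IX_Orders_Customer_Date ON dbo.Orders (CustomerID, OrderDate);
    CREATE INDEX IX_Orders_Date_Customer ON dbo.Orders (OrderDate, CustomerID);
    GO

    SET STATISTICS IO ON;
    -- Run the real workload query; compare logical reads and which index the plan picks
    SELECT OrderID, OrderTotal
    FROM dbo.Orders
    WHERE CustomerID = 42
      AND OrderDate >= '20050101';
    SET STATISTICS IO OFF;
    GO

    -- Keep the winner; drop the variant the optimizer never chooses
    DROP INDEX IX_Orders_Date_Customer ON dbo.Orders;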

  • Colin, those are both good points that deserve mentioning! Thank you for pointing them out. It is definitely important to try changing up your high-cost queries to see how the query processor will handle them as part of the baseline/test process. Also, re-evaluating efficiency at a later date when data has changed is a good part of the full database lifecycle.

  • Great Article!  You know what would make your query even more valuable (to me, at least) is a rowcount for the table in the resultset.  The indexes I would want to pay the most attention to for potentially dropping are the indexes on the largest tables.  I would add that to the order by.  Anyone know how to get the rowcount in 2005?  The rowcnt field that was available on sysindexes doesn't appear to have made the transition from 2000 to 2005.

    Luke

  • Luke-

    Try joining to sys.dm_db_index_physical_stats to determine rowcount... that should work.

  • (post-coffee)

    Wait, that's a function...  try APPLY.

  • (mid-coffee)

    A join seemed to work, I think... would it have been better to use APPLY? TIA, I've never tried it (a sketch of the APPLY form follows the query below). And thank you for the article!

    >L<

    SELECT o.name AS object_name, i.name AS index_name, i.type_desc,
           u.user_seeks, u.user_scans, u.user_lookups, u.user_updates,
           p.record_count
    FROM sys.indexes i
    JOIN sys.objects o
        ON i.object_id = o.object_id
    LEFT JOIN sys.dm_db_index_usage_stats u
        ON i.object_id = u.object_id
        AND i.index_id = u.index_id
        AND u.database_id = DB_ID()
    JOIN (SELECT object_id, index_id, record_count
          FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
          WHERE index_level = 0) p -- leaf level only, else each index repeats once per level
        ON i.object_id = p.object_id
        AND i.index_id = p.index_id
    WHERE o.type <> 'S' -- No system tables!
    ORDER BY (ISNULL(p.record_count, 0) + ISNULL(u.user_seeks, 0) + ISNULL(u.user_scans, 0)
            + ISNULL(u.user_lookups, 0) + ISNULL(u.user_updates, 0)), o.name, i.name
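
    For what it's worth, here's a sketch of the APPLY form (untested). One caveat I'm not certain about: SQL Server 2005 may reject correlated column parameters to sys.dm_db_index_physical_stats, in which case the derived-table join above is the workaround; later versions should accept the APPLY form. SAMPLED is assumed here as a cheaper scan mode than DETAILED that still populates record_count:

    SELECT o.name AS object_name, i.name AS index_name, i.type_desc,
           u.user_seeks, u.user_scans, u.user_lookups, u.user_updates,
           p.record_count
    FROM sys.indexes i
    JOIN sys.objects o
        ON i.object_id = o.object_id
    LEFT JOIN sys.dm_db_index_usage_stats u
        ON i.object_id = u.object_id
        AND i.index_id = u.index_id
        AND u.database_id = DB_ID()
    CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), i.object_id, i.index_id, NULL, 'SAMPLED') p
    WHERE o.type <> 'S' -- No system tables!
      AND p.index_level = 0 -- leaf level only: one row per index
    ORDER BY o.name, i.name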
  • Great article with many good tips.

    You said, "I once had a developer look me straight in the eye and say that there would never be a problem with adding as many indexes as possible to a table; the query processor would simply select the ones it needed and ignore the rest."  That might be OK if this were a pre-built, static, never-changing database.  Too many indexes will hurt INSERT, UPDATE, and DELETE performance: remember, all those indexes have to be updated whenever the data changes (see the sketch below for one way to measure that cost).

    ATB,
    Charles Kincaid
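
    A rough way to spot indexes whose maintenance cost may outweigh their benefit: compare writes against reads in sys.dm_db_index_usage_stats. This is only a sketch; the counters reset when the service restarts, so judge them against a meaningful window of uptime:

    SELECT o.name AS table_name, i.name AS index_name,
           u.user_updates, -- maintenance cost: index writes from INSERT/UPDATE/DELETE
           ISNULL(u.user_seeks, 0) + ISNULL(u.user_scans, 0)
             + ISNULL(u.user_lookups, 0) AS total_reads
    FROM sys.indexes i
    JOIN sys.objects o
        ON i.object_id = o.object_id
    LEFT JOIN sys.dm_db_index_usage_stats u
        ON u.object_id = i.object_id
        AND u.index_id = i.index_id
        AND u.database_id = DB_ID()
    WHERE o.type = 'U' -- user tables only
      AND i.index_id > 1 -- skip heaps and clustered indexes
    ORDER BY ISNULL(u.user_updates, 0) DESC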

    Yup... as I put it in the article:

    Unnecessary indexes can force the query processor to evaluate more before deciding on an optimum execution plan, can slow down transactional processing and can take up a great deal of space on disk.

    I had planned on writing an article on this subject and never got around to it. You beat me to it and covered some stuff I hadn't planned on covering (sys.dm_db_index_usage_stats).

    Good article.


    My blog: SQL Soldier
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management

    I used the DMVs sys.dm_db_missing_index_details and sys.dm_db_missing_index_group_stats to create some indexes in my client's database. He reported to me that these indexes caused serious performance degradation in queries and he had to drop them: he talked about queries going from 3 seconds to 3 minutes. I know that extra indexes add overhead to OLTP and other operations that change data, but I can't figure out how an index that helps some queries can dramatically slow down others. I haven't reviewed this issue yet, but I would like to check some information first. Do you have any useful information about this?

    There are lots of ways that can happen.

    1. Parameter sniffing: SQL Server generates a plan based on the parameters passed to a procedure. That plan may work great most of the time but perform poorly for certain other parameter values: the plan was built for a value the query handles quickly, and it does far too much work for others.

    For example, let's say I have a database with customer names in it. I have a nonclustered index on LastName. I perform a search for customers with the last name Williams and I return other columns with it. In order to return the data, SQL basically has to choose between performing a clustered index scan or using the index and then performing key lookups to find the other values. So SQL has to try to determine which method will be faster.

    So let's say there are 100,000 rows in the table with an average of 100 rows per leaf page (about 1,000 leaf pages). If the statistics indicate that there are approximately 100 rows with the LastName of Williams, then using the nonclustered index SQL will read 1 or 2 leaf pages of that index plus 2 non-leaf pages, and then perform about 100 key lookups into the clustered index to fetch the other columns. If each lookup scans an average of 2 non-leaf pages on its way to 1 leaf page, the lookups cost about 300 page reads. Meanwhile, SQL estimates that a clustered index scan would need to read about 3/4 of the table, roughly 750 leaf pages. So it's about 303 or 304 page reads via the nonclustered index vs. roughly 750 via the scan.

    That's a no brainer. Reading 304 pages will be much faster than 750 pages.

    Now let's say we run the query again, searching for the last name Daniels. SQL sees that there is a good plan already in cache, so it uses it. Only this time, 400 rows are actually returned. So it reads 3 or 4 leaf pages of the nonclustered index and 2 non-leaf pages, then performs 400 key lookups to get the other columns, each requiring on average 2 non-leaf page reads plus 1 leaf page read. It ends up performing about 1,205 or 1,206 reads, when a clustered index scan might have read only about 750 pages.

    One way this issue could have been prevented is by creating a covering index instead, which would have completely avoided the key lookups (see the sketch after this list).

    2. Bad statistics: Using the example above, if I queried for the name Daniels and the statistics were inaccurate, SQL Server might think there were only 100 people named Daniels, causing it to choose the nonclustered index seek plus key lookups instead of the scan. If in fact there were 500 people named Daniels, the query is going to take far longer than the plan predicted.

    3. Poor cluster key choice: I see this a lot. Using large or non-sequential values for cluster keys can cause lots of performance problems. The bigger your cluster key, the bigger every index is, because the cluster key is included in every row of every nonclustered index. The bigger the index row, the fewer rows fit on a page and the more pages the index needs. A small cluster key vs. a large one can mean a size difference of many GBs: I've done demos showing how choosing, for example, a uniqueidentifier instead of an int can add more than 50 GB to a single large table very quickly.

    Another poor cluster key choice is one whose values are not sequential, so that new rows may land on any page in the index. This causes a lot of page splits, which require a lot of additional work and increase fragmentation.

    4. Low-usability/high-maintenance indexes: Let's say I have a set of queries that I run once a month, and to speed them up I create indexes on several tables. If a transaction-tracking table takes 50,000 inserts, updates, and deletes per month and my query uses its index only once a month, then the cost of maintaining the index is much greater than the benefit it provides.

    I have seen indexes on tables with 50,000,000+ updates due to inserts, updates, and deletes that have never been used by a query.

    These are just a few examples. I could go on and on talking about other cases such as indexes with included LOB columns, duplicated indexes, etc.
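
    Putting items 1 and 2 into code, here's a minimal sketch using a hypothetical dbo.Customers table (the table and column names are assumptions for illustration):

    -- A covering index serves the whole query from the nonclustered index:
    -- no key lookups, so the Williams/Daniels row-count swing no longer matters.
    CREATE NONCLUSTERED INDEX IX_Customers_LastName
        ON dbo.Customers (LastName)
        INCLUDE (FirstName, Phone, Email); -- the non-key columns the query returns

    -- If stale statistics are misleading the optimizer, refresh them:
    UPDATE STATISTICS dbo.Customers IX_Customers_LastName WITH FULLSCAN;

    -- For queries that stay sniffing-sensitive, trade some CPU for a fresh plan per run:
    DECLARE @LastName nvarchar(50);
    SET @LastName = N'Daniels';
    SELECT FirstName, Phone, Email
    FROM dbo.Customers
    WHERE LastName = @LastName
    OPTION (RECOMPILE);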


