Clustered Index Strategies

  • I am working on our database coding standards and am struggling on best practices for determining the clustered index.  I am finding conflicting information when determining our companie's indexing strategy:

    Argument 1 (source: Kimberly Tripp):  In many cases, the clustered index being put on an identity column can be a really good choice for performance because it minimizes fragmentation and all of the secondary uses of nonclustered indexes can be a lot more efficient when the clustered index is static, narrow, and unique. If I just say start with the base structure that’s you’re clustered index, probably choose a clustered index on an identity column and then help your performance through secondary indexes, which is nonclustered, kind of my best practices.

    Argument 2 (source: http://www.sql-server-performance.com_:  Avoid putting a clustered index on clumns that increment, such as an identity, date or similarly incrementing columns, if your table is subject to a high level of INSERTS.  Since clustered indexes force the data to be physically ordered, a clustered index on an incrementing column forces new data to be inserted at the same page in the table, creating a table hot spot, which can ceate disk I/O bottlenecks.

    Those arguments above are completely conflicting.  We are leaning towards Argument 1 (Kimberly Tripp's stragety) for OLTP databases and then for OLAP Databases, we'd determine the clustering index on those that are searched on ranges of information, etc..

    Can someone give me some input on what their indexing strategy is?

     

  • "Can someone give me some input on what their indexing strategy is?"

    I copy the database to a (nearly) identical test server but without any indexes, clustered or nonclustered.  Then I run the performance advisor with the queries (update, insert, select, whatever) most frequently used in production on each large table.  If the queries change then I re-run with the new ones and adjust indexes as needed.

    Sometimes it suggests clustered indexes and sometimes it says a few non-clustered are all that's needed.

    Anyway, this method removes the stress of figuring it out manually and even if it isn't perfect it works reasonably well.  Put your brain power into things the automatic advisor has no idea about like disk array configuration.

    PS - 1 and 2 in your post aren't conflicting because #2 is based on an important IF clause that the first one neglects to mention.

  • Man I don't get it. I posted a long reply and it never took and now it is gone. Happens every time actually. Here is the short version. Both are true. It depends on how you organize your data and how you plan on getting to it. I actually combine both and replace 5 columns with a long to keep non clustered indexes small and I derive the long dynamically from additional lookups. So I have my range and the values are very selective. Not normalized tables of course but takes advantage of both ideas on a limited set of tables. Options 1 makes little sense if youare querying on order date range for example. Use a non clustered index for a valid big range query where the identity is the cluster and the non cluster is on order date and the DB server will earn it's living the hard way. If there is lots of data plan on lots of ram to as you caould casue page faults. Small installations may not be a big deal actaully. But, if you plan on picking individual rows then it may makes sense since non clustered indexes are built on clustered indexes. The more selective the indexes the better they are is the deal. Of course options are theroetical.

    Plan on an index strategy based on your worst usage case scenario. It just depends on your usage. That said I have a real real hard time figuring out how option 1 can actually be used out side of theroetical benifit. Why not just put the cluster on the column if you plan on hitting it anyhow in a range. If you do not plan on using a cluster at all, then yes, index fragmentation can be reduced, but ther are other ways of dealiong with it besides sacrificaing a cluster altogether. In gereral you should always have a clustered index. I do use option 1, kind of as I stated, but I have a plan on how to still use it as a clustered index. another thing, a Cluster has the actual data at the leaf level where the non cluster has a pointer to the heap so an additional look up is needed. A cluster is always faster period. A non cluster has the cluster colmns as part of it. In a cluster when you resolved it you are at the data. In a njon cluster you still have to do another jump to the data. This gets into another whole subject matter.... Bottom line, find a way to use it for the most frequent queries and manage the secondary indexes. 2 sperate issues. There is no silver bullet I am affraid.

    If in doubt use Option 2 is will server you better and covers a far broader set of usage scenarios. No question about that. Good luck!

    -Eric

  • I recomend clusters on unchanging, increasing, narrow, unique columns (priority in that order), especially if you have a lot of queries that sort by that field, do ranged searches or do group bys on it.

    I saw a study a while back (can't remember where) that showed that you need thousands or inserts/sec to cause an insert hotsopt, especially with the fast disks these days. At that level of inserts you've got a bigger problem - write contention and hotspots on the transaction log

    My suggestion would be to go with Kimberly's advice. I'm moving all my clusters from GUIDS onto better fields (see criteria above) and am seeing much improvement. The tables that still have guid clusters get fragmented extremely fast and querys run slower and slower on them.

    The main piece of advice that I always give for indexes - try and test. If the index change doesn't improve performance, go back and try something else.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Re Eric's problem about losing long posts...  I have had this problem a few times.  There is a timeout limit on creating a new post, and if you exceed this your post gets lost.

    I now tend to do a ctrl-a and ctrl-c just before submitting my post, so that if it gets lost I have a copy in the clipboard that I can quickly paste into a replacement post.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • The real answer is: 'it depends'

    Always remember that a clustered index gives you the opportunity to give best possible performance to a range query that uses the index columns.  If you have a business-critical range query then this is a major candidate for your clustered index.  However, if using the columns in this critical query would badly affect other work, then you may need to cluster on something else.

    GilaMonster's recommendation of prefering "unchanging, increasing, narrow, unique columns (priority in that order)" is sound, but should not be cast in stone.  It is best that you give a recommendation on how to choose the clustered index keys, but allow the project DBA to justify whatever is best for the application.

    Likewise, you should recommend that all tables have a clustered index, but allow justified exceptions.  Without a CI, the extents for your table will tend to get spread all over your data files and can affect performance.  With a CI a index rebuild will consolidate your extents and improve performance.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Yup - it just depends - just attended one of Kimberley's indexing events - I don't agree necesscarily with her choices - apps are rarely ideal and often you have to make compromises. I tested insert hotspots and was unable to produce a problem.

    Sometimes your clustered index is actually better non unique and wide(ish). ( for performance reasons! )

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • I'm totally with those people who gave replies like "it depends", and, more importantly, "try and test".

    You can't give a hard and fast recommendation for a standard indexing strategy, since there's no such thing.

    A couple of points about clustered indexes, though, that I often agonise over:

    1) Don't feel that you have to implement a PK as a clustered index. PKs return single rows (by definition), and thus you're wasting your clustered index which is far better at returning multiple rows. A NON-clustered PK can be just as effective.

    2) If you want to use natural keys, try not to make them too wide in your clustered index. Don't forget that the clustered index columns are also included automatically in each NC index, so you'll increase the width of all your indexes if you make your clustered index too wide.

    3) If you put a clustered index on a natural key, but insert data into the table in a non-clustered order, you will probably get frequent page splits.

  • The most definitive answer for this is easy.

    1. Create your clustered index on your best guess column, using common sense and recommendations. And other indexes on whatever other columns you believe to be needed.

    2. Run a profile on the db, looking at object scans, performance info, page splits, etc, and including the index id during real world usage of table.

    3. Based on the index usage, performance info, and paging info evidence you recieve from the profile, move your clustered index to the best usage index.

    4. Repeat process till happy with results.

    Real world information, execution plans, and profiler takes all guess work out of the equation. It takes a good bit of work, but makes all the difference. Quite frankly, you will find that no recommendation fits all situations, while definitive fact finding will lead you to the correct answer for your situation.

     

  • My perspective is more of on index maintenance and performance.  If needed on an auto-increment/date field I tend to rebuild the indexes more often with 0 free space in the index.  The fragmentation will only happen at the tail end of the file and the reads (most queries) will be fast because of the fewer physical pages on disk.  That being said, I generally will rebuild those indexes nightly to keep perfomance up.  I've never had problems with hot-spotting on disk but index fragmentation at end of file is kicker with those types of clustered indexes.  I will also index using covering indexes for non clustered indexes when needed, to avoid the bookmark lookup.

    Just my 2 cents.

    Tom

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply