Index Creation Guidelines

  • Leo Peysakhovich

    Hall of Fame

    Points: 3880

    Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/lP

  • Mark_Holst

    SSC-Addicted

    Points: 446

    A good article.  I think one small oversight is the use of "Covering Indexes" - if I am using the correct term.

    The concept is using a composite index to return columns directly [from the index] rather than requiring a separate operation to perform a "bookmark lookup".  A bookmark lookup can be rather costly in certain scenarios.

    On one occasion I have used this technique to avoid disc thrashing on a certain large table.

     

     

  • BuckWoody

    Ten Centuries

    Points: 1077

    Great article - but I do take issue with the statement that "every table needs a clustered index". In large, non-contiguous tables such as import tables for an ODS or Data Warehouse, this can dramatically slow down inserts. 

     

    Buck


    Buck Woody
    MCDBA, MCSE, Novell and Sun Certified

  • avi_a

    Mr or Mrs. 500

    Points: 527

    On the same note:

    I was once told that creating a forein-key constraint upon two tables causes an implicit creation of some indexes that can speed up joins on the FK relatoinship.

    Does anyone know of this kind of index (it should be some kind of "cross-table"  or "relational" index)?

     If creating a FK does in fact cause creation of some kind of cross-table index, can the same kind of index be created without the FK constraints?

     

    Many thanks,

     - Avi

     

     

     

  • Tatsu

    SSCertifiable

    Points: 7824

    Avi,

    Some Entity-Relationship modeling tools create indexes on FK relationships for you behind the scenes but SQL Server does not do this on its own. Also, an index can only apply to a single table and is completely separate from the constraints on a table (Including FK's) so your last question really does not apply.

    Example:

    If you have an OrderItem table with a ProductID column that references a Product table via foreign key, the ProductID column on the OrderItem table would not automatically have an index on it but you would probably want to create one since it will likely be used frequently in join operations.

    [font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
    Business Intelligence Administrator
    MSBI Administration Blog

  • avi_a

    Mr or Mrs. 500

    Points: 527

    Thanks Bryant.

    Do you know what kind of index is created automatically when creating a FK? is this some sort of an implicit index or can it be scripted from EM like any other index?

    The reason I'm so interested in this issue is this:

    I have a data set with FKs which is replicated into a subscription without the FK constraints. (this is by design)

    However, when testing my queries on the original publication vs the subscription DB, I noticed the execution plans on the publisher used many indexes that wern't used in the subscription database, thus producing degraded performance on the subscription DB.

    So what I really want to do is create in the subscription DB indexes that will be equivilent to the automatic indexes created in the publisher via thit's FKs.

    Any thoughts on this would be appreciated,

     - Avi

     

     

  • Tatsu

    SSCertifiable

    Points: 7824

    The indexes should be like any other index but some of those modelling tools use some strange naming conventions. You should be able to script these out like any other index regardless of the name though.

    There is one thing to watch out for if the database was originally designed using a modeling tool that created the indexes for you. If you do not make all schema changes in the modeling tool you could end up with inconsistencies in how these indexes are built on new foreign keys. If the database was originally built with one of these tools but the model is no longer used, make sure you manually create the indexes on the FK's where needed.

    I only have experience with ERWin and Visio. ERWin used to create the indexes for you (I think there was an option to turn it off) but it has been several years since I used it. Visio does not create the indexes for you automatically but its diagramming scheme shows you which columns have FK's and indexes so it is easy to see where you might have missed an important index on a FK column.

    [font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
    Business Intelligence Administrator
    MSBI Administration Blog

  • Antares686

    SSC Guru

    Points: 125444

    I agree, especially in the way of datamarts as you don't want to have to wait for shifting of data to complete. Clustered indexes have a great impact but does not always lend itself to some systems.

  • David.Poole

    SSC Guru

    Points: 75031

    Good article!

    One thing I'm not sure of is where you would use CREATE STATISTICS.

    I know indices maintain their own statistics but if you do a query using an ORDER BY, WHERE, GROUP BY, HAVING etc on a non-indexed column SQL Server creates a _wa_sys statistic.

    Is there a performance overhead in maintaining these statistics as there is with indices?

  • Mark_Holst

    SSC-Addicted

    Points: 446

    I think rather than everyone chipping in there, somebody should make that the next article topic!

    I remember in the MCP study guide for SQL database & design, there was a whole section on statistics that was both fascinating and helpful, and makes you appreciate how it's mostly all automated.  One thing that you did have to come out remembering was the creation of automated statistics (like the _wa_ ones you mentioned), and how those columns could indeed make themselves index candidates.

  • DCPeterson

    SSCoach

    Points: 15145

    Generally a good article, but the author fell prey to one of the most common errors about clustered indexes.  He states:

    "Do not try to place a clustered index on a column that can be updated or in the table were the INSERT is a frequently issued command and clustered index column is randomly defined.  It will force the engine to physically rearrange pages allocated for the table, which will slow the system down."

    Not quite true...  Yes, inserts in in that situation can cause page splits, but this is not the same as "physically rearranging pages".  The data pages do not need to be physically arranged (on disk) in any particular order, and in fact they are not.  The various pages are linked together via a "double linked list" mechanisim where each page contains a pointer to the next page (logically NOT physically) and to the previous page.  It is this linkage that must be updated and maintained when pages are split.  Maintaining the physical order of pages would indeed be prohibitively expensive and it is not done for that reason.

     

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • David Branscome

    Old Hand

    Points: 356

    Excellent article.

    One of the few I have seen which mentioned the selectivity guidelines for non clustered indexes.

    Yes it is a good reason to drop an index if a single value returns more than 10-15% of the table.

    David Branscome

  • Sergiy

    SSC Guru

    Points: 109666

    What's wrong with Datetime fields? Why indexes on those fields are not recommended?

    I use to work with big tables containing records about some events. And main part of information is Start Date, End Date, Date Scheduled, etc.

    Typical query: select Cols from TABLE where Date between @StartDate and @EndDate.

    It takes ages to run this query unless you create clustered index on Datetime column. Non clustered index helps but not really much. But clustered index dramatically improves performance, especially if you are interested in short period between @StartDate and @EndDate - e.g. one month from 30 years log.

    That’s why I really surprised by suggestion not to use indexes on Datetime columns. If there is any reasonable explanation to this?

     

    And another notice for INSERT, DELETE and UPDATE. If you choose right indexes policy this operations may not cause significant delays because of indexes. For example, if you delete solid range of rows arranged by clustered index (delete from TABLE where INDEXEDCOL between A and B) it will take the same time as delete up to 10 rows depending on fragmentation level of your table. Because it will delete rows only from 2 pages (where A and B values are stored) and then delete pointers to PAGES.

    But if you have 10 other indexes on the same table it won’t really help. Those indexes will eat all your time.

  • Sean Fackrell

    SSC-Addicted

    Points: 408

    I agree with Sergiy on the indexes on datetime data types.  These are just represented by integers anyway, aren't they?  Is there a large additional overhead because of using datetime over, say, integer?

    I have a table that logs requests from users and will be frequently queried on the request date (either one day or a date range).  I have used a non-clustered index on an IDENTITY PK and a clustered index on the request date (smalldatetime).  The likelihood of two requests being made during the same minute is quite small so the request date column is almost unique.  Also the request date inserted will always be greater than or equal to the greatest request date.

    The table will start at 0 and will only increase by about 2-3000 rows each year so I doubt the index options would make much difference. 

    Does anyone have some more info on the use of datetime/smalldatetime as indexes, clustered or non-clustered?

     

  • Leo Peysakhovich

    Hall of Fame

    Points: 3880

    Based on my experience I do believe that clustered index should be for every table. If table don't have a good candidate for the clustered index for whatever reason then it may be a case to add fake column as identity (Identity column is always good candidate for clustered index) It will not slow performance because record always will be stored at the end but will give great benefit for the table maintainace and database shrinking if database has often data loads and/or delete rows.

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply