• Jeff Moden (1/6/2014)


    ScottPletcher (1/6/2014)


    Actually, it is a terrible idea to add an identity to this table.

    Depending on your specific data, future inserts could be sequential anyway. Even if not, rows are inserted once and typically read 10s, or even 100s, 1000s or more times, particularly for lookup tables, which this certainly seems to be. Besides, tables can of course be rebuilt when truly needed, and rebuilt online for this table (assuming Enterprise or other qualifying edition).

    Overall, the biggest performance gain you'll get is to stop using dopey identity clusters by default and instead consider carefully how the table is accessed. I've removed over 6000 indexes and improved SELECT performance dramatically, particularly for joins, the majority of it from just replacing identity clusters with properly-keyed clustered indexes.

    First of all, stop using words like "dopey". There are some very good uses of IDENTITY columns that I use and your comments infer that I'm dopey. Unless you really like pork chops, please keep such comments to yourself. They don't become you as a professional, they add nothing to the conversation, and they can easily be taken as attempts to belittle or as ad hominem attacks.

    It certainly is possible that "future inserts could be sequential". If that's true, then that fits one of the commonly accepted best practices of making a Clustered Index "ever increasing" and you would, indeed, not need an identity column for this table.

    If, however, they are not entered sequentially, then it's not so much the performance of SELECTs that I'd be concerned with. Rather it would be the performance of the INSERTS. If there are a lot of INSERTs to this table by a lot of concurrent users, there will be massive page splits that can and will cause timeouts at the GUI.

    If the table is mostly static and is of manageable size, then I agree, it almost doesn't matter if an occasional page split occurs. That brings up another point. You have to also plan on maintenance requirements and space. If this is a large table, then massive page splits can easily cause the table to double in size due to the space wasted by the page splits and, contrary to popular belief, disk space isn't cheap. Yeah... you could REORGANIZE the Clustered Index on a regular basis, but even though you can do that in an online fashion, it's expensive CPU and IO wise. The other thing is that REORGANIZE is FULLY LOGGED no matter what recovery mode you’re in and REORGANIZE does NOT rebuild the B-TREE.

    If you decide to REBUILD a Clustered Index, you might be able to get away with the WITH DROP EXISTING optimization to keep your MDF file from growing. If you can't, then remember that any index over 128 extents (just 8MB), will have its page usage preserved until the new copy of the index has been created. If that's the Clustered Index of a 1TB table, then you better have a free TB+ hanging around to REBUILD it.

    The only good part about REBUILDing indexes is that they will be minimally logged in the BULK-LOGGED and SIMPLE recovery modes. Just keep in mind that if a minimally logged operation occurs while in the BULK-LOGGED mode, any log backups made during that timeframe cannot be partially used for true Point-in-Time recovery. You either have to use the whole log backup or you have to stop at a point before it.

    You ALSO have to consider the point (prior to 2014) that if a table had ANY blob columns in it, then you can NOT rebuild the Clustered index online. If the table has a blob in it, that automatically means that the Clustered Index cannot be rebuilt online.

    So, lots to consider. You can't just say, "Besides, tables can of course be rebuilt when truly needed, and rebuilt online for this table" because you might not actually be able to either because of blobs or because you don't actually have the disk space.

    The overall problem with this thread, so far, is that no one but the OP actually knows HOW the table will be used and he hasn't said yet.

    Even if some splits do occur, the table won't "easily double in size", since that would require every page to split.

    Automatically adding an identity as the clustering key by default simply is dopey. The clustered index is the most important index on any table, and so it should always be considered carefully.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.