Problem with enormous indexes

  • Hi guys,

    I inherited a database several years ago and I don't have a lot of experience with creating indexes. I'm trying to keep the database size under control and I found that I have indexes that are larger than their tables. I have a table called "menuitem_trans_detail" that is 90 GB. The index size is 103 GB. I have a script that reorganizes/rebuilds indexes regularly. I found that there are 4 non-unique, non-clustered indexes on the table, each containing a single, frequently accessed column (such as store_oid and batchID).

    Now, this table is basically transaction detail from over 1000 stores on a daily basis. Should I add a clustered index on this table? Will that greatly decrease the size of my other indexes? I guess batchID is probably the best candidate as it is the only field that is completely unique. Is that my best option, even though store_oid and StartBusinessDate are searched more frequently (but these aren't unique)?

    Thanks!

    Bill

  • Are you saying you only have 4 single column non_clustered indexes that add up to 103GB? How many columns in the table?

    A clustered index is always a good thing to have on a table. Bear in mind that a heap (a table with no clustered index) will have an 8 byte Row ID added to each non-clustered index, a table with a clustered index will have the value for the clustered index added to non-clustered indexes, so, if you choose a UNIQUE clustered index smaller than 8 bytes you have a space saving right there.

    If the clustered index is non-unique a 4 byte uniquifier is added as well.

    See this for info on choosing a candidate for a clustered index, it is often (but not necessarily) the primary key. If you don not have a suitable candidate consider adding an identity column to the table.

    Look for unused indexes you can get rid of with this query. Ensure they really redundant by testing of the full cycle of your queries (i.e. monthly, annual, reports) and do not remove indexes that enforce uniqueness.

    -- Possible Bad NC Indexes (writes > reads)

    SELECT OBJECT_NAME(s.[object_id]) AS [Table Name], i.name AS [Index Name], i.index_id,

    user_updates AS [Total Writes], user_seeks + user_scans + user_lookups AS [Total Reads],

    user_updates - (user_seeks + user_scans + user_lookups) AS [Difference]

    FROM sys.dm_db_index_usage_stats AS s WITH (NOLOCK)

    INNER JOIN sys.indexes AS i WITH (NOLOCK)

    ON s.[object_id] = i.[object_id]

    AND i.index_id = s.index_id

    WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1

    AND s.database_id = DB_ID()

    AND user_updates > (user_seeks + user_scans + user_lookups)

    AND i.index_id > 1

    ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC;

    -- Look for indexes with high numbers of writes and zero or very low numbers of reads

    ---------------------------------------------------------------------

  • Also, adding a clustered index will allow you to defragment the leaf level of the clustered index, i.e. the data pages themselves. That could save a shed load of space.

    ---------------------------------------------------------------------

  • Just remember that a clustered index is the table, so it may look very large, but that's because it's storing all the data.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the replies. George, that appears to be the case. The index size for the table when I run sp_spaceused is 103 GB, so I assume that is the total of all 4 indexes. There are 22 columns on the table. I ran the query that you provided for writes > reads and here are the results for this particular table:

    ps_micros_e7_menu_item_detail_totalsidx23830123708301237

    ps_micros_e7_menu_item_detail_totalsidx34830123710608300177

    ps_micros_e7_menu_item_detail_totalsidx128301237160468285191

    ps_micros_e7_menu_item_detail_totalsidx_e7_menu_item_detail_totals_startbusdate1983012372038158097422

    What does this mean exactly? Also, were there more queries after this? I'm unable to scroll down to see anything else in that box.

    I think the only realistic candidate for a clustered index would be batchID. It is the only column that is completely unique. I have others that are queried more, such as store_oid and StartBusinessDate, but they repeat. Knowing this, do you recommend adding an identity column instead? Here is an example of a batchId (they are all approximately this length): AA-CA-157-1_031914.zip

    Thanks for your help!

    *EDIT* I just discovered that this table has no primary key. Perhaps adding an identity column and using it as the primary key and creating the clustered index on it is the way to go?

  • In general, I suggest putting the clustered index on the most commonly used path to access the data. This is because the data is stored on the clustered index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • bwood-930464 (3/28/2014)


    ps_micros_e7_menu_item_detail_totalsidx23830123708301237

    ps_micros_e7_menu_item_detail_totalsidx34830123710608300177

    ps_micros_e7_menu_item_detail_totalsidx128301237160468285191

    ps_micros_e7_menu_item_detail_totalsidx_e7_menu_item_detail_totals_startbusdate1983012372038158097422

    What does this mean exactly? Also, were there more queries after this? I'm unable to scroll down to see anything else in that box.

    There were no more queries. Those results mean that the top index was written to 8301237 times but was never used to satisfy a query (0 in the last but one column). This suggests the index is redundant and could be dropped. However note these values are only since the last SQL restart and see the caveats in my previous post. Test it in non-prod first as well!

    If you only have 4 single column indexes out of a 22 column table there is no way the total index size could be greater than the data size, so you may have misinterpreted the values. Run DBCC updateusage for the table (at a quite time), run sp_spaceused again and post the results formatted with headers if you have any doubts.

    I think the only realistic candidate for a clustered index would be batchID. It is the only column that is completely unique. I have others that are queried more, such as store_oid and StartBusinessDate, but they repeat. Knowing this, do you recommend adding an identity column instead? Here is an example of a batchId (they are all approximately this length): AA-CA-157-1_031914.zip

    *EDIT* I just discovered that this table has no primary key. Perhaps adding an identity column and using it as the primary key and creating the clustered index on it is the way to go?

    If AA-CA-157-1_031914.zip is a value for batchID it looks rather wide and random. That would fail two criteria for a good clustered index (narrow and increasing). Unless BatchIDs are ever increasing as new values get added and the BatchID is used in range searches, I would be tempted to add an identity column and make that the clustered index (NOT primary key). This would not be to aid queries but purely to keep the data pages compacted.

    ---------------------------------------------------------------------

  • A clustered index on StartBusinessDate + an identity column would probably be a better clustered index on the table.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply