Blog Post

Tables without Clustered Indexes?

Today, we experienced performance issues with some of the SSRS reports that were deployed as part of the latest application/database release. While investigating this performance problem, I realized that the underlying tables for these report queries do not have clustered index. I was even more surprised when I realized that some of these tables are huge and have few million rows in them. A good practice is to have a clustered index on all tables in SQL Server, as it helps to improve query performance. This is due to the fact that clustered indexes affect a table’s physical sort order, and a table that doesn’t have a clustered index is stored in a set of data pages called a heap where:

  • Data is stored in the order in which it is entered.
  • Rows are not stored in any particular order.
  • Pages aren’t sequenced in any particular order.
  • There is not a linked list linking the data pages.

When a table has clustered index, SQL Server physically sorts table rows in clustered index order based on clustered index key column values. In short, leaf node of clustered index contains data pages, and scanning them will return the actual data rows. Therefore, table can have only one clustered index.

When to have a clustered index on table?

Although it is not mandatory to have a clustered index per table, but, according to the MSDN article (Clustered Index Design Guidelines), with few exceptions, every table should have a clustered index defined on the column or columns that used as follows:

  • The table is large and does not have nonclustered index. Having clustered index improves performance, because without it, all rows of the table should be read to find any row.
  • Column or columns are frequently queried and data is returned in sorted ordered. Having clustered index on the sorting column or columns prevents sorting operation and returns the data in sorted order.
  • Column or columns are frequently queried and data is grouped together. As data must be sorted before it is grouped, having clustered index on the sorting column or columns prevents sorting operation.
  • Column or columns data that are frequently used in queries to search data ranges from the table. Having clustered indexes on the range column will avoid sorting entire table data.

So in order to resolve these performance issues, I re-wrote these queries and created clustered indexes on tables where appropriate. Moreover, I analyse further, and used the following two queries, to find out which tables in other databases do not have a clustered index defined.

The first query return names of all tables with row count greater than specified threshold, and do not have a clustered index defined. This query inner joins sys.tables system catalog to sys.dm_db_partition_stats dynamic management view to obtain this information (See below):

DECLARE @MinTableRowsThreshold [int];
SET @MinTableRowsThreshold = 5000;
;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
ON ps.[object_id] = t.[object_id]
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;

The second query is slightly a modified version of first query and returns the names of actively queried tables with row count greater than specified threshold, and do not have a clustered index defined. This query inner joins sys.dm_db_index_usage_stats to the first query to identify actively queried tables (See below):

DECLARE @MinTableRowsThreshold [int];
SET @MinTableRowsThreshold = 5000;
;WITH    [TablesWithoutClusteredIndexes] ( [db_name], [table_name], [table_schema], [row_count] )
          AS ( SELECT   DB_NAME() ,
                        t.[name] ,
                        SCHEMA_NAME(t.[schema_id]) ,
                        SUM(ps.[row_count])
               FROM     [sys].[tables] t
                        INNER JOIN [sys].[dm_db_partition_stats] ps
ON ps.[object_id] = t.[object_id]
            INNER JOIN [sys].[dm_db_index_usage_stats] us
ON ps.[object_id] = us.[object_id]
               WHERE    OBJECTPROPERTY(t.[object_id], N'TableHasClustIndex') = 0
                        AND ps.[index_id] < 2
AND COALESCE(us.[user_seeks] ,
         us.[user_scans] ,
         us.[user_lookups] ,
         us.[user_updates]) IS NOT NULL
               GROUP BY t.[name] ,
                        t.[schema_id] )
    SELECT  *
    FROM    [TablesWithoutClusteredIndexes]
    WHERE   [row_count] > @MinTableRowsThreshold;

I hope you will find this post informative. For further information about clustered index design guideline, see MSDN resource here.

Further Reading:

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating