Stairway to Exploring Database Metadata

Exploring SQL Server Indexes: Stairway to Exploring Database Metadata Level 3

,

I’ve intended this stairway to appeal to a range of expertise and knowledge levels about SQL Server. Some of what I describe will seem obvious to you but I aim to add information that will be new to even experienced DBAs or database developers: After all, some of the information surprised me. I will, however, be assuming a reasonable working knowledge of database objects and other types of metadata; in other words, what a view, table, function, procedure, schema, constraint and so on actually is.

In the first level, I explained what information is provided by SQL Server to access the metadata for your database, why it was there, and how you’d use it. I ended up by showing you how to find out the names of all the various database objects in the database. In the second part, I chose the topic of triggers because it is a database object that provides a good example of the sort of questions that come up, and the way that you can explore the metadata to answer those questions.

We will now tackle indexes, not just because they are important, but because they are a good example of a type of metadata, like a column or distribution statistic, that isn’t treated as an object within the metadata.

Indexes are absolutely essential to any relational database table. However, like butter on toast, it is possible to overdo them and thereby create a source of problems in databases. Sometimes you can over-index or under-index a table, or build duplicate indexes. Sometimes it is a matter of choosing a bad fillfactor, setting the ignore_dup_key option wrongly, creating an index that never gets used (but has to be maintained), missing indexes on foreign keys, or having a GUID as part of a primary key. In short, indexes in any busy database systems need regular maintenance and verification, and the catalog views are one of the most obvious way of doing these chores.

What indexes have you got?

So let’s start with a simple report on the indexes in your database, as in the following query.

SELECT  convert(CHAR(50),object_schema_name(t.object_ID)+'.'
    +object_name(t.object_ID)) AS 'The Table', i.name AS index_name
FROM sys.indexes AS i
  INNER JOIN sys.tables t
    ON t.object_id=i.object_id
  WHERE is_hypothetical = 0 AND i.index_id <> 0; 

Here are my results. Yours, of course, will vary.

Why do we bother to join to sys.tables? This is because it is a handy way of ensuring that you get only user tables. We choose index_id values greater than zero, because if you don’t create a clustered index for a table there is still an entry here in sys.indexes but it refers to the heap and does not represent an index. Every table will have one row in sys.indexes with an index_id value of either 0 or 1. If the table has a clustered index, there will be a row with and index_id value of 1; if the table is a heap (which is just another way of saying the table has no clustered index), there will be a row with index_id value of 0. In addition, whether the table has a clustered index or not, there will be a row for each nonclustered index with an index_id value greater than 1. We have filtered out hypothetical indexes, , which are created by the Database Engine Tuning Advisor (DTA) merely to test out whether a possible index would be effective. It is a good idea to remove them if they accumulate.

This query, as with many others in this stairway, are of more practical use if you filter on one or more specific tables. You could, for example append this…

  AND t.object_id = OBJECT_ID('Production.BillOfMaterials');

How many indexes does each table have, with a list of their names

The previous report isn’t entirely useful as you cannot see at a glance how many indexes each table has, and what they are. This will do the trick

SELECT  convert(CHAR(20),object_schema_name(t.object_ID)+'.'
    +object_name(t.object_ID)) AS 'The_Table',
sum(CASE WHEN i.object_ID IS NULL THEN 0 ELSE 1 END) AS The_Count,
coalesce(stuff(( 
     SELECT ', '+i2.name
       FROM sys.indexes i2
       WHERE t.object_ID = i2.object_ID
       ORDER BY i2.name
     FOR XML PATH(''), TYPE).value(N'(./text())[1]',N'varchar(8000)'),1,2,''),'') AS Index_List
  FROM sys.tables AS t
  LEFT OUTER JOIN sys.indexes i
    ON t.object_id=i.object_id
      AND is_hypothetical = 0 AND i.index_id > 0 
GROUP BY t.Object_ID;

I’ve tested it on an aging version of the old pubs sample database because it doesn’t have ridiculously long object names.

The_Table            The_Count   Index_List
-------------------- ----------- --------------------------------------------------
dbo.publishers       1           UPKCL_pubind
dbo.titles           2           titleind, UPKCL_titleidind
dbo.titleauthor      3           auidind, titleidind, UPKCL_taind
dbo.stores           1           UPK_storeid
dbo.sales            2           titleidind, UPKCL_sales
dbo.roysched         1           titleidind
dbo.discounts        0           
dbo.jobs             1           PK__jobs__6E32B6A51A14E395
dbo.pub_info         1           UPKCL_pubinfo
dbo.employee         2           employee_ind, PK_emp_id
dbo.authors          2           aunmind, UPKCL_auidind
(11 row(s) affected)

Finding tables without a clustered index

There is a lot of interesting things you can find out about indexes. Here, for example, is a quick way of finding tables without clustered indexes (heaps)

-- list all tables by name that have no clustered Index
SELECT  object_schema_name(sys.tables.object_id)+'.'
    +object_name(sys.tables.object_id) AS 'Heaps'
      FROM sys.indexes /* see whether the table is a heap */      INNER JOIN sys.tables ON sys.tables.object_ID=sys.indexes.object_ID
      WHERE sys.indexes.type = 0;

How many rows in each index?

By linking with the sys.partitions view, we can find out the approximate number of rows in the index. I’ve included some code at the end that joins with sys.extended_properties just to pull out any comments attached to these indexes.

--list the number of rows for each index/heap
SELECT 
  OBJECT_SCHEMA_NAME(t.object_id)+'.'+OBJECT_NAME(t.object_id) as 'Table',
  coalesce(i.NAME,'(IAM for heap)') as 'Index',
  Coalesce(
   (SELECT SUM(s.rows) FROM sys.partitions s WHERE s.object_id = i.object_id
AND s.index_id = i.index_ID
), 0) 'Rows',coalesce(ep.Value,'') as comments
 FROM sys.tables t
   INNER JOIN sys.indexes i ON i.object_id = t.object_id
   LEFT OUTER JOIN sys.Extended_Properties ep
   ON i.Object_Id = ep.Major_Id AND i.Index_Id = Minor_Id AND Class = 7;

And you can vary this to tell you simply how many rows each table has, based on the rows in the indexes for that table

SELECT
  OBJECT_SCHEMA_NAME(t.object_id)+'.'+OBJECT_NAME(t.object_id) AS 'Table',  
  sum(rows) AS row_count
FROM sys.partitions p INNER JOIN sys.tables t 
   ON p.object_ID=t.object_ID 
WHERE index_id < 2 GROUP BY t.object_ID,Index_ID; 

Are there a lot of indexes in any table?

If you are suspicious about some tables having a lot of indexes you can use this query that tells you the tables with more than four indexes and an index count greater than half the column count. It is an arbitrary way of selecting tables with a lot of indexes and there can be valid reasons for having them.

--tables with more than three indexes
--and an index count greater than half the column count
SELECT object_schema_name(TheIndexes.Object_ID) + '.'+ object_name(TheIndexes.Object_ID) AS TableName,
       Columns, Indexes 
 FROM 
   (SELECT count(*) AS indexes, t.object_ID
      FROM sys.indexes i
      INNER JOIN sys.tables t
        ON i.object_ID=t.object_ID 
    GROUP BY t.object_ID) TheIndexes
 INNER JOIN
  (SELECT count(*) AS columns, t.object_ID
     FROM sys.columns c
       INNER JOIN sys.tables t
     ON c.object_ID=t.object_ID 
   GROUP BY t.object_ID)TheColumns
 ON TheIndexes.object_ID=TheColumns.object_ID
 WHERE indexes>columns/2 AND indexes>4;

The final line ‘WHERE indexes>columns/2 AND indexes>4;’ needs to be changed as required to meet your criteria for ‘a lot of indexes’

Indexes updated but not read

It is always worth finding out which indexes haven’t been used since the server was last started, especially if the server has been doing a wide variety of work. There can be a good reason for an index not being used since a restart, as when it is useful only for end-of-month reporting.

--Indexes updated but not read.
SELECT
object_schema_name(i.Object_ID) + '.'+ object_name(i.Object_ID) as Thetable,
i.nameAS 'Index'
  FROM sys.indexes i
    left outer join sys.dm_db_index_usage_stats s 
      ON s.object_id = i.object_id
   AND s.index_id = i.index_id
   AND s.database_id = DB_ID()
  WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    AND i.index_id > 0  --Exclude heaps.
    AND i.is_primary_key = 0 --and Exclude primary keys.
    AND i.is_unique = 0--and Exclude unique constraints.
    AND coalesce(s.user_lookups + s.user_scans + s.user_seeks,0) = 0 --No user reads.
    AND coalesce(s.user_updates,0) > 0; --Index is being updated.

(Note that we’ve used the dynamic management view sys.dm_db_index_usage_stats here which collects the usage information. We’ll discuss this object, and other related ones, in more detail later, but for the time being just admire it.)

How much space are these indexes taking?

I wonder how much space these indexes are taking up. It is worth checking a fat index where a lot of columns are used, or included, in an index as it is possible that the column values aren’t actually specified in any query.

SELECT 
  object_schema_name(i.Object_ID) + '.'+ object_name(i.Object_ID) AS Thetable,
  coalesce(i.name,'heap IAM')AS 'Index',
  convert(DECIMAL(9,2),(sum(a.total_pages) * 8.00) / 1024.00)AS 'Index_MB'
FROM sys.indexes i
INNER JOIN sys.partitions p 
  ON i.object_id = p.object_id
AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a 
  ON p.partition_id = a.container_id
  WHERE objectproperty(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id, i.index_id, i.name;

How do I calculate the total index space per table.

Let’s see what the total index space is for each table, along with the number of rows in the table.

SELECT 
  object_schema_name(i.Object_ID) + '.'+ object_name(i.Object_ID) AS Thetable,
  convert(DECIMAL(9,2),(sum(a.total_pages) * 8.00) / 1024.00)AS 'Index_MB',
  max(row_count) AS 'Rows',
  count(*) AS Index_count
FROM sys.indexes i
INNER JOIN
  (SELECT object_ID,Index_ID, sum(rows) AS Row_count 
 FROM sys.partitions GROUP BY object_ID,Index_ID)f
  ON f.object_ID=i.object_ID AND f.index_ID=i.index_ID
INNER JOIN sys.partitions p 
  ON i.object_id = p.object_id
AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a 
  ON p.partition_id = a.container_id
  WHERE objectproperty(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id;

How do I discover the various ways that tables are using indexes?

In order to discover certain properties about your indexes, it is often better to use property functions as a shortcut.

-- list the names of all tables that have no primary key
SELECT  object_schema_name(object_id)+'.'+object_name(object_id) as No_Primary_key
  FROM sys.tables/* see whether the table has a primary key */  WHERE objectproperty(OBJECT_ID,'TableHasPrimaryKey') = 0;

-- list all tables by name that have no indexes at all
SELECT  object_schema_name(object_id)+'.'+object_name(object_id) as No_Indexes  
  FROM sys.tables /* see whether the table has any index */  WHERE objectproperty(OBJECT_ID,'TableHasIndex') = 0;

-- list all tables by name that have no candidate key (enforced unique set of columns)
SELECT  object_schema_name(object_id)+'.'+object_name(object_id) as No_Candidate_Key
  FROM sys.tables/* if no unique constraint then it isn't relational */  WHERE objectproperty(OBJECT_ID,'TableHasUniqueCnst') = 0
    AND   objectproperty(OBJECT_ID,'TableHasPrimaryKey') = 0;

--list all tables with disabled indexes
SELECT  distinct
  object_schema_name(object_id)+'.'+object_name(object_id) as Has_Disabled_indexes
  FROM sys.indexes /* don't leave these lying around */  WHERE is_disabled=1;

What is, and isn’t an object?

You have probably noticed something odd. Although some of the properties of tables, such as their primary keys, are objects in their own right, this isn’t true of columns, statistics or indexes. Let’s just get this clear, because it isn’t entirely intuitive: In sys.objects, you can find the basic standard details about all the common database components, such as tables, views, synonyms, foreign keys, check constraints, key constraints, default constraints, service queues, triggers and procedures. All these components that I’ve listed have additional attributes that have to be made visible through views that inherit the relevant basic attributes, but also include the data columns that are pertinent to the object. It is better to use these special views because they have all the information you need and the system filters just the type of object that you’re interested in, such as a table. Various objects, such as constraints and triggers, have a parent_ID in the sys.objects table that is non-zero, showing that they are child objects.

The following query shows you a simple way of viewing these child objects and relate them to their parents.

--view the names of all parent objects and their child objects.
SELECT parent.name AS Parents_name, 
       child.name AS Childs_Name, 
       replace(lower(parent.type_desc),'_',' ') AS Parents_type, 
       replace(lower(child.type_desc),'_',' ') AS Childs_type
FROM sys.objects child
  INNER JOIN sys.objects parent
    ON parent.object_ID=child.parent_object_id
WHERE child.parent_object_id<>0
ORDER BY parents_name;

You’ll see that indexes just aren’t objects. In our first query, the object_ID that we returned was the ID of the table on which the index was defined.

The problem here is that the relationships are complex. A constraint can involve several columns and can be enforced by an index too. The index can involve several columns but order is important. Statistics can also involve several columns or could be associated with an index. This means that sys.indexes, sys.stats and sys.columns don’t inherit from sys.objects. The same is true for parameters and types.

How do I simply get to see all the columns, for all the indexes, for every table?

The simplest way of establishing what indexes are on a table and the columns referenced, in order, is to use a query that joins between sys.tables, sys.indexes and sys.index_columns.

--find all the columns for all the indexes for every table
SELECT object_schema_name(t.object_ID)+'.'+t.name AS The_Table, --the name of the table
       i.name AS The_Index,  -- its index
   index_column_id,
   col_name(Ic.Object_Id, Ic.Column_Id) AS The_Column --the column
FROM sys.tables t
INNER JOIN sys.indexes i
ON t.object_ID=i.object_ID
INNER JOIN sys.Index_columns  ic
ON i.Object_ID=ic.Object_ID
AND i.index_ID=ic.index_ID
ORDER BY t.name,i.index_id, index_column_id;

This will be more useful to you once you filter for a particular table. E.g.

  WHERE i.object_id = OBJECT_ID('Production.BillOfMaterials');

What columns are in my indexes, and in what order?

We can do better than the previous query and have a line for each table, listing out all the indexes, and for each index, a list of its columns. This is easier for a busy programmer to scan by eye.

--list all tables with indexes, listing each index (comma-delimited), with a 
--list of their columns in brackets.
SELECT object_schema_name(t.object_ID)+'.'+t.name AS The_Table, --the name of the table
   coalesce(stuff (--get a list of indexes
 (SELECT ', '+i.name
 +' ( '
 +stuff (--get a list of columns
 (SELECT ', ' + col_name(Ic.Object_Id, Ic.Column_Id) 
 FROM  sys.Index_columns  ic
 WHERE ic.Object_ID=i.Object_ID
 AND ic.index_ID=i.index_ID
 ORDER BY index_column_ID ASC
 FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),1,2,'') +' )'
 FROM sys.indexes i 
 WHERE i.object_ID=t.object_ID
     FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),1,2,''),'') AS Indexes
 FROM sys.tables t;

How can I view my XML indexes?

XML indexes are treated as an extension to indexes. I find that the best way of viewing their details is to construct a CREATE statement for them.

SELECT 'CREATE' + case when secondary_type is null then ' PRIMARY' else '' end
 + ' XML INDEX '+coalesce(xi.name,'')+ '  
    ON ' --what table and column is this XML index on?
 + object_schema_name(ic.Object_ID)+'.'+object_name(ic.Object_ID)
 +' ('+col_name(Ic.Object_Id, Ic.Column_Id)+' )  
    '+ coalesce('USING XML INDEX [' + Using.Name + '] FOR ' + Secondary_Type_DeSc
     COLLATE database_default,'')
+'  
'+      replace('WITH ( ' + 
   stuff(
  CASE WHEN xi.Is_Padded <> 0 THEN ', PAD_INDEX  = ON ' ELSE '' END 
  + CASE 
     WHEN xi.Fill_Factor NOT IN (0, 100) 
    THEN ', FILLFACTOR  =' + convert(VARCHAR(3), xi.Fill_Factor) + ''
        ELSE '' END 
  + CASE WHEN xi.Ignore_dUp_Key <> 0 THEN ', IGNORE_DUP_KEY = ON' ELSE '' END 
  + CASE WHEN xi.Allow_Row_Locks = 0 THEN ', ALLOW_ROW_LOCKS  = OFF' ELSE '' END 
  + CASE WHEN xi.Allow_Page_Locks = 0 THEN ', ALLOW_PAGE_LOCKS  = OFF' ELSE ' ' END
   , 1, 1, '')
 + ')', 'WITH ( )', '') --create the list of xml index options
+  coalesce('/* '+convert(varchar(8000),Value)+ '*/','')--and any comment
    AS BuildScript
FROM sys.xml_Indexes xi
      inner join sys.index_columns ic 
   ON ic.Index_Id = xi.Index_Id
   AND ic.Object_Id = xi.Object_Id   
  LEFT OUTER JOIN sys.Indexes [USING]
   ON [USING].Index_Id = xi.UsIng_xml_Index_Id
   AND [USING].Object_Id = xi.Object_Id
  LEFT OUTER JOIN sys.Extended_Properties ep
   ON ic.Object_Id = ep.Major_Id AND ic.Index_Id = Minor_Id AND Class = 7
WHERE object_schema_name(ic.Object_ID) <>'sys' AND ic.index_id>0;

The results of the query above will show you all the essential XML index details as a build script.

CREATE PRIMARY XML INDEX PXML_ProductModel_CatalogDescription  
    ON Production.ProductModel (CatalogDescription )  
     /* Primary XML index.*/ 
CREATE PRIMARY XML INDEX PXML_ProductModel_Instructions  
    ON Production.ProductModel (Instructions )  
     /* Primary XML index.*/ 
CREATE PRIMARY XML INDEX PXML_Store_Demographics  
    ON Sales.Store (Demographics )  
     /* Primary XML index.*/ 
CREATE PRIMARY XML INDEX PXML_Person_AddContact  
    ON Person.Person (AdditionalContactInfo )  
     /* Primary XML index.*/ 
CREATE PRIMARY XML INDEX PXML_Person_Demographics  
    ON Person.Person (Demographics )  
     /* Primary XML index.*/ 
CREATE XML INDEX XMLPATH_Person_Demographics  
    ON Person.Person (Demographics )  
    USING XML INDEX [PXML_Person_Demographics] FOR PATH /* Secondary XML index for path.*/ 
CREATE XML INDEX XMLPROPERTY_Person_Demographics  
    ON Person.Person (Demographics )  
    USING XML INDEX [PXML_Person_Demographics] FOR PROPERTY /* Secondary XML index for property.*/ 
CREATE XML INDEX XMLVALUE_Person_Demographics  
    ON Person.Person (Demographics )  

Are there other types of index stored in the metadata?

There are two more special kinds of indexes, the spatial indexes (information held in sys.spatial_index_tessellations and sys.spatial_indexes)

and the fulltext indexes (information held in fulltext_index_fragments, fulltext_index_catalog_usages, fulltext_index_columns and fulltext_indexes). These types of indexes are rather more advanced topics for this level. We’ll deal with them later in this stairway.

Exploring Index Statistics

For the time being, let’s just end up our exploration of indexes by dealing with the topic of distribution statistics or ‘stats’. Every index has a statistics object attached to it in order to help the query optimiser come up with a decent query plan. To do this, it needs to estimate the ‘cardinality’ of the data to determine how many rows will be returned for any index value, and it uses these ‘stats’ objects to tell it how the data is distributed.

You can see what stats objects are associated with your tables in your database from a query like this:

SELECT object_schema_name(t.Object_ID) + '.'+ t.name AS The_table, 
    stats.name AS Stats_Name, sys.columns.name AS Column_Name
  FROM sys.stats
 INNER JOIN sys.stats_columns
    ON stats.object_id = stats_columns.object_id
   AND stats.stats_id = stats_columns.stats_id
 INNER JOIN sys.columns
    ON stats_columns.object_id = columns.object_id
   AND stats_columns.column_id = columns.column_id
 INNER JOIN sys.tables t
    ON stats.object_id = t.object_id;

You’ll see that, when they are associated with an index, statistics inherit the name of the index, and use the same columns as that index.

Checking for duplicate statistics

You can quickly see if you have more than one statistic for the same column or set of columns just by comparing the list of column numbers that each statistic is associated with.

SELECT object_schema_name(Object_ID)+'.'+object_name(Object_ID) as tableName,
       count(*) as Similar, ColumnList as TheColumn, 
   max(name)+', '+min(name) as duplicates
FROM 
   (SELECT Object_ID, name,   
 stuff (--get a list of columns
 (SELECT ', ' + col_name(sc.Object_Id, sc.Column_Id)
 FROM  sys.stats_columns  sc
 WHERE sc.Object_ID=s.Object_ID
 AND sc.stats_ID=s.stats_ID
 ORDER BY stats_column_ID ASC
 FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),1,2,'') AS ColumnList
   FROM sys.stats s)f
GROUP BY Object_ID,ColumnList 
HAVING count(*) >1;

This shows you the tables that have duplicate statistics objects on them, In this case the sales.customer table has two similar statistics objects on the AccountNumber column

Summary

There is a lot of information available on indexes that is useful to the database programmer. Once the number of tables in a database gets large, it is very easy to let a table slip in that has something wrong, such as inadvertently having no clustered index or primary key, or that has duplicatedindexes or unnecessary statistics. Once you are familiar with the basic ways of querying the catalog views, it becomes quicker to run a query to get information than to use the SQL Server Management Studio.

This article is part of the parent stairway Stairway to Exploring Database Metadata

Rate

5 (3)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (3)

You rated this post out of 5. Change rating