Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQLStudies

My name is Kenneth Fisher and I am Senior DBA for a large (multi-national) insurance company. I have been working with databases for over 20 years starting with Clarion and Foxpro. I’ve been working with SQL Server for 12 years but have only really started “studying” the subject for the last 3. I don’t have any real "specialities" but I enjoy trouble shooting and teaching. Thus far I’ve earned by MCITP Database Administrator 2008, MCTS Database Administrator 2005, and MCTS Database Developer 2008. I’m currently studying for my MCITP Database Developer 2008 and should start in on the 2012 exams next year. My blog is at www.sqlstudies.com.

The clustered index columns are in all of the non clustered indexes.

Did you know that whatever columns you pick as your clustered index will be included in any non clustered indexes on the same table? But don’t take my word for it. Let’s take a look!

First things first I’m going to use some AdventureWorks2012 tables to make a test table.

-- Create a convinent composite table 
SELECT Pers.BusinessEntityID, Pers.Title, Pers.FirstName, Pers.MiddleName, 
	Pers.LastName,Addr.AddressLine1, Addr.AddressLine2, Addr.City, 
	Addr.StateProvinceID, Addr.PostalCode, Addr.SpatialLocation
	INTO People1
FROM AdventureWorks2012.Person.Address Addr
JOIN AdventureWorks2012.Person.BusinessEntityAddress BEA
	ON Addr.AddressID = BEA.AddressID
JOIN AdventureWorks2012.Person.Person Pers
	ON BEA.BusinessEntityID = Pers.BusinessEntityID
GO
-- Add indexes including a non-unique (important later) clustered index
CREATE CLUSTERED INDEX ix1_People1 ON People1(AddressLine1, AddressLine2, City, StateProvinceID, PostalCode)
CREATE INDEX ix2_People1 ON People1(BusinessEntityID)
CREATE INDEX ix3_People1 ON People1(LastName, FirstName, MiddleName)
GO

The clustered index (CI) is on the 5 address columns and there are non-clustered indexes (NCI) on the BusinessEntityID and the 3 name columns. We can look at the structure of a page from one of the indexes by using sys.dm_db_database_page_allocations and DBCC PAGE (links are in the code below).

-- Get the page id for NCI ix2_People1
-- Info on sys.dm_db_database_page_allocations: 
--		http://www.jasonstrate.com/2013/04/a-replacement-for-dbcc-ind-in-sql-server-2012/
SELECT indexes.name, indexes.index_id, indexes.type_desc, 
	pages.allocated_page_file_id, pages.allocated_page_page_id, pages.is_iam_page
FROM sys.indexes
JOIN sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('People1'), NULL, NULL, NULL) pages
	ON indexes.object_id = pages.object_id
	AND indexes.index_id = pages.index_id
WHERE indexes.name = 'ix2_People1'

CI_NCI_1

In order to use DBCC PAGE I need the file id and the page id. I’m using page 288 instead of 638 because page 638 is the IAM page. All of my pages are in file 1 (I only have the one data file).

-- View index page
-- Info on DBCC PAGE:
--		http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-using-dbcc-page-and-dbcc-ind-to-find-out-if-page-splits-ever-roll-back/
-- Turn traceflag 3604 on so we can see the results
DBCC TRACEON (3604);
-- Take a look at the contents of one of the index pages
DECLARE @DBID int
SET @DBID = DB_ID()
DBCC PAGE(@DBID, 1, 288, 3)

CI_NCI_2

You can see the first 9 (of 217) rows in the page in the image above. You can see that while the BusinessEntityID is the only column I indexed on (and the only one that will show up if you look at sp_helpindex or anything similar) there are actually 6 additional columns in the index. The 5 columns from the CI and the UNIQUIFIER column. In case you are interested the UNIQUIFIER is added any time you have a CI that is non-unique (which is why I deliberately made this one non-unique).

I’m going to stop here and point out that I created a clustered index not a primary key. A primary key is by default a unique clustered index but it doesn’t have to be. It has to be unique but not clustered. Because it has to be unique if you create a primary key for your CI then you won’t see the uniquifier column when you look at the page information.

I deliberately created a long clustered index on columns that probably shouldn’t be used as the clustered index to demonstrate a couple of points. First your CI choice is going to affect the size of your indexes.

To start I’m going to create another table (exactly the same) with a different, smaller CI.

-- Create a convinent composite table 
SELECT Pers.BusinessEntityID, Pers.Title, Pers.FirstName, Pers.MiddleName, 
	Pers.LastName,Addr.AddressLine1, Addr.AddressLine2, Addr.City, 
	Addr.StateProvinceID, Addr.PostalCode, Addr.SpatialLocation
	INTO People2
FROM AdventureWorks2012.Person.Address Addr
JOIN AdventureWorks2012.Person.BusinessEntityAddress BEA
	ON Addr.AddressID = BEA.AddressID
JOIN AdventureWorks2012.Person.Person Pers
	ON BEA.BusinessEntityID = Pers.BusinessEntityID
GO
-- Add indexes including a non-unique (important later) clustered index
CREATE CLUSTERED INDEX ix1_People2 ON People2(BusinessEntityID)
CREATE INDEX ix2_People2 ON People2(AddressLine1, AddressLine2, City, StateProvinceID, PostalCode)
CREATE INDEX ix3_People2 ON People2(LastName, FirstName, MiddleName)
GO

I’m now going to use a modified version of a query I got off of Basit’s(b/t) blog.

Note that ix1 in both cases should be about the same. The CI IS the table. It contains all of the data for the table so there shouldn’t be any significant change in size. The second index (ix2) is also going to be about the same size. I just swapped the two sets of columns so both ix2s are going to contain BusinessEntityID, AddressLine1, AddressLine2, City, StateProvinceID, and PostalCode. The third index (ix3) on the other hand should show a fairly significant difference. The table People1 will have the columns LastName, FirstName, and MiddleName & AddressLine1, AddressLine2, City, StateProvinceID, and PostalCode while the table People2 will have columns LastName, FirstName, and MiddleName & BusinessEntityID.

SELECT OBJECT_NAME(i.object_id) AS TableName, i.[name] AS IndexName
    ,SUM(s.[used_page_count]) * 8 AS IndexSizeKB
FROM sys.dm_db_partition_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]
    AND s.[index_id] = i.[index_id]
WHERE OBJECT_NAME(i.object_id) IN ('People1','People2')
GROUP BY OBJECT_NAME(i.object_id), i.[name]
ORDER BY OBJECT_NAME(i.object_id), i.[name]
GO

CI_NCI_3

So exactly what I expected. ix1 & ix2 are about the same size in both tables. However ix3 for table People1 is about three times the size of ix3 on People2. Not a big deal with a small table with only 3 indexes. You get to a mm row table with 5 or 6 NCIs it could get rather significant.

Now on an up note with a larger clustered index you do get increased coverage.

SELECT FirstName, LastName, AddressLine1, AddressLine2,
	City, StateProvinceID, PostalCode
FROM People1
WHERE LastName LIKE 'A%'
  AND AddressLine2 IS NOT NULL

SELECT FirstName, LastName, AddressLine1, AddressLine2,
	City, StateProvinceID, PostalCode
FROM People2
WHERE LastName LIKE 'A%'
  AND AddressLine2 IS NOT NULL

CI_NCI_4

I realize it’s a bit of a goofy query but it does demonstrate the point. In People1 where the CI contains the address information the optimizer was able to use ix3 as a covering index. In People2 where the CI is the BusinessEntityID the optimizer had to use both ix2 and ix3 and ended up taking 95% of the combined time of the two queries. Since the columns in the CI are in all indexes they can always be used when determining if the index covers a query.

Now in my opinion these are not primary reasons for picking out a clustered index. They are more consequences of a CI choice. Important consequences admittedly. Hopefully though, this does point out some of the reasons why picking out the CI for a table is at once very important and very tricky.


Filed under: Index, Microsoft SQL Server, SQLServerPedia Syndication Tagged: index, microsoft sql server

Comments

Leave a comment on the original post [sqlstudies.com, opens in a new window]

Loading comments...