Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Klaus Aschenbrenner

Klaus Aschenbrenner provides independent SQL Server Consulting Services across Europe and the US. Klaus works with the .NET Framework and especially with the SQL Server 2005/2008 from the very beginnings. In the years 2004 - 2005 Klaus was entitled with the MVP award from Microsoft for his tremendous support in the .NET Community. Klaus has also written the book Pro SQL Server 2008 Service Broker which was published by Apress in the Summer of 2008. Further information about Klaus you can find on his homepage at http://www.SQLpassion.at. He also twitters at http://twitter.com/Aschenbrenner.

Unique/Non-Unique Clustered Indexes

In the last blog post I have talked about unique/non-unique clustered indexes on a heap table. A table without a clustered index is called a heap table in SQL Server. When you define a clustered index on such a table, the table data gets structured and is therefore referred as clustered table. In this blog post I want to talk about the differences in unique and non-unique clustered indexes, and what are the storage impacts between those 2 types of clustered indexes.

As a prerequisite I assume that you have a basic understanding of clustered indexes, and that you know the difference between heap and clustered tables, and how your data pages are structured when a clustered index is defined on a table.

Let's start by looking on a unique clustered index. With SQL Server you have several possibilities to define a unique clustered index. The first way – the easy one – is to define a PRIMARY KEY constraint on a column. SQL Server enforces this PRIMARY KEY constraint through the creation of a unique clustered index on that table and that column. The another option is to create a unique clustered index through the CREATE CLUSTERED INDEX statement – but when you don't specify the UNIQUE property, SQL Server will create a non-unique clustered index by default for you! The following code fragment creates the Customers table that you already know from the previous blog posting, but this time we create a PRIMARY KEY constraint on the column CustomerID. Therefore SQL Server creates a unique clustered index on that table and sorts the data pages in the leaf level according the values in the column CustomerID.

-- Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
(
CustomerID INT NOT NULL PRIMARY KEY IDENTITY(1, 1),
CustomerName CHAR(100) NOT NULL,
CustomerAddress CHAR(100) NOT NULL,
Comments CHAR(189) NOT NULL
)
GO

-- Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 80000)
BEGIN
INSERT INTO Customers VALUES
(
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)

SET @i += 1
END
GO

After we have identified the index root page (through the use of the DBCC IND command), we can dump out that page with the DBCC PAGE command. In my case the index root page is 775:

DBCC PAGE(UniqueClusteredIndexStructure, 1, 775, 3)
GO

As you can see from the following figure each index record contains the clustered key, in this case the value of the column CustomerID.

image

When you examine the byte by byte representation of a clustered index record, you can see that SQL Server uses here the following bytes:

  • 1 byte: Status Bits
  • n bytes: Clustered Key – in this case 4 bytes
  • 4 bytes: PageID
  • 2 bytes: FileID

As you can see the length of the clustered key has a direct relationship of the length of an index record. This mean as smaller your clustered key is, the more index record can be put onto an index page, and therefore your clustered index will be much more compact and will perform faster and are easier to maintain. When you walk down your clustered index you will see that all intermediate levels have the same storage format as described above. There are no differences on each level, expect the index leaf level, because this level contains your actual logically ordered data pages.

Let's have now a look onto non-unique clustered indexes in SQL Server and how they differ from unique clustered indexes. To demonstrate this kind of indexes, I have just recreated the Customers table and created a non-unique clustered index on that table through the CREATE CLUSTERED INDEX statement:

-- Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
(
CustomerID INT NOT NULL,
CustomerName CHAR(100) NOT NULL,
CustomerAddress CHAR(100) NOT NULL,
Comments CHAR(181) NOT NULL
)
GO

-- Create a non unique clustered index
CREATE CLUSTERED INDEX idx_Customers_CustomerID
ON Customers(CustomerID)
GO

Finally I have inserted 80.000 records, where the column CustomerID (the clustered key) is not unique anymore:

-- Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 20000)
BEGIN
INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)

INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)

INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)

INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)
>


SET @i += 1
END
GO

When you now dump out the root index page of the non-unique clustered index, you get the following result:

image

As you can see, SQL Server returns here an additional column named UNIQUIFIER (key). This column is used by SQL Server to make a non-unique clustered key unique. Behind the scenes it is a 4 byte long integer value starting at 0. E.g. when you have 2 customers with the ID 1380 the first record gets the uniquifier value 0 and the second one gets the uniquifier value of 1. But SQL Server only stores the uniquifier in the navigation structure of an index (all levels above the leaf level), when the uniquifier is not equal to 0. SQL Server only includes uniquifier values of 0 in the navigation structure of a non-unique clustered index, which means that the navigation structure will never store the uniquifier physically. The only place where the uniquifier is stored in a non-unique clustered index is on the data pages, where the actual data records are stored. The following figure shows a data page dump of our clustered index, where you can also see the stored uniquifier.

image

So the only difference between a unique and non-unique clustered index is on the data pages, because when using a non-unique clustered index, SQL Server will use the 4 byte long uniquifier to make them unique, which is a small storage overhead that you have to keep in mind, when working with non-unique clustered indexes. You can download the T-SQL script for this posting here.

In the next posting we will work out the differences between unique/non-unique non-clustered indexes defined on unique clustered indexes. Stay tuned :-)

-Klaus

Comments

Posted by Mark S. Rasmussen on 29 June 2011

Sorry for resurrecting a year old post but the Google-fu in this one was strong :)

"SQL Server only includes uniquifier values of 0 in the navigation structure of a non-unique clustered index, which means that the navigation structure will never store the uniquifier physically."

I think there's a double negative in here :) SQL Server doesn't include values of 0 in neither the data pages nor the b-tree, it only includes non-zero values. Also the sentence seems self contradicting - only includes values of 0, but never stores the value physically - I'm assuming you meant to write that it never includes the value 0 physically.

"The only place where the uniquifier is stored in a non-unique clustered index is on the data pages, where the actual data records are stored. The following figure shows a data page dump of our clustered index, where you can also see the stored uniquifier."

This is also contradicting with the previous section. The uniquifier is not only stored at the data pages, but also in the b-tree when non zero. The screenshot actually shows exactly that - the zero valued uniquifier has a physical length of zero, thus it takes up no bytes on the record - it's value is implicitly generated due to metadata that tells there might be a uniquifier.

Posted by Mark S. Rasmussen on 29 June 2011

Didn't mean to be anonymous - was sure I'd written my name in there. If this post comes out as 'mark' as well, there's a problem somewhere :)

- Mark S. Rasmussen

Posted by AmarPo on 4 July 2012

Mark,

You are absolutely right about the convoluted description of the uniquifier (double negative)! I am sure Klaus didn't it to be that way but it sure is confusing the way he wrote it.

Thank you for adding your comments to help other readers like myself get past the confusion.

Amar

Posted by AmarPo on 4 July 2012

Mark,

You are absolutely right about the convoluted description of the uniquifier (double negative)! I can only assume that Klaus didn't mean it to be that way but it sure is confusing the way he wrote it.

Thank you for adding your comments to help other readers like myself get past the confusion.

Amar

Posted by tribhuwanupc on 9 July 2013

very-2 good artile

Leave a Comment

Please register or log in to leave a comment.