Blog Post

Unique and non-unique SQL Server indexes on a heap table

,

In the upcoming weblog postings I want to work out the differences between unique

and non-unique indexes in SQL Server. I assume that you already know the concepts

about clustered- and non clustered indexes and how they are used in SQL Server.

In the past I've done a lot of trainings and consulting regarding SQL Server performance

tuning and it seems that some people doesn't know the differences and implications

between unique and non-unique indexes. And as you will see in the upcoming postings

there are really big differences how SQL Server stores those two variants that impact

the size and the efficiency of your indexes.

Let's start today with unique and non unique non clustered indexes on a table without

a clustered index, a so-called heap table in SQL Server. The following

listing shows how to create our test table and populate it with 80.000 records. Each

record needs 400 bytes, therefore SQL Server can put 20 records on each data page.

This means that our heap table contains 4.000 data pages and 1 IAM page.

-- Create a table

with 393 length + 7 bytes overhead = 400 bytes

-- Therefore

20 records can be stored on one page (8.096 / 400) = 20,24

CREATE TABLE CustomersHeap

(

CustomerID INT NOT NULL,

CustomerName CHAR(100) NOT NULL,

CustomerAddress CHAR(100) NOT NULL,

Comments CHAR(189) NOT NULL

)

GO

-- Insert 80.000

records

DECLARE @i INT = 1

WHILE (@i <= 80000)

BEGIN

INSERT INTO CustomersHeap VALUES

(

@i,

'CustomerName' + CAST(@i AS CHAR),

'CustomerAddress' + CAST(@i AS CHAR),

'Comments' + CAST(@i AS CHAR)

)

SET @i += 1

END

GO

-- Retrieve physical

information about the heap table

SELECT * FROM sys.dm_db_index_physical_stats

(

DB_ID('NonClusteredIndexStructureHeap'),

OBJECT_ID('CustomersHeap'),

NULL,

NULL,

'DETAILED'

)

GO

After the creation of the heap table and the data loading, you can now define a unique

and non-unique non-clustered index on the column CustomerID of our

heap table. We will define both indexes on the same column so that we can analyze

the differences between unique- and non-unique non-clustered indexes.

-- Create a unique

non clustered index

CREATE UNIQUE NONCLUSTERED INDEX IDX_UniqueNCI_CustomerID

ON CustomersHeap(CustomerID)

GO

-- Create a non-unique

non clustered index

CREATE NONCLUSTERED INDEX IDX_NonUniqueNCI_CustomerID

ON CustomersHeap(CustomerID)

GO

If you want to define a unique non-clustered index on a column that doesn't contain

unique data, you will get back an error message from SQL Server. Important to know

is that SQL Server creates a non-unique non-clustered index if you don't specify the UNIQUE property

when creating a non-clustered index. So by default you will always get a non-unique

non-clustered index!

After the creation of both indexes you can analyze their size, their index depth,

their size etc. with the DMV sys.dm_db_index_physical_stats. You

can to pass in as the 3rd parameter the index-id. The IDs of all non-clustered indexes

starts at 2, therefore the first non-clustered index gets the ID 2 and the second

one the ID 3.

-- Retrieve physical

information about the unique non-clustered index

SELECT * FROM sys.dm_db_index_physical_stats

(

DB_ID('NonClusteredIndexStructureHeap'),

OBJECT_ID('CustomersHeap'),

2,

NULL,

'DETAILED'

)

GO

-- Retrieve physical

information about the non-unique non-clustered index

SELECT * FROM sys.dm_db_index_physical_stats

(

DB_ID('NonClusteredIndexStructureHeap'),

OBJECT_ID('CustomersHeap'),

3,

NULL,

'DETAILED'

)

GO

As you can see from both outputs, the index root page of the unique non-clustered

index is occupied of around 24%, where the index root page of the non-unique non-clustered

index is occupied of around 39%, so there must be a difference in the storage format

of unique/non-unique non-clustered indexes on a heap table! In the next step we create

a simple helper table that stores the output of the DBCC IND command.

The structure of this helper table is directly taken from the excellent book SQL

Server 2008 Internals.

-- Create a helper

table

CREATE TABLE sp_table_pages

(

PageFID TINYINT,

PagePID INT,

IAMFID TINYINT,

IAMPID INT,

ObjectID INT,

IndexID TINYINT,

PartitionNumber TINYINT,

PartitionID BIGINT,

iam_chain_type VARCHAR(30),

PageType TINYINT,

IndexLevel TINYINT,

NextPageFID TINYINT,

NextPagePID INT,

PrevPageFID TINYINT,

PrevPagePID INT,

PRIMARY

KEY (PageFID, PagePID)

)

GO

After the creation of this helper table we can dump out all pages that are belonging

to our non-clustered indexes to this helper table with the following two calls to

DBCC INC in combination with the INSERT INTO statement:

-- Write everything

in a table for further analysis

INSERT INTO sp_table_pages

EXEC('DBCC

IND(NonClusteredIndexStructureHeap, CustomersHeap, 2)')

GO

-- Write everything

in a table for further analysis

INSERT INTO sp_table_pages

EXEC('DBCC

IND(NonClusteredIndexStructureHeap, CustomersHeap, 3)')

GO

Now we can start analyzing our non-clustered indexes by using the undocumented DBCC

PAGE command. You can find more information about this great command on Paul

Randal's weblog.

To get some information back from DBCC PAGE you have to enable the

flag 3604 of DBCC:

DBCC TRACEON(3604)

GO

Let's dump out the index root page of our unique non-clustered index by the following

command:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4192, 3)

GO

This will result in the following result in SQL Server Management Studio:

image

As you can see from this screenshot SQL Server stores the child page of the B-tree

where the minimum key of the non-clustered index is located. The child page 4161 contains

for example the record with the minimum key of 540 up to the maximum key of 1078.

When you dump out the index root page with the dump option 1 you get the byte by byte

representation of all index records on the index root page:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4192, 1)

GO

SQL Server needs here 11 bytes for storing an index row. These 11 bytes are storing

the following information:

  • 1 byte: Status Bits

  • 4 bytes: Customer ID, like 540

  • 4 bytes: child PageID, like 4161

  • 2 bytes: FileID, like 1

As you can see it's up to the length of the non-clustered key how long an index row

is. This also means that SQL Server is able to store more index rows on an index page

if you choose a smaller non-clustered key. If you choose for example a CHAR(100) as

a non-clustered index key, then SQL Server needs more index pages for your non-clustered

index, which is not so efficient as using a smaller index key. The T-SQL script enclosed

to this posting shows you how you can decode those bytes from the hexadecimal representation.

Finally you can dump out the child page 4161, which is located on the leaf-level of

the non-clustered index.

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4161, 3)

GO

image

As you can see from the figure, SQL Server stores for each index key on which data

page and on which slot the corresponding record is located. Because we have not defined

a clustered index on our table, SQL Server uses here the RID (Row Identifier) to point

to the correct record on the data page. Index pages on the leaf-level on a heap table

are different from leaf-level index pages defined on a clustered table (a table that

contains a clustered index).When you dump out the leaf-level index page of the non-clustered

index you can see that SQL Server needs 13 bytes per index row:

  • 1 byte: Status Bits

  • 4 bytes: CustomerID, like 540

  • 4 bytes: PageID, like 178,

  • 2 bytes: FileID, like 1

  • 2 bytes: Slot number, like 19

Finally with this information in your hand, it is very easy to locate the correct

record on the data page, because you know the PageID, FileID, and also the slot number

where the record on the data page is located. Easy, isn't it?

Let's move on now to non-unique non-clustered indexes. Earlier we have already created

such an index, which gets the index-id of 3 from SQL Server, because it's the second

non-clustered index we have defined. In my case the index root page of the non-unique

non-clustered index is located on page 4264, therefore I dump it out with the following

command:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4264, 3)

GO

image

But wait! Now the result from DBCC PAGE on the root index page on

a non-unique non-clustered index is different! As you can see SQL Server returns here

an additional column named "HEAP RID (key)". The value in this column

is used to make your non-unique non-clustered index unique. The HEAP RID column

uses 8 additional bytes in your index row, which encodes the following information

that are granted to be unique on a heap table:

  • 4 bytes: PageID, like 178

  • 2 bytes: FileID, like 1

  • 2 bytes: Slot number, like 19

The overead of a non-unique non-clustered index on a heap table costs you 8 additional

bytes per index row - on all index levels, expect the leaf-level, because SQL Server

stores here always the HEAP RID as you have seen previously! So please keep this 8

bytes of additional index record overhead in mind, when you create non-clustered indexed

that are NOT unique! And as I have said earlier, they are NOT unique by default!!!

In this example your non-unique non-clustered index is about 2 times bigger than the

unique non-clustered index, because the unique index needs 11 bytes and the non-unique

index needs 19 bytes (overhead of 8 bytes). When you look back to the output of the

DMV sys.dm_db_index_physical_stats you can see that the index root

page of the unique non-clustered index has a page space usage of around 24% where

the index root page of the non-unique non-clustered index has a page space usage of

around 39%. This will make a big difference on large non-clustered indexes!

image

So if you are just defining non-clustered indexes with

CREATE NONCLUSTERED INDEX

...

without thinking about the uniqueness of your data, you are wasting a lot of storage

in your non-clustered indexes which also impacts the performance of your non-clustered

indexes and their ongoing maintenance.

You can download the T-SQL script for this posting here.

In the next installment of this series we will have a look into the differences of

unique clustered indexes and unique/non unique non-clustered indexes. Stay tuned 🙂

-Klaus

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating