When Should You Use Identity as a Clustered Index Key?

A few days ago, while delivering my session, The Data-Loading Performance Presentation, I was asked when, performance-wise, Identity should be used as the clustered index key, and when it shouldn’t.

So first, starting SQL Server 2012, a Sequence is actually a better choice than identity. It’s more flexible, more adjustable, and since it’s external to the table, it allows easier migrations.

After we got that out the way, let’s dive to performance.

Traditionally, we want our clustered index key to be:

Narrow: So that the clustered index and the non-clustered indexes which point to it will be as slim as possible.

Static: So that we don’t have to update the clustered and non-clustered indexes, generate fragmentation by doing it and incur locks and potentially deadlocks.

Ever-increasing: So that we don’t generate page splits and fragmentation while inserting data to the table and thus hurt both insert and query performance.

For the vast majority of systems (something like 95% of them), this will be a very good choice.

The problem starts when MANY sessions insert data to the table in parallel.

Looking at the structure of an index, when working with an ever-increasing key, all of the rows will go to the same page, which is the last page of the index.

Tail Insert Problem

When inserting data into any page, a session has to acquire a Latch on that page. A Latch is synchronization object for in memory structures like data pages that are in the buffer pool. When inserting data into the page, each session has to acquire an exclusive latch so that no other session can change the structure of the page while the session is inserting data. This means that only one session can insert data into the page at any given time, and if you have many sessions that write to the table at the same time, each one of them will have to wait a considerable amount of time until its turn reaches to insert data. This problem is called the Tail Insert problem.

How to know if you have a problem:

While inserts are running against the suspicious table, query sys.dm_os_waiting_tasks
Check whether you have many sessions waiting with a PAGELATCH wait on the same page
Enable trace flag 3604 using the syntax:

DBCC TRACEON (3604)

4. Run DBCC PAGE in order to see the contents of the relevant page. In our case:

DBCC PAGE (8, 1, 10078, -1)

5. In the results pane, look for ObjectId:

DBCC PAGE ObjectId

6. Then take that ObjectId and find the associated table using the Object_Name function (in our case, 1013578649):

SELECT OBJECT_NAME (1013578649)

7. Is that your table? You’re most likely suffering from the Tail Insert problem.

Update:

Kenneth Fisher (Blog | Twitter) has suggested a better way for replacing steps 3 to 6 and finding the associated table faster.

We will use the resource description details we grabbed above with the sys.dm_database_page_allocations function. 8 is the database id, 1 is the file id, and 10078 is the latched page id. This brings us to the following query:

SELECT object_name(object_id,database_id) 
FROM sys.dm_db_database_page_allocations(8,null,null,null,null)
WHERE allocated_page_file_id = 1
  AND allocated_page_page_id = 10078

So when should I use Identity?

If you don’t suffer from the Tail Insert problem, you’re most likely good to go with an ever-increasing key like Identity. If you do suffer from it, you have to start thinking about other solutions.

Solutions:

When you have this problem, it’s time to leave Identity (or Sequence) behind and work with other methods. There a few main options:

Load to a Heap, which works better with parallel loading
Work with In-Memory OLTP, which uses Lock-Free data structures in order to mitigate latches
Work with a non-sequential key like GUID
Work with Hash Partitioning, which separates the Clustered Index to a few separate physical index trees
Pad the table so every page holds only a single row (if your table is small)
Use a business key that spreads all over the index and not in a specific spot in it
Implement a “reverse index”

In the next post, I go over the solutions in detail.

The post When Should You Use Identity as a Clustered Index Key? appeared first on Madeira Data Solutions.

When Should You Use Identity as a Clustered Index Key?

How to know if you have a problem:

So when should I use Identity?

Solutions:

Rate

Share

Share

Rate

When Should You Use Identity as a Clustered Index Key?

How to know if you have a problem:

So when should I use Identity?

Solutions:

Rate

Share

Recent comments

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts

Cookies on SQLServerCentral