Blog Post

When Should You Use Identity as a Clustered Index Key?

,

A few days ago, while delivering my session, The Data-Loading Performance Presentation, I was asked when, performance-wise, Identity should be used as the clustered index key, and when it shouldn’t.

So first, starting SQL Server 2012, a Sequence is actually a better choice than identity. It’s more flexible, more adjustable, and since it’s external to the table, it allows easier migrations.

After we got that out the way, let’s dive to performance.

Traditionally, we want our clustered index key to be:

Narrow: So that the clustered index and the non-clustered indexes which point to it will be as slim as possible.

Static: So that we don’t have to update the clustered and non-clustered indexes, generate fragmentation by doing it and incur locks and potentially deadlocks.

Ever-increasing: So that we don’t generate page splits and fragmentation while inserting data to the table and thus hurt both insert and query performance.

 

For the vast majority of systems (something like 95% of them), this will be a very good choice.

The problem starts when MANY sessions insert data to the table in parallel.

Looking at the structure of an index, when working with an ever-increasing key, all of the rows will go to the same page, which is the last page of the index.

Tail Insert Problem

When inserting data into any page, a session has to acquire a Latch on that page. A Latch is synchronization object for in memory structures like data pages that are in the buffer pool. When inserting data into the page, each session has to acquire an exclusive latch so that no other session can change the structure of the page while the session is inserting data. This means that only one session can insert data into the page at any given time, and if you have many sessions that write to the table at the same time, each one of them will have to wait a considerable amount of time until its turn reaches to insert data. This problem is called the Tail Insert problem.

How to know if you have a problem:

  1. While inserts are running against the suspicious table, query sys.dm_os_waiting_tasks
  2. Check whether you have many sessions waiting with a PAGELATCH wait on the same page

    PAGELATCH Waits

  3. Enable trace flag 3604 using the syntax:
DBCC TRACEON (3604)

4. Run DBCC PAGE in order to see the contents of the relevant page. In our case:

DBCC PAGE (8, 1, 10078, -1)

5. In the results pane, look for ObjectId:

DBCC PAGE ObjectId

6. Then take that ObjectId and find the associated table using the Object_Name function (in our case, 1013578649):

SELECT OBJECT_NAME (1013578649)

7. Is that your table? You’re most likely suffering from the Tail Insert problem.

Update:

Kenneth Fisher (Blog | Twitter) has suggested a better way for replacing steps 3 to 6 and finding the associated table faster.

We will use the resource description details we grabbed above with the sys.dm_database_page_allocations function. 8 is the database id, 1 is the file id, and 10078 is the latched page id. This brings us to the following query:

SELECT object_name(object_id,database_id) 
FROM sys.dm_db_database_page_allocations(8,null,null,null,null)
WHERE allocated_page_file_id = 1
  AND allocated_page_page_id = 10078

 

So when should I use Identity?

If you don’t suffer from the Tail Insert problem, you’re most likely good to go with an ever-increasing key like Identity. If you do suffer from it, you have to start thinking about other solutions.

Solutions:

When you have this problem, it’s time to leave Identity (or Sequence) behind and work with other methods. There a few main options:

  1. Load to a Heap, which works better with parallel loading
  2. Work with In-Memory OLTP, which uses Lock-Free data structures in order to mitigate latches
  3. Work with a non-sequential key like GUID
  4. Work with Hash Partitioning, which separates the Clustered Index to a few separate physical index trees
  5. Pad the table so every page holds only a single row (if your table is small)
  6. Use a business key that spreads all over the index and not in a specific spot in it
  7. Implement a “reverse index”

 

In the next post, I go over the solutions in detail.

The post When Should You Use Identity as a Clustered Index Key? appeared first on Madeira Data Solutions.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating