Introduction
High-concurrency systems always look impressive on paper. You throw in dozens of CPU cores, crank up memory, design a schema with lightweight inserts, and proudly tell yourself, “This thing is going to fly.” And to be fair, under low load it probably does. A single session inserting rows into a simple table barely makes SQL Server break a sweat.
But the story changes once you start pushing that system with hundreds of parallel inserts. Suddenly, all that horsepower doesn’t matter anymore, because every thread is fighting over the same tiny spot in memory: the last page of a clustered index. This is the classic last-page insert contention problem. It creeps in whenever your clustered index key is sequential — the typical IDENTITY, DATETIME, or NEWSEQUENTIALID() setup. Each new row naturally gravitates to the end of the B-tree. That sounds orderly and efficient, but under concurrency it’s a trap. Instead of spreading inserts across multiple pages, they all dogpile onto one hot page.
When that happens, SQL Server has no choice but to serialize access with latches. You’ll see PAGELATCH_EX waits climbing, worker threads spinning, and throughput collapsing. What once looked like “minimal locking” in theory becomes a traffic jam where every insert has to wait its turn just to grab an in-memory latch. It feels almost unfair. You aren’t even blocked by logical locks — you’re blocked by something much lower level, a memory latch guarding page structure. The result is simple: your scalable design gets throttled by the physics of a single hotspot in a B-tree.
To make this concrete, imagine a basic table for orders:
INSERT INTO dbo.SalesOrders (CustomerName, Amount) VALUES ('TestUser', 99.99);
Run that across a handful of concurrent sessions and it’s fine. Scale it out to dozens or hundreds of threads, and suddenly you’re staring at latch contention charts instead of clean performance graphs.
Microsoft knew this was a real-world problem, so in SQL Server 2019 they slipped in a fix called OPTIMIZE_FOR_SEQUENTIAL_KEY. The best part? It doesn’t require schema redesign, application rewrites, or fancy tricks. With a single ALTER INDEX statement, you can tell SQL Server to activate an internal backoff strategy, like a smart traffic cop that spaces out the cars instead of letting them all pile into the same junction.
This article is about understanding that problem, showing you what happens when you test it under stress, and seeing how OPTIMIZE_FOR_SEQUENTIAL_KEY quietly smooths things out.
The Problem: Last Page Insert Contention
Let’s say you have a simple sales table:
CREATE TABLE dbo.SalesOrders
(
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    OrderDate DATETIME DEFAULT GETDATE(),
    CustomerName VARCHAR(100),
    Amount DECIMAL(10,2)
);
Under a single-threaded workload, inserts here are fine. But picture dozens of sessions all running the same insert at once:
INSERT INTO dbo.SalesOrders (CustomerName, Amount) VALUES ('TestUser', 99.99);
Since OrderID is increasing, all rows go to the last page of the clustered index. The result?
- Multiple sessions try to get an exclusive latch on that last page.
- SQL Server serializes those requests → PAGELATCH_EX wait.
- Throughput drops, CPU remains idle, and inserts crawl.
The ironic part? You’re not fighting over logical locks — you’re fighting over a memory latch, which guards physical page structure. And traditional fixes — changing schema, sharding, or reversing keys — often aren’t practical in real-world production systems.
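You can actually watch this contention happen live. The following diagnostic sketch uses the standard sys.dm_os_waiting_tasks DMV; under last-page contention you will typically see many sessions waiting on the exact same page in resource_description:

```sql
-- Sessions currently waiting on page latches.
-- Under last-page contention, resource_description points
-- many sessions at the same database:file:page.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE 'PAGELATCH_%'
ORDER BY wt.wait_duration_ms DESC;
```

Run it from a separate session while the insert workload is active; a single hot page showing up over and over is the smoking gun.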
What happens under the hood with OPTIMIZE_FOR_SEQUENTIAL_KEY
When multiple sessions try to insert into the same last page, they all rush at once to grab an exclusive latch (PAGELATCH_EX). Without OFSK (Optimize for Sequential Key), SQL Server basically says: “Everyone line up, first one gets the latch, the rest spin and wait.”
So 50 sessions will slam the latch at once. The unlucky ones burn CPU spinning or just wait, adding latency. That’s why you see signal wait times pile up in sys.dm_os_wait_stats.
With OPTIMIZE_FOR_SEQUENTIAL_KEY = ON, SQL Server becomes smarter. Instead of letting every thread dogpile, it introduces a backoff algorithm. Internally, it watches contention patterns and tells incoming sessions:
“Hold on, I see too many cooks in the kitchen. You go next, you wait a tick, you back off a tick more.”
It’s not random. SQL Server spaces out the latch requests in microsecond-scale delays. That way, instead of 50 sessions colliding and blocking, they get serialized in a more orderly way. The end result is:
- Less spinning (threads don’t hammer the latch in a tight loop)
- Lower CPU overhead
- More predictable throughput under concurrency
- Total runtime closer to “ideal parallel”
Yes, some individual inserts might see a micro-delay, but the overall workload finishes faster because you avoid chaos at the hot page.
Starting in SQL Server 2019, Microsoft gave us a way to tame this chaos. With a single switch, you can tell SQL Server to coordinate concurrent inserts into a sequential key index using an internal backoff algorithm.
ALTER INDEX [PK_SalesOrders] ON dbo.SalesOrders SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
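If you are creating the table from scratch, the same option can be declared inline on the index definition instead of flipping it afterwards. A sketch, assuming SQL Server 2019 or later where the option is supported on CREATE TABLE and CREATE INDEX:

```sql
CREATE TABLE dbo.SalesOrders
(
    OrderID INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME DEFAULT GETDATE(),
    CustomerName VARCHAR(100),
    Amount DECIMAL(10,2),
    -- Enable the backoff behavior from day one on the clustered PK
    CONSTRAINT PK_SalesOrders PRIMARY KEY CLUSTERED (OrderID)
        WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
);
```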
Here’s what changes. Without this option, every insert thread charges straight at the last page, all trying to grab the exclusive latch at once. One session gets it, and the rest either spin aggressively or wait in line, racking up PAGELATCH_EX waits. CPU time is wasted, and overall runtime drags.
With OPTIMIZE_FOR_SEQUENTIAL_KEY enabled, SQL Server steps in as a quiet traffic cop. It notices the contention and starts spacing out latch requests. Internally, it applies a smart backoff: “You go now, you wait a microsecond, you wait a little longer.” Instead of 50 sessions slamming the same door, they’re queued more gracefully.
The magic is that it doesn’t require schema changes, application rewrites, or fancy hacks. It works at the index level (not server-wide), so you can enable it only on the tables that actually suffer from this problem. You can even check if it’s enabled:
SELECT name, optimize_for_sequential_key FROM sys.indexes WHERE object_id = OBJECT_ID('dbo.SalesOrders');
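To hunt for candidates across a whole database rather than one table, a simple audit sketch against sys.indexes (which exposes the optimize_for_sequential_key column starting with SQL Server 2019) lists user indexes that still have the option off:

```sql
-- User indexes where OPTIMIZE_FOR_SEQUENTIAL_KEY is still OFF
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
       OBJECT_NAME(i.object_id)        AS table_name,
       i.name                          AS index_name,
       i.optimize_for_sequential_key
FROM sys.indexes AS i
WHERE i.optimize_for_sequential_key = 0
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.name IS NOT NULL   -- skip heaps
ORDER BY schema_name, table_name;
```

Treat the output as a candidate list, not a to-do list; only indexes with a sequential key and real insert concurrency benefit from the option.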
Under the hood, you’re not “removing” contention — you’re teaching SQL Server to manage it better. The result is less wasted CPU, smoother throughput, and fewer latch storms.
So What Does It Actually Do?
Internally, SQL Server builds a “smart queue” when it sees intense concurrent inserts into the same index hot spot. Instead of letting every thread rush the latch at once (and cause massive spinning and blocking), it gently delays new insert attempts when it detects contention.
- No schema rewrite.
- No index rebuild.
- No guesswork.
It’s like a mini traffic controller sitting on top of your latch.
Simulation: Proving the Bottleneck
Let’s build a test environment. First, create a table with a sequential clustered index:
DROP TABLE IF EXISTS dbo.TestInserts;
GO
CREATE TABLE dbo.TestInserts
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Data CHAR(100)
);
Insert a million rows first to fill the B-tree structure:
INSERT INTO dbo.TestInserts (Data)
SELECT TOP (1000000) REPLICATE('A', 100)
FROM sys.all_objects a, sys.all_objects b;
Instead of opening a bunch of SSMS windows, I used PowerShell to fire up 200 parallel sessions. Each session runs a loop of 20,000 inserts and logs how long it took. I ran this script several times with OPTIMIZE_FOR_SEQUENTIAL_KEY both ON and OFF, and each time the wait stats lined up closely with the corresponding setting.
$connectionString = "Server=localhost;Database=simple_talk;Integrated Security=True;"
$query = @"
SET NOCOUNT ON;
DECLARE @i INT = 0;
WHILE @i < 20000
BEGIN
    INSERT INTO dbo.TestInserts (Data) VALUES (REPLICATE('B',100));
    SET @i += 1;
END
"@

$totalSw = [System.Diagnostics.Stopwatch]::StartNew()
$jobs = @()
for ($i = 1; $i -le 200; $i++) {
    $jobs += Start-Job -ScriptBlock {
        param($conn, $q, $n)
        $sw = [System.Diagnostics.Stopwatch]::StartNew()
        $sqlConn = New-Object System.Data.SqlClient.SqlConnection $conn
        $sqlConn.Open()
        $cmd = $sqlConn.CreateCommand()
        $cmd.CommandTimeout = 0
        $cmd.CommandText = $q
        $cmd.ExecuteNonQuery() | Out-Null
        $sqlConn.Close()
        $sw.Stop()
        [PSCustomObject]@{
            Session   = $n
            ElapsedMs = $sw.ElapsedMilliseconds
        }
    } -ArgumentList $connectionString, $query, $i
}

$results = $jobs | Receive-Job -Wait -AutoRemoveJob
$totalSw.Stop()

$results | Sort-Object Session | Format-Table -AutoSize
Write-Host "Total runtime: $($totalSw.Elapsed.ToString())"
This way, you capture per-session runtimes and the aggregate totals.
In our tests, the overall wall-clock runtime came out very close whether OPTIMIZE_FOR_SEQUENTIAL_KEY was ON or OFF; the difference was only about 10 seconds across the entire run, which is not dramatic. But the wait stats told a different story. With the option OFF, latch waits stacked almost linearly with runtime, showing raw contention. With it ON, we saw roughly 10 seconds of extra accumulated wait time, but that wasn’t “slowness”: it was SQL Server deliberately spacing out inserts with micro-delays. The payoff is that concurrency is smoothed out, threads don’t spin wildly, and CPU stays calmer, even though the total batch window barely changes.
- When OFSK was OFF, total runtime for all 200 sessions: 00:02:17.4372205
- When OFSK was ON, total runtime for all 200 sessions: 00:02:05.9792700
We can track the PAGELATCH wait types by running the query below during both runs, i.e., with OFSK off and with it on.
SELECT * FROM sys.dm_os_wait_stats WHERE wait_type LIKE 'PAGELATCH_%';
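One caveat worth noting: sys.dm_os_wait_stats is cumulative since the last server restart, so to compare the two runs cleanly you should either snapshot the numbers before and after each run, or reset the counters between runs. On a test box, the standard reset command is:

```sql
-- Reset cumulative wait statistics (test environments only;
-- this clears the counters for the whole instance).
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
```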
Now enable the optimization by turning OPTIMIZE_FOR_SEQUENTIAL_KEY (OFSK) on for the index:
ALTER INDEX [PK_TestInserts] ON dbo.TestInserts SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
Run the test again — same workload, same data, same hardware — but now with throttling applied.
When OPTIMIZE_FOR_SEQUENTIAL_KEY was OFF, we saw around 904k PAGELATCH_EX waits. Total wait time stacked up to about 354k ms, and the longest single wait went up to 66 ms. That’s SQL Server showing classic latch contention — lots of sessions hammering the same last page, queuing up behind each other, and some unlucky threads waiting longer than others. The workload still finishes, but it feels uneven, with certain sessions stalling more than they should.
When we flipped it ON, the latch waits actually dropped slightly — to around 877k. More importantly, the total wait time came down too, roughly 206k ms, and the longest single wait shrank to 33 ms. What’s happening is SQL Server is no longer letting all threads fight at once. Instead, it introduces micro-pauses that spread the inserts more evenly. That’s why you don’t just see fewer waits, but also smoother throughput: sessions don’t pile up behind one hot page the way they did before.
So the comparison is clear: OFF = higher waits, longer stalls, uneven performance. ON = fewer waits, shorter stalls, more balanced flow. The actual wall-clock runtime of the whole batch may look similar, but under the hood the ON setting delivers a calmer, fairer ride for every session.
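There is also a dedicated wait type you can watch for. When OPTIMIZE_FOR_SEQUENTIAL_KEY is ON, the deliberate micro-delays are reported under BTREE_INSERT_FLOW_CONTROL rather than PAGELATCH_EX, so seeing this wait accumulate during the run is a sign the throttling is actually engaging:

```sql
-- Time spent in the sequential-key backoff mechanism
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'BTREE_INSERT_FLOW_CONTROL';
```

If this query returns zero accumulated waits while the option is ON, the index probably wasn’t under enough contention for the mechanism to kick in.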
Real-World Use Cases
This feature is made for:
- High-insert OLTP systems
- Tables with clustered index on IDENTITY
- Systems with intense concurrency
- Logging, auditing, telemetry pipelines
- Warehousing workloads with INSERT INTO fact_table patterns
What It Doesn’t Fix
This setting doesn’t help if:
- You don’t have concurrent inserts
- Your keys are already random (GUIDs, hash)
- You’ve got contention on non-leaf levels
- You’re bottlenecked on disk I/O, not latches
Also, this is not a performance booster in all scenarios; it helps specifically when the root cause is latch contention from sequential inserts.
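A quick way to check that last point before flipping the switch: PAGELATCH_% waits indicate in-memory latch contention, while PAGEIOLATCH_% waits mean sessions are waiting on disk reads, which this option does nothing for. A comparison sketch:

```sql
-- PAGELATCH_%  = in-memory latch contention (this option can help)
-- PAGEIOLATCH_% = disk I/O waits (this option won't help)
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH_%'
   OR wait_type LIKE 'PAGEIOLATCH_%'
ORDER BY wait_time_ms DESC;
```

If PAGEIOLATCH dominates, look at storage and memory pressure first.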
Can It Cause Problems?
Though it looks safe based on the documentation and our tests, it’s not a magic bullet. In some cases, workload profiling tools may show slightly increased latency per insert, because SQL Server adds deliberate microsecond-scale waits to smooth contention. But the overall system becomes more stable and scalable. See this blog post (Microsoft Tech Community) for more detail.
Summary
The best performance features are the ones you don’t have to think about. And OPTIMIZE_FOR_SEQUENTIAL_KEY is one of those rare features: you flip it on, and SQL Server gets smarter about concurrency.
For tables suffering from last-page insert issues, this simple index option might save you hours of pain, code rewrites, or scaling discussions. It’s surgical, targeted, and non-invasive, exactly what a low-risk fix should be.
If you’re running SQL Server 2019 or later and haven’t checked your hot OLTP tables yet, maybe it’s time to flip that switch.