SQLServerCentral Article

Think LSNs Are Unique? Think Again - Preventing Data Loss in CDC ETL

,

Introduction

If you work in DWH land, you probably lean on an LSN-based watermark to read changes from a source. Here’s the catch that bites hard: LSNs are not unique per row. An LSN marks a position in the transaction log. When a source system bulk-inserts rows inside one transaction, every one of those rows shares the same commit LSN. If your ETL stores only “last LSN read” and uses a > filter on the next run, you can quietly skip data.

Quick story. Imagine the source inserts 100 rows. They all get LSN 0x0001. Your ETL reads 50, updates its marker table to 0x0001, then fails. You rerun with WHERE __$start_lsn > 0x0001. Result: zero rows. You just skipped 50 legitimate changes. This article proves the behavior with Change Data Capture (CDC) and gives you a safe, simple watermark pattern.

What we will do

  • Set up a tiny sandbox with CDC.
  • Insert many rows in one transaction and show they share one commit LSN.
  • Reproduce the data-skip bug with a naive watermark.
  • Fix it with a safe watermark that uses LSN plus a tie-breaker.
  • Provide day-one-safe alternatives and quick troubleshooting.

Prerequisites

  • SQL Server 2016 or later.
  • SQL Server Agent running.
  • Permissions to enable CDC.
  • Use a non-production instance.

Key terms

  • LSN: a byte-position pointer in the transaction log. It is about where a change sits in the log, not about a single row.
  • CDC: a feature that reads the log and writes changes into change tables like cdc.dbo_<Table>_CT.
  • CDC columns:
    • __$start_lsn: the commit LSN for the transaction that produced the change.
    • __$seqval: the ordering of changes within that same __$start_lsn.
    • __$operation: 2 insert, 3 delete, 4 update-before, 5 update-after.

Step 1. Create a sandbox and enable CDC

This step will create a simple database and tables for the demo purpose.

IF DB_ID('LSN_Demo') IS NOT NULL
BEGIN
  ALTER DATABASE LSN_Demo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
  DROP DATABASE LSN_Demo;
END;
GO
CREATE DATABASE LSN_Demo;
GO
USE LSN_Demo;
GO

EXEC sys.sp_cdc_enable_db;
GO

CREATE TABLE dbo.Sales(
  SalesID int NOT NULL PRIMARY KEY,
  Item    nvarchar(50) NOT NULL,
  Qty     int NOT NULL
);
GO

EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name   = N'Sales',
  @role_name     = NULL,
  @supports_net_changes = 0;
GO

BEGIN TRY
  EXEC sys.sp_cdc_change_job @job_type='capture', @pollinginterval=1;
  EXEC sys.sp_cdc_start_job  @job_type='capture';
END TRY BEGIN CATCH END CATCH;
GO

Step 2. Insert many rows in one transaction and observe one LSN

In this step, we are inserting 100 rows in a single transaction which should generate one single LSN value.

-- You cannot truncate table which have CDC enabled
DELETE FROM dbo.Sales;

BEGIN TRAN;
;WITH X AS
(
  SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
  FROM sys.all_objects
)
INSERT dbo.Sales(SalesID, Item, Qty)
SELECT rn, CONCAT(N'Item', rn), 1
FROM X;
COMMIT;
GO

WAITFOR DELAY '00:00:03';
GO

-- Proof: many rows, one commit LSN
SELECT
  COUNT(*) AS rows_captured,
  COUNT(DISTINCT __$start_lsn) AS distinct_commit_lsns
FROM cdc.dbo_Sales_CT;  -- Expect: rows_captured = 100, distinct_commit_lsns = 1

-- Show the single LSN value
SELECT TOP (1)
  sys.fn_varbintohexstr(__$start_lsn) AS commit_lsn_hex
FROM cdc.dbo_Sales_CT;

Expected outputs:

  • rows_captured = 100
  • distinct_commit_lsns = 1
  • One hex LSN value, same across rows

Step 3. The silent data-loss trap: naive watermark

We will simulate the classic mistake: store only __$start_lsn, as a filter with > on rerun, and watch rows disappear and how it can impact your ETL.

The below code will track a watermark, called last_start_lsn, which represents the last CDC change it has already processed.  This is what happens in the code.

  1. It creates a table to store that watermark and initializes it to NULL (meaning nothing processed yet).
  2. It reads the next batch of CDC rows (top 50 rows) where the __$start_lsn is greater than the stored watermark.
  3. It treats those 50 rows as “processed”.
  4. It updates the watermark to the highest LSN found in those 50 rows.
  5. It re-runs the query that fetches “new” CDC rows and expects to see zero, because it thinks everything up to that watermark is done.

In short: this code processes CDC changes in batches and records the largest LSN it has handled. Then it re-queries CDC to confirm that nothing new remains after updating the watermark.

IF OBJECT_ID('dbo.Watermark_Naive') IS NOT NULL DROP TABLE dbo.Watermark_Naive;
CREATE TABLE dbo.Watermark_Naive(last_start_lsn varbinary(10) NULL);
INSERT dbo.Watermark_Naive VALUES(NULL);
GO

-- First pass: process 50 rows, then "crash"
DECLARE @last varbinary(10) = (SELECT last_start_lsn FROM dbo.Watermark_Naive);


  SELECT TOP (50) __$start_lsn, __$seqval, SalesID
  INTO #ToProcess
  FROM cdc.dbo_Sales_CT
  WHERE (@last IS NULL OR __$start_lsn > @last)
  ORDER BY __$start_lsn, __$seqval

SELECT COUNT(*) AS processed_now FROM #ToProcess;  -- Expect 50

UPDATE dbo.Watermark_Naive
SET last_start_lsn = (SELECT MAX(__$start_lsn) FROM #ToProcess);

-- Rerun with the naive filter
DECLARE @last2 varbinary(10) = (SELECT last_start_lsn FROM dbo.Watermark_Naive);

SELECT COUNT(*) AS rows_visible_on_rerun
FROM cdc.dbo_Sales_CT
WHERE __$start_lsn > @last2;  -- Expect 0

Expected outputs:

  • First pass: processed_now = 50
  • Second pass: rows_visible_on_rerun = 0

This reproduces the 0x0001 story in real life.

Step 4. The fix: a re-startable, lossless watermark

Store two values: __$start_lsn and __$seqval. Then select with a two-part predicate. This guarantees you can stop and resume without skipping. The below pattern processes CDC rows in a safe, crash-proof way by remembering the exact row it stopped at. It keeps two markers:

  • the last LSN it processed, and
  • the last sequence value inside that LSN.

Using these two values, it selects the next 50 CDC rows that come strictly after the last processed row. After processing those 50 rows, it updates the markers to the highest LSN and seqval from that batch. When it runs again, the next 50 rows still show up correctly, because the filter knows exactly where the previous batch ended.

IF OBJECT_ID('dbo.Watermark_Safe') IS NOT NULL DROP TABLE dbo.Watermark_Safe;
CREATE TABLE dbo.Watermark_Safe
(
  last_start_lsn varbinary(10) NULL,
  last_seqval    varbinary(10) NULL
);
INSERT dbo.Watermark_Safe VALUES (NULL, NULL);
GO

-- Process first 50 safely
DECLARE @l varbinary(10) = (SELECT last_start_lsn FROM dbo.Watermark_Safe);
DECLARE @s varbinary(10) = (SELECT last_seqval    FROM dbo.Watermark_Safe);

  SELECT TOP (50) __$start_lsn, __$seqval, SalesID
  INTO  #NextBatch
  FROM cdc.dbo_Sales_CT
  WHERE (@l IS NULL OR __$start_lsn > @l OR (__$start_lsn = @l AND __$seqval > @s))
  ORDER BY __$start_lsn, __$seqval

SELECT COUNT(*) AS processed_now FROM #NextBatch AS NB;  -- Expect 50

UPDATE dbo.Watermark_Safe
SET last_start_lsn = (SELECT MAX(__$start_lsn) FROM #NextBatch),
    last_seqval    = (SELECT MAX(__$seqval)    FROM #NextBatch);

-- Rerun: remaining 50 appear and are processable
DECLARE @l2 varbinary(10) = (SELECT last_start_lsn FROM dbo.Watermark_Safe);
DECLARE @s2 varbinary(10) = (SELECT last_seqval    FROM dbo.Watermark_Safe);

SELECT COUNT(*) AS rows_visible_on_rerun
FROM cdc.dbo_Sales_CT
WHERE (@l2 IS NULL OR __$start_lsn > @l2 OR (__$start_lsn = @l2 AND __$seqval > @s2));  -- Expect 50

Expected outputs:

  • First pass: processed_now = 50
  • Rerun: rows_visible_on_rerun = 50

You read the rest without missing anything.

Day-one-safe alternative

If your target tables are idempotent, use an inclusive LSN filter and let PK or a MERGE pattern handle duplicates:

-- Read inclusively
WHERE __$start_lsn >= @last_lsn

You may re-read some rows, but you will not lose data.

Cleanup

EXEC sys.sp_cdc_disable_table @source_schema='dbo', @source_name='Sales', @capture_instance='dbo_Sales';
EXEC sys.sp_cdc_disable_db;
USE master;
ALTER DATABASE LSN_Demo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE LSN_Demo;

Wrap-up

  • LSNs are log positions, not row identifiers.
  • Bulk changes in one transaction share the same commit LSN.
  • A single-column LSN watermark plus > can skip data after a mid-batch failure.
  • Store __$start_lsn and __$seqval, or read inclusively with an idempotent target, and you are safe.

Rate

(1)

You rated this post out of 5. Change rating

Share

Share

Rate

(1)

You rated this post out of 5. Change rating