T-SQL Tuesday #198 Roundup: How Do You Detect Data Changes?

Thank you to everyone who participated in T-SQL Tuesday #198! When I wrote the invitation post, I intentionally kept the prompt broad because change detection looks different depending on your source system, your infrastructure, your data volumes, and what you need to do with the changes once you have them. The responses covered SQL Server internals, Microsoft Fabric and Synapse, hashing strategies, metadata-driven frameworks, and Synapse workspace diffing with Python. Here’s a summary of each contribution.

The Roundup

Rob Farley – Detecting Changes | LobsterPot Solutions

Rob covered several approaches he’s used over the years and focused on Change Event Streaming (CES) in SQL Server 2025. CES scans the transaction log and streams committed changes (including old and new values) directly to Azure Event Hubs, eliminating the intermediate hop of CDC. He’s enthusiastic about it but gives an honest accounting of the current limitations: no support for json or xml column types, no table partitioning with SWITCH, and renamed columns will break it. He considers it his preferred method for tracking history, but notes the limitations are still pretty annoying and hopes they’ll be addressed in a future release.

Kevin Chant – Developing Metadata-Driven Frameworks for Microsoft Fabric to Cater for Change Detection | Chantified Lens

Rather than focusing on a specific detection mechanism, Kevin zoomed out to the framework level. He highlighted several open-source, metadata-driven frameworks for Microsoft Fabric that already include watermark and incremental load logic, and strongly recommends customizing an existing mature solution rather than building from scratch. If you do improve on one, contribute it back.

Deborah Melkin – Capturing Data Changes | Deb the DBA

Deborah drew on her experience building a conference session comparing CDC and CES. A few CDC lessons that aren’t obvious from the documentation: the SQL Agent jobs CDC creates don’t account for Availability Group failovers, but you can modify them to check which replica is primary, a tip she credits to Jeff Iannucci. CDC is also designed to be transient, so if something goes wrong and you don’t catch it for a week, the data you needed may already be outside the default 3-day retention window. Her broader point: the biggest problems with CDC usually aren’t CDC itself, but insufficient monitoring.

Andy Brownsword – Change Detection

Andy shared a before-and-after story with real numbers. His team had an overnight batch process ingesting 12 months of transactional data (~250 million records) with an 8-hour runtime, and over 99% of those rows hadn’t changed. The solution was hashing: generate a hash of each record at read time, compare against the stored hash, and filter out unchanged records before they ever reach the database. That brought runtime down to about 30 minutes. But the emphasis of the post is less on hashing and more on the principle: identify unnecessary work as early as possible in the pipeline, and quantify how much you can eliminate before choosing a solution. Hashing made sense here because 99% of records were unchanged; at 10% it might not have.

Justin Cunningham – ETL Change Detection: Stop Full Refreshing Everything | ProcureSQL

Justin argued that teams default to full refreshes because it feels safe, and that this breaks at scale as run windows stretch and compute costs climb. He compared four strategies (timestamp-based, hash diff, Change Event Stream, and source CDC) and made the point that a robust solution combines them rather than relying on any one approach alone. He also described a team using a 24-hour look-back window that seemed fine until finance found incorrect historical data: old orders were being adjusted months later without updating the updated_at column, so those changes were never picked up. The fix required rolling look-back windows, stateful watermarking, and idempotent merges. His bottom line: weak change detection leads to stale dashboards and data people can’t trust, so it’s a business problem as much as an engineering one.

Louis Davidson – How Do You Detect Data Changes? | Drsql’s Database Musings

Louis shared his practical experience, including the honest admission that he’s never used CDC or Change Tracking in production, partly because a former coworker was convinced they had issues that made them unsuitable. His go-to approach has been rowversion/timestamp columns combined with a saved watermark variable, with some buffer built in to handle overlap. He also made a good case for occasionally doing a full column-by-column comparison rather than relying on timestamps alone, and showed how the IS DISTINCT FROM syntax (added in SQL Server 2022) makes null-safe column comparisons much less painful. For his current Fabric work, he’s leaning toward comparing all columns across all rows daily rather than relying on a watermark, since he doesn’t have billions of rows and the lakehouse paradigm is built for scanning large data.

Mike Donnelly – Using Python Scripts to Diff Synapse Workspaces | A Coding Journey

Mike’s entry took a different angle. His team needed to migrate Azure Synapse workspaces while keeping development running in both the old and new workspaces simultaneously. His solution: Python scripts that parse the TemplateForWorkspace.json generated when you publish in Synapse Studio and diff the two files, producing a report of changed pipelines, datasets, notebooks, etc. They ran it daily to make sure dev work stayed in sync. He also noted that he tried asking Copilot to compare the JSON files directly and it was “annoyingly accurate,” but he still trusts his Python script more because he knows exactly what it’s doing.

From Reddit

Tony Kain shared a response in the r/MicrosoftFabric subreddit with some battle-tested config-table advice. A few highlights:

> vs >= on watermarks. Depending on the precision of your watermark column, > can silently miss rows that share the exact boundary value, while >= will reprocess them. Store the operator in your config table and build the predicate dynamically.

Sargability matters. Wrapping your watermark column in COALESCE() or a function like YEAR() kills index seeks. Fix the null problem at the source or pick a different column.

Hash vs. full overwrite in lakehouse storage. Every UPDATE in Parquet rewrites files, so for wide tables with broad changes a full overwrite often beats a MERGE. Measure before assuming incremental is faster.

Three hashes for a Type 2 SCD. One hash on the business key, one on Type 1 attributes, one on Type 2 attributes. Type 2 hash differs? Expire and insert. Only Type 1 differs? Update in place. Neither differs? Do nothing.

Timezones belong in config too. Store the source timezone alongside your watermark metadata and convert to UTC on ingest. DST transitions create one-hour gaps or replay windows twice a year.

Themes and Takeaways

A few things came up across multiple posts worth noting.

There is no universal solution. Hashing, watermarks, CDC, CES, temporal tables — every approach has a context where it shines and a context where it fails. Understanding your data volume, your source system’s capabilities, your infrastructure, and your latency requirements all affect the right choice.

The new SQL Server 2025 options are promising but have real limitations. CES came up in multiple posts. Rob and Deborah both noted the current constraints around data types, partitioning, and the fact that it’s still early. It’s worth watching and testing, but the current limitations mean it won’t work for every situation.

Monitoring and operational discipline matter as much as the detection mechanism. Deborah’s point about CDC monitoring was echoed in Justin’s example: the technology can be sound and still fail you if you’re not watching it closely and accounting for edge cases like late-arriving data, AG failovers, and retention window mismatches.

Eliminate unnecessary work as early as possible. Andy’s post is the clearest example, but the principle runs through nearly every contribution: don’t move data you don’t need to move, and don’t process rows that haven’t changed.

Thanks again to everyone who participated!

The post T-SQL Tuesday #198 Roundup: How Do You Detect Data Changes? first appeared on Data Savvy.