
Stairway to Azure SQL Hyperscale Level 6: Backup and Restore Internals


Introduction

When we think about backups in the traditional SQL Server world, the mental picture is usually a heavy process. A full backup on a multi-terabyte database isn’t just a background task; it’s a major event. The database engine has to read every allocated page, write out gigabytes or even terabytes to disk, and make sure nothing slips out of sync while users are still hitting the system. It is reliable, but it is also slow, storage-hungry, and stressful to run on very large systems.

Now imagine running this same process on a cloud database that holds tens of terabytes. If Hyperscale tried to back up data the old-fashioned way, the whole idea of “cloud scale” would fall apart. Customers would be waiting hours for backups to complete, recovery would be unreliable in tight RPO/RTO scenarios, and Azure would be burning cycles moving petabytes of data around. Clearly, something very different was needed.

This is where the snapshot-based approach comes in. Instead of thinking about backups as copies of data files, Hyperscale treats them as storage-level snapshots. These snapshots are lightweight and almost instant because they don’t actually move data. They simply capture a point-in-time view by recording metadata about page versions. Add in continuous transaction log streaming through the Hyperscale Log Service, and suddenly you have a system where full, differential, and log backup concepts are all still present—but implemented in a way that feels entirely different from classic SQL Server.

The end result is that backups in Hyperscale are always happening in the background without blocking, without chewing up I/O, and without administrators manually orchestrating jobs. Recovery shifts from being a slow, file-based restore process to being a metadata operation. When you ask for a point-in-time restore, Azure doesn’t copy terabytes of data—it simply rewinds the version pointers and replays the last bit of log to land at the exact second you choose. This makes backup and restore in Hyperscale not just faster, but fundamentally re-imagined for the cloud.

How Backups Work in Hyperscale

When we talk about backups in Hyperscale, the first thing to understand is that they don’t look or behave like the backups we’re used to in classic SQL Server. There are no .bak files sitting somewhere in storage, and there’s no job you kick off to take a differential or log backup. Everything is built into the architecture itself. That means backups are always happening, even if you don’t see them directly.

At the foundation are full snapshots, taken roughly every 12–24 hours. These snapshots are created at the Azure Storage level, not by copying every data page into a new file. Instead, a snapshot simply records a consistent set of pointers to the existing page versions. The snapshot becomes a kind of “baseline bookmark” in time, a stable starting point for any recovery operation.

On top of that, you have something that feels like a differential, but it isn’t packaged as a separate file the way DBAs are used to. Hyperscale uses hourly checkpoints to track which pages have changed since the last snapshot. Under the covers, Azure Storage already knows when a page is modified, because it writes a new version and keeps the old one around. That’s why Hyperscale doesn’t need to create a special differential file — the system can reconstruct the state of the database at any checkpoint by following the version history of its pages. During a restore, this is what saves time: rather than starting from a 20-hour-old snapshot and replaying 20 hours of log, the system can “jump forward” to the hourly checkpoint, apply just the changed pages, and then use the logs to finish the job.

Finally, there are the transaction log backups, which in Hyperscale are streamed into the Log Service every five to ten minutes. This is where the system captures every insert, update, and delete. When you restore to a specific point in time, those logs are always applied last. They don’t sit “on top of a differential file” as in traditional SQL Server, because there are no differential files here. Instead, the restore engine starts from the last full snapshot, uses the hourly checkpoint to catch up quickly, and then replays just enough log records to land on the exact second you asked for.

And one more important note: people often ask about export/import in Hyperscale. Yes, you can export a Hyperscale database to a .bacpac file, but only if it’s relatively small — Microsoft says up to around 150 GB works with tools like SSMS or SqlPackage. Beyond that, the process becomes impractical, sometimes unreliable, and in certain cases not supported at all (for example, newer data types like VECTOR can’t be exported). For real-world databases in the terabyte range, export/import isn’t the right tool. Instead, Microsoft wants you to use the features that align with Hyperscale’s distributed design: automated snapshots, point-in-time restore, geo-restore, or database copy/clone for dev/test.
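For terabyte-scale databases, the copy route is worth seeing next to the export route. Below is a minimal sketch using the Azure CLI and SqlPackage; the resource names match the examples used later in this article, the destination name hyperscale-devtest is hypothetical, and the SqlPackage line would additionally need authentication arguments:

# Snapshot-based database copy for dev/test (fast even at multi-TB scale)
az sql db copy --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale --dest-name hyperscale-devtest

# .bacpac export via SqlPackage, only practical for small databases (roughly 150 GB or less)
SqlPackage /Action:Export /SourceServerName:"sql-sqlservercentral-1.database.windows.net" /SourceDatabaseName:"hyperscale" /TargetFile:"hyperscale.bacpac"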

So in short, backups in Hyperscale aren’t a set of files you manage; they’re a timeline the system maintains for you. Full snapshots anchor the chain, hourly checkpoints reduce log replay time, and the transaction log stream lets you restore right down to the second. It’s the same concepts you know — full, differential, and log — but reimagined for a cloud engine that has to scale into the hundreds of terabytes.

Full vs Differential vs Log (In Hyperscale)

Type          | Mechanism                  | Frequency      | Storage Impact | Used For
--------------|----------------------------|----------------|----------------|-------------------------
Full Snapshot | Storage-based snapshot     | Every 12-24 h  | Low            | Recovery base image
Differential  | Change-based checkpoint    | Hourly         | Low            | Fast mid-range restores
Log Backup    | Via Log Service (streamed) | Every 5-10 min | Moderate       | Point-in-time restore

All of this is automated. You don't schedule backups. Azure handles it based on retention policies and restore points.
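If you want to see what the platform has configured for you, you can inspect the current point-in-time retention window. A quick check with the Azure CLI (the str-policy command group requires a reasonably recent CLI version; the resource names are the ones used later in this article):

az sql db str-policy show --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale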

Backup Architecture Diagram

The diagram below helps picture the moving parts in Hyperscale, because they don’t look like what we’ve seen in traditional SQL Server. At the top, the compute node is where your queries run, but it doesn’t hold all the data. Every transaction that changes rows is streamed out to the Log Service, which captures the sequence of operations in near real time.

Meanwhile, the actual data pages live on page servers. These page servers don’t generate .bak files like SQL Server does; they maintain snapshots of the data as it changes. Behind them, Azure Storage is quietly tracking page versions and metadata pointers. Each time a page changes, the old version is preserved, which means the system can always reconstruct how the database looked at a given point in time.

When it comes time to restore, the restore layer doesn’t need to copy terabytes of data across the network. Instead, it just follows the metadata chain: jump back to a snapshot, pull in changed pages from hourly checkpoints, and finally replay log records from the Log Service. That’s why a restore on a multi-terabyte Hyperscale database can finish in a couple of minutes, where a traditional SQL restore of the same size might have taken hours.

Image by Author: Backup and Restore Mechanism

  • Compute Node sends transaction logs to Log Service
  • Page Servers hold data snapshots
  • Azure Storage keeps versions and snapshot pointers
  • Restore Layer can instantly rehydrate any state from metadata chain

Restore Process Internals

The magic of Hyperscale restores comes from the fact that they are mostly metadata operations, not heavy data movement. When you request a restore, three things happen in sequence:

  1. Metadata repointing – the system starts from the nearest full snapshot and uses page version metadata to lock onto a known-good state.
  2. Checkpoint acceleration – hourly checkpoints (the differential equivalent) are applied so the system doesn’t have to replay logs going back an entire day.
  3. Log replay – the Log Service replays only the small set of changes needed to land exactly on your chosen point in time.

Because of this design, restores are almost instant compared to classic SQL Server. You aren’t waiting for gigabytes of files to copy or for a restore chain to replay line by line. Instead, you’re telling Azure, “please rewind this timeline,” and the engine does it by rearranging pointers and applying just enough logs to finish cleanly.

Hands-On: Backup & Restore Experiments (Hyperscale)

One of the best ways to understand Hyperscale backups is to walk through them step by step. We’ll start by checking what backups exist, then try a restore from a full snapshot, then a point-in-time restore that uses logs, and finally look at clones and geo-restore.

Step 0 — See what backups exist

In Hyperscale, there are no .bak or .trn files to browse. Instead, you ask Azure SQL for restore points. These map directly to the concepts of full, differential, and log backups:

  • Full snapshots (full backups): taken every 12–24h.
  • Hourly checkpoints (differential equivalents): one per hour.
  • Transaction logs: streamed every 5–10 minutes, exposed as a continuous restore window.

Here is the command to list restore points in Azure PowerShell:

Get-AzSqlDatabaseRestorePoint -ResourceGroupName <azure resource group> -ServerName <hyperscale server> -DatabaseName <hyperscale database>

In the image below, the output shows only one row, because Hyperscale databases expose restore information as a single continuous restore point. Unlike traditional SQL Server, where you might see separate full, diff, and log backup files, here you only see:

  • EarliestRestoreDate - the oldest point-in-time you can restore to.
  • RestorePointType: Continuous - tells you PITR is always on.

You won’t see a list of every full, diff, and log backup — Azure abstracts those away. Internally, the engine takes full snapshots every 12–24 hours, hourly differentials, and log backups every 5–10 minutes, but you don’t get those listed individually. Instead, Azure just guarantees you can restore to any time between EarliestRestoreDate and “now minus ~5–10 minutes.”

You can restore the database to any point in time between EarliestRestoreDate and the current time. Since log backups in Azure SQL happen every 5–10 minutes, there’s always a slight lag. At the time of writing, I haven’t found a way to see the exact log backup timestamps (Hyperscale appears to abstract them away), so it’s best to leave a safety buffer of about 5–10 minutes when choosing your restore point.
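One way to respect that buffer is to compute the restore time instead of eyeballing it. A small bash sketch using GNU date, as found in Azure Cloud Shell (the macOS date syntax differs):

# pick a restore point 10 minutes in the past to stay clear of the log-backup lag
RESTORE_TIME=$(date -u -d '10 minutes ago' '+%Y-%m-%dT%H:%M:%SZ')
echo $RESTORE_TIME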

The Azure CLI (bash) equivalent of the earlier PowerShell command is below. Note that some properties, such as RestorePointType, may not be returned by the CLI, but for most backup/restore tasks either PowerShell or the Azure CLI will do:

az sql db show \
  --resource-group rg1 \
  --server sql-sqlservercentral-1 \
  --name hyperscale \
  --query "{ResourceGroup:resourceGroup, Database:name, Location:location, EarliestRestoreDate:earliestRestoreDate, RestorePointType:RestorePointType}" \
  --output table

Step 1 — Seed a marker table

In this step, we will create a table named AuditTrail in our hyperscale database, so that we can see the effect of the backup/restore cycle clearly.

CREATE TABLE dbo.AuditTrail
(
    AuditTimeUtc datetime2(3) NOT NULL,
    Message nvarchar(200) NOT NULL
);
GO

INSERT INTO dbo.AuditTrail (AuditTimeUtc, Message)
VALUES (SYSUTCDATETIME(), N'Snapshot baseline- Should appear in full backup');

We can see from the screenshot below that the data has been loaded, with one row.

Step 2 — Restore from the full snapshot, i.e. the last available restore point

Now let’s prove that a plain restore works. Pick a time inside the continuous restore window you saw above and spin up a new DB:

az sql db restore --resource-group <ResourceGroupName> --server <ServerName> --name <SourceDatabaseName> --dest-name <NewDatabaseName> --time "<RestorePointInUTC>" --output table

In the command below, we restore to the most recent available restore point.
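For reference, this is the shape of that command with the names used in this walkthrough; the timestamp is illustrative, so substitute a time inside your own restore window:

az sql db restore --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale --dest-name hyperscale-restore --time "2025-09-01T13:45:00Z" --output table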

Check that the new DB (hyperscale-restore) has been created in the Azure Portal:

Now connect to the hyperscale-restore database and run this query (Azure SQL Database doesn’t support cross-database three-part names, so query the table from within the restored database):

SELECT * FROM dbo.AuditTrail;

You should see the Snapshot baseline row.

Step 3 — Insert a new marker row (for the point-in-time restore demo)

Let's insert a new row as a marker.

INSERT INTO hyperscale.dbo.AuditTrail (AuditTimeUtc, Message)
VALUES (SYSUTCDATETIME(), N'Pre-Restore marker');

Note the UTC time; we’ll restore to a point just after this moment so the log replay has to pick the row up.

Let’s confirm the new row has been inserted into the table.

For the record, I inserted this row at the 14:31:37.892 timestamp shown below.

Step 4 — Point-in-time restore

Please note that each PITR creates a new database in Hyperscale, so you either have to restore to a database with a new name or restore to a temporary database and then rename it. Refer to this Microsoft document for more information.
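If you eventually want the restored copy to take over the original name, one possible swap pattern with the Azure CLI looks like the sketch below. Treat it as an illustration only: verify the restored data first, and note that az sql db rename may require a recent CLI version.

# only after verifying the restored copy: remove (or keep aside) the original
az sql db delete --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale --yes

# then give the restored database the original name
az sql db rename --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale-restore-latest --new-name hyperscale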

Now restore the database to a point in time just after the marker insert.

RESTORE_TIME="2025-09-01T14:31:38Z"   # adjust to your marker time, i.e. the point up to which you want data restored

az sql db restore --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale --dest-name hyperscale-restore-latest --time $RESTORE_TIME --output table

We can verify the same in the Azure Portal under SQL databases.
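If you prefer the command line to the portal, polling the database status works as well; it reads Creating while the restore is in flight and Online once the database is ready:

az sql db show --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale-restore-latest --query "status" --output tsv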

Now let’s check whether the marker row made it into the hyperscale-restore-latest database. Connect to it and run:

SELECT * FROM dbo.AuditTrail; -- this should show 2 rows

This shows two rows.

Under the hood, Hyperscale recovery begins by pulling the nearest full snapshot of the database. From there, it layers on the hourly checkpoint page versions, which act like differentials, to bring the database state closer to your requested time. Finally, it replays only the necessary transaction log records to reach the exact second you specified. This flow makes it clear how backups are structured in Hyperscale—full snapshots, checkpoint pages, and log streams—and also highlights the difference between restoring from a snapshot alone versus performing a full point-in-time restore that includes log replay.

Retention and Restore Policy

  • Default retention = 7 days for PITR
  • Can be extended to 35 days
  • Full snapshot retention in paired region for geo-restore

You can adjust retention policies in Portal or via PowerShell/Azure CLI.
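For example, extending the PITR window to the 35-day maximum with the Azure CLI looks like this (Set-AzSqlDatabaseBackupShortTermRetentionPolicy is the PowerShell equivalent):

az sql db str-policy set --resource-group rg1 --server sql-sqlservercentral-1 --name hyperscale --retention-days 35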

Summary

Hyperscale backups aren’t backups in the old sense. They’re versioned timelines with metadata-driven fast restore paths. You never copy data. You never block writes. You just restore instantly by repointing to a stable moment in the past.

This lets teams ship faster, recover safer, and clone large environments without duplicating terabytes of data.

Next up, we’ll cover Scaling Read Replicas and how to offload workloads without replication lag or Always On complexity.
