Introduction
In Level 1, we introduced Azure SQL Hyperscale and looked at why Microsoft designed it to break past traditional SQL Server limits. In this level, we’ll focus on one of the core building blocks that makes Hyperscale possible: the page server. These remote storage engines separate compute from data, stream pages on demand, and let Hyperscale grow far beyond the old boundaries.
If you want to understand how Hyperscale really works, you need to understand page servers. We’ll explore what they are, how they store and serve data, how they interact with RBPEX and the Log Service, and how they handle writes. We’ll finish with a demo that shows the difference between a cold read (where a page has to be streamed from a page server) and a warm read (where it’s already cached locally).
What is a Page Server?
In Azure SQL Hyperscale, a Page Server is a specialized service that stores and streams data pages: the familiar 8 KB blocks that SQL Server uses for everything from tables and indexes to metadata. But unlike traditional SQL Server, where data pages are read from local disk, Hyperscale stores them remotely. The page server is that remote storage engine.
You can think of a page server as a lightweight, high-speed file server designed just for database pages. It doesn’t handle queries, joins, or business logic—that’s the compute node’s job. Its only responsibility is to serve data pages quickly and reliably, whenever the compute tier asks for them.
How Page Servers Fit in the Bigger Architecture
To understand where page servers sit in the big picture, let’s look at the main building blocks of the Hyperscale storage layer:
Compute Node → Handles query execution
RBPEX Cache → Tries to serve data locally
Page Server → Streams the page from remote storage if not cached
Log Service → Independently tracks and applies changes
These components work together to separate compute from storage and still deliver high performance.
Figure 1: Hyperscale Storage Layer – Compute and Page Server Relationship (Image by Author)
In this architecture, shown in Figure 1, the page server acts as the dedicated storage layer, completely separate from the compute node. When a query runs, the compute node first checks its local cache (called RBPEX) to see if the required data page is already in memory. If it isn’t, the compute node sends a request to the relevant page server, which pulls the page from its own SSD- or blob-based storage and streams that specific 8 KB page over the network — just the page, not the whole file. This keeps memory usage lean and avoids unnecessary data movement. The compute node then uses that page to continue query execution and caches it for future use. In this model, page servers serve as the remote-but-responsive source of truth for data, letting Hyperscale scale out storage independently while still supporting high-performance reads on demand.
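To follow along with the rest of this level, it helps to confirm that a database is actually running on the Hyperscale tier. A minimal check, run inside the user database on Azure SQL Database (the service objective value shown in the comment is illustrative):

```sql
-- Confirm the database's service tier and objective.
-- On Hyperscale, edition = 'Hyperscale' and the service objective
-- takes a form like 'HS_Gen5_4' (illustrative value).
SELECT d.name,
       dso.edition,
       dso.service_objective
FROM sys.database_service_objectives AS dso
JOIN sys.databases AS d
    ON d.database_id = dso.database_id;
```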
How Page Servers Store Data
Each page server in Hyperscale is responsible for a slice of your database’s total storage. But it doesn’t work like a traditional disk with .mdf or .ndf files. Instead, the data is broken into segments, and each segment contains a chunk of 8 KB pages. Think of segments as containers—each one holding thousands of SQL Server pages, grouped and indexed for fast retrieval.
This segmented design helps with parallelism and lookup efficiency. When the compute node needs a specific page during query execution, it doesn’t have to scan through a massive file. It knows exactly which segment holds that page and sends a direct request to the page server responsible for that segment.
For example, if the compute node is processing a query and realizes it needs Page ID 8891, it doesn’t say “read file.” Instead, it sends a precise message like: “Give me Page 8891 from Segment 12.”
The page server locates Segment 12 in its local storage—usually backed by high-performance SSDs—and quickly finds Page 8891 within that segment. It then streams just that 8 KB page over the wire to the compute node. If the system deems that page colder or less frequently accessed, it might be stored in Azure blob storage, which trades off a bit of latency for better cost efficiency. For hotter data, SSDs give much faster response times.
This entire process is tightly optimized. Even though the page is coming from remote storage, the network and I/O stack are fast enough that it often feels like local disk performance. You’re not pulling entire tables, indexes, or partitions—just the exact pages needed, when they’re needed. This on-demand streaming model keeps memory pressure low on the compute tier, reduces unnecessary I/O, and makes it easy to scale to hundreds of terabytes—because no single node ever needs to “hold it all.”
In short, page servers are like high-speed librarians. They don’t read the whole book—they just hand over the exact page you asked for, in milliseconds.
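This page-level addressing can be seen from T-SQL. A hedged sketch, using the demo table dbo.LargeOrders created later in this level: %%physloc%% and sys.fn_PhysLocCracker are undocumented (but long-standing) features, and page 8891 is simply this article’s running example, not a value you should expect on your system:

```sql
-- Map a row to the file and page that store it.
SELECT TOP (1)
       o.OrderID,
       loc.file_id,
       loc.page_id
FROM dbo.LargeOrders AS o
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS loc;

-- Then inspect that page's metadata (SQL Server 2019+ and Azure SQL):
SELECT page_type_desc, object_id, index_id
FROM sys.dm_db_page_info(DB_ID(), 1, 8891, 'LIMITED');
```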
Why They’re Remote — And Why That’s Powerful
Page servers are not installed with your SQL instance. They live in their own layer, fully separate from the compute node. This separation gives you a few huge benefits:
- You no longer need to pre-size storage. Page servers grow automatically as your data grows.
- Hyperscale can scale out storage horizontally—more page servers are added behind the scenes as needed.
- Page servers reduce pressure on the compute tier. You don’t need to allocate massive memory buffers or large local disks.
- Read replicas can be spun up instantly—since the data already lives in remote page servers and doesn’t need to be copied or restored.
This setup is cloud-native. It’s designed for elasticity, fault tolerance, and massive scale. You can grow from a few gigabytes to 100+ terabytes without worrying about file management, growth settings, or disk size limitations.
One Database, Many Page Servers
As your database grows—whether through inserts, bulk loads, or expanding table sizes—Hyperscale automatically adds more page servers. You don’t have to do anything. There’s no need to define partitions, sharding keys, or extra filegroups. You just keep writing data, and Hyperscale quietly scales out the storage layer by provisioning new page servers and assigning them segments of the database to manage.
Figure 2 below shows the most basic Hyperscale storage pattern: a single database with one compute node connected to multiple page servers. This setup is typical for small-to-medium workloads or when only one read-write compute instance is active.
Each page server is responsible for a range of database pages, grouped internally into segments. When a query runs, the compute node doesn’t need to know where each segment lives. It simply requests the pages it needs. Behind the scenes, Hyperscale’s routing layer directs each request to the appropriate page server, which streams the 8 KB page over the wire—fast and lightweight.
This setup allows the compute node to:
- Pull data from different page servers in parallel
- Avoid overloading any single storage node
- Improve query performance by distributing I/O across multiple servers
- Scale to massive data volumes without schema or index changes
Even better, the system is elastic. As the database grows, new page servers are added automatically. No need to configure partitions, filegroups, or custom sharding logic.
Figure 2: One database, one compute node, many page servers (Image by Author)
As workloads grow, it’s common to add additional compute nodes for read scale-out. These compute nodes—often used for reporting, analytics, or background queries—still connect to the same shared page servers.
This is shown in Figure 3. The same database now supports:
- Multiple compute nodes (1 primary, others read-only)
- All connected to a shared set of page servers
- Every compute node reading from multiple page servers
- Each page server serving multiple compute nodes
This many-to-many setup is what gives Hyperscale its elastic read performance. It allows parallel processing across replicas while still keeping a single source of truth in the storage layer. And because page servers are stateless in terms of queries, there’s no overhead in letting multiple compute nodes fetch pages concurrently.
Figure 3: Many-to-many relationship between compute nodes and page servers (Image by Author)
This distributed layout is one of the key reasons Hyperscale performs well under massive data loads—it breaks the traditional link between database size and single-node storage performance.
Page Movement: On-Demand Streaming
Page servers don’t proactively push data. They respond only when compute asks. This lazy-load model keeps memory usage tight on the compute tier and avoids loading entire tables or indexes into memory. Here’s what happens during a typical SELECT query:
- Compute requests a page (PageID=8891)
- RBPEX cache misses
- Page server streams Page 8891 over the wire
- RBPEX stores it for future access
- Compute returns the result
The image below (Figure 4) shows the basic data flow in Azure SQL Hyperscale when a query requests a page. On the left, the Compute Node runs the query and checks its local RBPEX Cache to see if the required page is already available. If the page isn’t cached, the compute node issues a request to the Page Server on the right. The page server locates the requested page (in this case, Page 8891) on its local SSD or blob storage and then streams just that 8 KB page back to the compute node.
The key takeaway is that Hyperscale doesn’t transfer entire files, partitions, or indexes—only the exact pages needed for query execution. This keeps the memory footprint smaller, reduces I/O, and allows the database to scale seamlessly to hundreds of terabytes while still feeling responsive.
Figure 4: Page movement illustration (Image by Author)
The whole thing is pipelined and parallelized. You can fetch thousands of pages across multiple page servers while still processing other parts of the query.
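From the database side, those remote page fetches surface as I/O waits. A minimal monitoring sketch, assuming an Azure SQL Database connection (sys.dm_db_wait_stats is the database-scoped counterpart of sys.dm_os_wait_stats):

```sql
-- Cold page fetches accumulate PAGEIOLATCH_* waits while the
-- compute node waits for pages to arrive from storage.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_db_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;
```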
How Page Servers Handle Writes
You might be wondering—what about write operations? This is where the Log Service and redo process kick in. When a write happens:
- The compute node sends the log record to the Log Service (not to the page server)
- The Log Service later sends redo operations to the page server
- The page server applies those changes to its own local data pages asynchronously
The diagram below (Figure 5) shows what happens when a write occurs in Azure SQL Hyperscale. Unlike traditional SQL Server, where changes are written directly to the data file, the Compute Node here sends log records to the centralized Log Service. The page servers don’t receive the changes immediately. Instead, the log service later sends redo operations to the Page Servers, which then apply those changes asynchronously to their local data pages.
This separation ensures durability and scalability. The log service acts as the single source of truth for all modifications, while page servers focus on keeping their local storage in sync by replaying logs. That design makes it easier to scale storage independently and allows Hyperscale to recover quickly if a page server fails—because its state can always be rebuilt from log replay.
Figure 5: The flow of a write to the page servers via the Log Service (Image by Author)
So writes don’t go directly to the page server. That separation ensures durability is handled centrally and scaling is easier. It’s like page servers are read-first, write-later systems that stay consistent via logs.
Fault Tolerance and Redundancy
In Azure SQL Hyperscale, redundancy is fully managed by the platform and not exposed to the DBA. While traditional SQL Server environments rely on manual configurations—like Always On Availability Groups, failover clusters, or mirroring—to ensure high availability, Hyperscale shifts that responsibility to the service layer. You don’t see, manage, or interact with redundant page servers directly.
Each page server is backed by Azure’s underlying storage fabric, which maintains local redundancy and failover logic. If a page server fails, Hyperscale automatically spins up a new one and replays the log records from the centralized Log Service to restore its state. This process is seamless and transparent. You won’t find system views or DMV queries that show “replica page servers” or “backup segments”—the entire redundancy layer operates beneath the surface.
That said, you can still monitor the impact. For example, sudden latency spikes, increased PAGEIOLATCH_SH waits, or elevated avg_data_io_percent in sys.dm_db_resource_stats may hint at background recovery in progress. But the failover, recovery, and health management of page servers is abstracted away, allowing you to focus on the logical database without worrying about the underlying infrastructure.
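The metrics mentioned above can be pulled with a short query. A sketch against sys.dm_db_resource_stats, which keeps roughly an hour of history at 15-second granularity:

```sql
-- Recent resource utilization; unexplained spikes in data I/O
-- can hint at background recovery or cache rebuilds.
SELECT TOP (20)
       end_time,
       avg_data_io_percent,
       avg_log_write_percent,
       avg_cpu_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```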
This is by design. Hyperscale is built for zero-admin elasticity and resilience, and its self-healing storage tier ensures fault tolerance without human intervention. As a DBA, you're freed from configuring storage HA—because it's already built into the platform.
Refer to Figure 6, which shows what happens if a page server fails in Azure SQL Hyperscale. The compute node doesn’t write directly to the page server—it always sends log records to the centralized Log Service. Because of that, when a page server goes down, Hyperscale can rebuild it automatically. The system provisions a new page server instance and restores its state using two sources:
- Redundant copy in the Azure storage fabric, which provides the base data pages
- Log replay from the Log Service, which re-applies recent changes to bring the new page server back in sync
This process is seamless and transparent. The compute node automatically redirects requests once the new page server is ready, and you don’t have to manage failover or replicas yourself. This built-in redundancy ensures high availability without the complexity of traditional SQL Server HA features like mirroring or Availability Groups.
Figure 6: A failed page server is replaced automatically. The new page server is rebuilt using data from Azure’s storage fabric and log replay from the Log Service. (Image by Author)
Hands-On Demo: Simulating Page Fetch Overhead
First, create a large table and fill it with enough rows that its pages won’t all fit in the local cache:

```sql
-- Create the table first
DROP TABLE IF EXISTS dbo.LargeOrders;
GO

CREATE TABLE dbo.LargeOrders (
    OrderID    BIGINT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT NOT NULL,
    ProductID  INT NOT NULL,
    OrderDate  DATETIME2 NOT NULL,
    Quantity   INT NOT NULL,
    Price      MONEY NOT NULL
);
GO

-- Batch insert loop: 50 batches of 100,000 rows = 5 million rows
DECLARE @i INT = 0;
WHILE @i < 50
BEGIN
    WITH Numbers AS (
        SELECT TOP (100000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects a
        CROSS JOIN sys.all_objects b
    )
    INSERT INTO dbo.LargeOrders (CustomerID, ProductID, OrderDate, Quantity, Price)
    SELECT ABS(CHECKSUM(NEWID())) % 10000,
           ABS(CHECKSUM(NEWID())) % 5000,
           DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, GETUTCDATE()),
           ABS(CHECKSUM(NEWID())) % 20 + 1,
           CAST(ABS(CHECKSUM(NEWID())) % 50000 / 100.0 AS MONEY)
    FROM Numbers;

    SET @i += 1;

    -- Optional: slow down between batches
    WAITFOR DELAY '00:00:01';
END
```
Step 1: Cold Read (Uncached Pages)
The first read will typically be cold: the pages aren’t yet cached, so SQL Server performs physical reads, some of them against the remote page servers.
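The exact query behind the screenshots isn’t shown; a representative one, with I/O and timing statistics enabled, might look like this (the date predicate is illustrative). On Hyperscale, the STATISTICS IO output includes a dedicated “page server reads” counter alongside the familiar logical, physical, and read-ahead reads:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- A scan-heavy aggregate to force many page fetches on the first run.
SELECT COUNT(*)      AS Orders,
       SUM(Quantity) AS TotalQuantity
FROM dbo.LargeOrders
WHERE OrderDate >= DATEADD(DAY, -180, GETUTCDATE());
```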
See image below for output of cold read:
Figure 7: Cold Read – uncached pages trigger physical reads from the page server
Note the following metrics.
- Physical Reads: 3
- Page Server Reads: 1
- Read-Ahead Reads: 36,666
- Time taken: 1402 ms
This confirms SQL Server is fetching from remote page servers, and preloading data into memory via read-ahead mechanisms.
Step 2: Warm Read (Cached in RBPEX)
We immediately re-run the same query. This time execution is a bit faster: no page server hits or physical reads occur, and the data is returned from cache:
See image below for output of warm read:
Figure 8: Warm Read – cached pages trigger pure logical reads
See how the metrics have changed. The same query now runs faster with no I/O, because RBPEX (Resilient Buffer Pool Extension) cached the hot pages from the prior execution.
- Physical Reads: 0
- Page Server Reads: 0
- Time taken: 983 ms
Summary
Page servers are what make Hyperscale truly “hyperscale.” They eliminate the need to size up disks, pre-allocate data files, or cram everything into memory. Instead, they offer a flexible, on-demand streaming layer that keeps storage separate, scalable, and smart.
The best part? You don’t manage them. No agents. No backup jobs. No mirroring. They just work behind the scenes, scaling with you as your database explodes.
In the next level, we’ll go even deeper—into the RBPEX caching layer—and explain how Hyperscale keeps reads fast even when you’re sitting on terabytes of remote data.