Stairway to Azure SQL Hyperscale – Level 3: RBPEX Caching Layer Internals

Introduction

In the last level, we explored how Page Servers let Azure SQL Hyperscale break free from traditional storage limits by separating compute from storage. That architecture opened doors for near-infinite scalability — but it also raised an important question: If the data lives somewhere else, wouldn’t every query suffer from slow fetches over the network?

That’s a fair concern. Imagine if every SELECT, every JOIN, every WHERE clause had to reach out across the wire to grab the underlying data pages — every single time. Latency would pile up. Your dashboards would crawl. Your apps would lag. Hyperscale would feel… well, not so “hyper.”

Microsoft anticipated this challenge. That’s why the Hyperscale team built something smart — and silent — to keep performance humming: a feature called RBPEX. RBPEX stands for Read Buffer Pool Extension, but don’t let the name scare you off. It’s essentially a local cache of hot data pages, built right into the compute node. It works behind the scenes to reduce round-trips, avoid unnecessary page pulls, and boost performance — without you lifting a finger.

In this level, we’re going to unpack what RBPEX is, how it works, and why it matters, especially in read-heavy scenarios like dashboards, reporting, and repetitive user queries. And we won’t stop at theory. We’ll simulate real workloads. Measure performance. Trigger cache hits and misses. Even peek into how SQL Server’s internals respond when RBPEX kicks in. By the end of this level, you’ll see RBPEX not just as a “bonus optimization,” but as a core reason Hyperscale feels fast — even when storage is remote.

What is RBPEX?

At its core, RBPEX (Read Buffer Pool Extension) is a smart, read-only caching layer that lives right inside the compute node of Azure SQL Hyperscale. Its job? To quietly and efficiently store frequently accessed 8KB data pages — so your queries don’t always have to reach out to remote Page Servers to get the data they need.

Let’s think of it this way: Imagine your compute node as a chef. The page servers are a pantry in the basement. Without RBPEX, every time the chef needs a tomato or onion (i.e., a data page), they’d have to run down the stairs, fetch it, and come back. That’s tiring — and slow.

RBPEX changes the game. It acts like a mini fridge right in the kitchen, stocked with the ingredients (data pages) the chef uses most often. This drastically reduces round-trips, lowers latency, and improves throughput — especially when queries tend to access the same data repeatedly. Technically, RBPEX operates as a hybrid between:

SQL Server’s traditional buffer pool, which caches pages in memory, and
A local SSD-based extension, orchestrated by the Hyperscale control plane

You don’t tune it. You don’t size it. You don’t enable or disable it. RBPEX is fully managed by Azure, and it’s always on — passively observing your query patterns, identifying “hot” pages, and caching them right where they're needed most. It’s particularly effective when your workload has temporal or spatial locality — meaning the same types of queries get repeated frequently, or adjacent data pages tend to be accessed together (as is common in index or range scans).

So if your Power BI dashboards hit the same sales report every 10 minutes, or your app keeps querying recent customer activity — RBPEX will kick in, making those queries feel almost instant. But remember: it’s read-only. It doesn’t accelerate writes, doesn’t persist across failovers, and won’t cache TempDB or log records. Still, for read-heavy, bursty, or repetitive workloads, RBPEX can quietly become the secret ingredient that keeps Hyperscale feeling fast — no matter how far away your storage actually lives.

Where It Sits in the Architecture

Here is a diagram that shows where this sites in the Hyperscale architecture. This shows:

Client apps send queries to compute node
Compute node first checks RBPEX cache
If page is cached, it's served instantly
If not, a request is sent to Page Server, and page is streamed and then cached in RBPEX

Figure 1: Illustration of RBPex Cache in Hyperscale Architecture(Image by Author)

RBPEX helps avoid repeated page pulls for every query. Think of it like a super-smart CDN sitting right inside your database engine.

What RBPEX Caches

RBPEX acts as a cache, and its goal is simple: keep the pages you need most right on the compute node so you are not waiting on page servers every time a query runs. The cache really shines when reads are repeated or when queries follow predictable patterns across nearby pages. That makes it a natural fit for workloads that are mostly reads with low churn, such as dashboards, lookup services, or OLAP-style queries. In practice, the cache fills up with the bread-and-butter 8KB table and index pages. If your queries keep hitting the same index leaf pages, like filtering on a recent date range, those pages stay in the cache instead of being pulled again from storage.

The cache also comes into play with more complex queries. During a big join, some intermediate results get reused multiple times. Without caching, the compute node would have to pull or recompute them each time. With RBPEX, those pages can remain in the cache, so later steps in the plan or repeated query runs finish much faster. Lookup tables are another perfect example. Small reference data sets such as country codes, product categories, or status values are queried constantly but hardly ever change. RBPEX keeps them in the cache so those lookups avoid round-trips to the page servers.

At the same time, RBPEX is careful about what it does not cache. Transaction log records never make it in because they are write-heavy and belong to a different subsystem. Inserts, updates, and deletes skip the cache as well, since it is strictly a read-only layer. TempDB pages are not cached either because TempDB already lives on the compute node’s SSD. And if you run a huge one-time analytical scan that touches millions of rows with no chance of reuse, RBPEX will not waste cache space on those pages. It only reacts to proven patterns, not guesses.

The key idea is that RBPEX is a selective cache. It does not try to hold everything, only the pages that deliver the most benefit when reused. When your workload has that classic 80/20 shape, where most queries hammer a small portion of data, the cache becomes a quiet performance hero. But when the workload is write-heavy, ad hoc, or constantly shifting, the cache simply steps aside and lets the system do its work.

Measuring RBPEX in Action: Real-World Simulations

RBPEX might sound like an under-the-hood optimization — but you can actually see it working if you know what to look for. In this section, we’re going beyond theory to run real-world simulations that mimic repeated reporting workloads — just like production dashboards or lookup-intensive apps.

Instead of benchmarks, we’ll create UltraFatData, a massive table filled with gigabytes of wide rows. It’s designed to exceed buffer pool and RBPEX cache so that the first query must pull pages from remote page servers. Then, as we repeatedly query the same subset, RBPEX begins caching them on local SSD — and physical reads drop while performance stabilizes.

At first, the query has to do cold reads from page servers. But as we repeat it, RBPEX gradually promotes those pages into the local SSD cache, which cuts down physical reads while logical reads stay the same.

To start, I created a wide table called UltraFatData, designed to be so large that it blows past both memory and the RBPEX cache. Here’s the DDL

DROP TABLE IF EXISTS dbo.UltraFatData;
GO

CREATE TABLE dbo.UltraFatData
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    Region VARCHAR(50),
    Payload1 CHAR(100),
);
GO

Next, I populated the table with about a billion rows, with region values like East, West, and Central, to mimic a realistic workload. The insert loop looked like this:”

SET NOCOUNT ON;
DECLARE @i INT = 0;

WHILE @i < 100000
BEGIN
    WITH DataGen AS (
        SELECT TOP (10000)
            CASE 
                WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 7 = 0 THEN 'East'
                WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 5 = 0 THEN 'West'
                ELSE 'Central'
            END AS Region,
            REPLICATE('A', 100) AS Payload1,
        FROM sys.all_objects a
        CROSS JOIN sys.all_objects b
    )
    INSERT INTO dbo.UltraFatData (Region, Payload1)
    SELECT TOP (10000) * FROM DataGen;

    SET @i += 1;
    PRINT CONCAT('Batch ', @i, ' inserted...');
END

After the data was ready, I flushed the buffer pool so the first query would start cold:”

-- Force cold read
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO

-- Cold query
SELECT SUM(TotalAmount) 
FROM dbo.UltraFatData 
WHERE Region = 'West';

On this first run, you can check the execution plan or use SET STATISTICS IO to see the number of logical reads, as well as query duration. For runtime metadata, sys.dm_exec_query_stats will also surface the details.

I then reran the same query ten times in a row. Logical reads stayed the same, but the number of physical reads began to drop sharply after the second or third run.

    SELECT SUM(TotalAmount) 
    FROM dbo.BigSales 
    WHERE Region = 'West';

Track the reduction in logical reads from Query Store or sys.dm_exec_query_stats over time.

To measure the effect more formally, I pulled stats from Query Store and sys.dm_exec_query_stats. Here’s the query I used:

Query the Query Store or DMVs:

-- Track recent query stats
SELECT 
    qs.execution_count,
    qs.total_logical_reads/qs.execution_count AS avg_logical_reads,
    qs.total_physical_reads/qs.execution_count AS avg_physical_reads,
    qs.total_elapsed_time/qs.execution_count / 1000 AS avg_duration_ms,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%varchar%UltraFatData%'
ORDER BY qs.creation_time DESC;

You should observe:

Metric	1st Run	After 10 Runs
Logical Reads	Constant	Constant
Physical Reads	High	~0 (RBPEX kicks in)
Duration (ms)	High	Drops drastically

Please find the screenshots below showing how physical reads and durations decrease steadily over time, while logical reads remain constant across executions.

After First Execution:

After fourth Execution, there is a clear draft in physical reads from 4535 to 2702 and avg_duration of query execution from 229 to 191:

After 10th execution, there is a massive drop in both metrices

This behavior clearly demonstrates RBPEX (Read Buffer Pool Extension) in action. On the first execution, the data pages required for the query are not present in memory or SSD cache, so they’re fetched from remote page servers, resulting in high physical reads and longer query duration. However, as the same query is executed repeatedly — simulating a real-world workload like a dashboard or lookup-heavy report — RBPEX begins caching those pages on the local SSD at the compute node. This leads to a sharp drop in physical reads starting from the 2nd or 3rd execution, even though logical reads (i.e., the number of pages accessed) remain exactly the same. By the time you reach the 10th or 50th execution, most or all of the data is served from SSD, and eventually from memory, resulting in near-zero physical reads and stable, low-latency performance. This clearly highlights RBPEX’s role in bridging the gap between slow remote storage and fast in-memory access — without any manual tuning or explicit cache management.

When RBPEX Fails to Help

RBPEX won’t help in:

Write-heavy workloads (OLTP inserts/updates)
TempDB-intensive operations (sorts, hash joins)
Large analytical scans with low page reuse
First-time table scans (cold cache)

If your workload pattern changes too frequently, RBPEX may never warm up effectively.

What RBPEX Is Not

Let’s clear a few misconceptions:

Not tunable (you can’t size it)
Not visible like buffer pool memory
Not persistent (cleared during failovers/restarts)
Not designed for writes or transactional logs

It’s a read-only shadow layer, managed entirely by Azure behind the scenes.

Summary

RBPEX is what lets Hyperscale behave like a high-speed engine even with remote storage. Without it, every query would incur wire latency from page servers. With it, you get near-local read speeds, automatic caching of your hot data, and no config overhead. You don’t set it up. You don’t tune it. You just run your queries and let Hyperscale optimize silently.

So when your dashboards are fast, and your read queries fly? RBPEX is the unsung hero behind it.

In the next level, we’ll jump into the Log Service & Transaction Flow, and see how Hyperscale decouples durability, write performance, and replica consistency. That’s where things get really deep.

Stairway to Azure SQL Hyperscale – Level 3: RBPEX Caching Layer Internals

Introduction

What is RBPEX?

Where It Sits in the Architecture

What RBPEX Caches

Measuring RBPEX in Action: Real-World Simulations

When RBPEX Fails to Help

What RBPEX Is Not

Summary

Rate

Share

Share

Rate

Stairway to Azure SQL Hyperscale – Level 3: RBPEX Caching Layer Internals

Introduction

What is RBPEX?

Where It Sits in the Architecture

What RBPEX Caches

Measuring RBPEX in Action: Real-World Simulations

When RBPEX Fails to Help

What RBPEX Is Not

Summary

Rate

Share

Share

Rate

Related content

How Implicit Transactions Hurt SQL Server Performance Without You Knowing

Deprecated but Forgotten: Why SQL Server’s Text, NText, and Image Data Types Still Haunt Your Systems

Advanced SQL Server Page Forensics: Detecting Page Splits and Allocations with DBCC PAGE

How to safely and surgically restore filegroups

Unmasking CXPACKET and CXCONSUMER in SQL Server: What Your Execution Plan Isn’t Telling You