Introduction
In the last level, we explored how Page Servers let Azure SQL Hyperscale break free from traditional storage limits by separating compute from storage. That architecture opened doors for near-infinite scalability — but it also raised an important question: If the data lives somewhere else, wouldn’t every query suffer from slow fetches over the network?
That’s a fair concern. Imagine if every SELECT, every JOIN, every WHERE clause had to reach out across the wire to grab the underlying data pages — every single time. Latency would pile up. Your dashboards would crawl. Your apps would lag. Hyperscale would feel… well, not so “hyper.”
Microsoft anticipated this challenge. That’s why the Hyperscale team built something smart — and silent — to keep performance humming: a feature called RBPEX. RBPEX stands for Read Buffer Pool Extension, but don’t let the name scare you off. It’s essentially a local cache of hot data pages, built right into the compute node. It works behind the scenes to reduce round-trips, avoid unnecessary page pulls, and boost performance — without you lifting a finger.
In this level, we’re going to unpack what RBPEX is, how it works, and why it matters, especially in read-heavy scenarios like dashboards, reporting, and repetitive user queries. And we won’t stop at theory. We’ll simulate real workloads. Measure performance. Trigger cache hits and misses. Even peek into how SQL Server’s internals respond when RBPEX kicks in. By the end of this level, you’ll see RBPEX not just as a “bonus optimization,” but as a core reason Hyperscale feels fast — even when storage is remote.
What is RBPEX?
At its core, RBPEX (Read Buffer Pool Extension) is a smart, read-only caching layer that lives right inside the compute node of Azure SQL Hyperscale. Its job? To quietly and efficiently store frequently accessed 8KB data pages — so your queries don’t always have to reach out to remote Page Servers to get the data they need.
Let’s think of it this way: Imagine your compute node as a chef. The page servers are a pantry in the basement. Without RBPEX, every time the chef needs a tomato or onion (i.e., a data page), they’d have to run down the stairs, fetch it, and come back. That’s tiring — and slow.
RBPEX changes the game. It acts like a mini fridge right in the kitchen, stocked with the ingredients (data pages) the chef uses most often. This drastically reduces round-trips, lowers latency, and improves throughput — especially when queries tend to access the same data repeatedly. Technically, RBPEX operates as a hybrid between:
- SQL Server’s traditional buffer pool, which caches pages in memory, and
- A local SSD-based extension, orchestrated by the Hyperscale control plane
You don’t tune it. You don’t size it. You don’t enable or disable it. RBPEX is fully managed by Azure, and it’s always on — passively observing your query patterns, identifying “hot” pages, and caching them right where they're needed most. It’s particularly effective when your workload has temporal or spatial locality — meaning the same types of queries get repeated frequently, or adjacent data pages tend to be accessed together (as is common in index or range scans).
So if your Power BI dashboards hit the same sales report every 10 minutes, or your app keeps querying recent customer activity — RBPEX will kick in, making those queries feel almost instant. But remember: it’s read-only. It doesn’t accelerate writes, doesn’t persist across failovers, and won’t cache TempDB or log records. Still, for read-heavy, bursty, or repetitive workloads, RBPEX can quietly become the secret ingredient that keeps Hyperscale feeling fast — no matter how far away your storage actually lives.
Where It Sits in the Architecture
Here is a diagram that shows where this sites in the Hyperscale architecture. This shows:
- Client apps send queries to compute node
- Compute node first checks RBPEX cache
- If page is cached, it's served instantly
- If not, a request is sent to Page Server, and page is streamed and then cached in RBPEX
Figure 1: Illustration of RBPex Cache in Hyperscale Architecture(Image by Author)
RBPEX helps avoid repeated page pulls for every query. Think of it like a super-smart CDN sitting right inside your database engine.
What RBPEX Caches
RBPEX acts as a cache, and its goal is simple: keep the pages you need most right on the compute node so you are not waiting on page servers every time a query runs. The cache really shines when reads are repeated or when queries follow predictable patterns across nearby pages. That makes it a natural fit for workloads that are mostly reads with low churn, such as dashboards, lookup services, or OLAP-style queries. In practice, the cache fills up with the bread-and-butter 8KB table and index pages. If your queries keep hitting the same index leaf pages, like filtering on a recent date range, those pages stay in the cache instead of being pulled again from storage.
The cache also comes into play with more complex queries. During a big join, some intermediate results get reused multiple times. Without caching, the compute node would have to pull or recompute them each time. With RBPEX, those pages can remain in the cache, so later steps in the plan or repeated query runs finish much faster. Lookup tables are another perfect example. Small reference data sets such as country codes, product categories, or status values are queried constantly but hardly ever change. RBPEX keeps them in the cache so those lookups avoid round-trips to the page servers.
At the same time, RBPEX is careful about what it does not cache. Transaction log records never make it in because they are write-heavy and belong to a different subsystem. Inserts, updates, and deletes skip the cache as well, since it is strictly a read-only layer. TempDB pages are not cached either because TempDB already lives on the compute node’s SSD. And if you run a huge one-time analytical scan that touches millions of rows with no chance of reuse, RBPEX will not waste cache space on those pages. It only reacts to proven patterns, not guesses.
The key idea is that RBPEX is a selective cache. It does not try to hold everything, only the pages that deliver the most benefit when reused. When your workload has that classic 80/20 shape, where most queries hammer a small portion of data, the cache becomes a quiet performance hero. But when the workload is write-heavy, ad hoc, or constantly shifting, the cache simply steps aside and lets the system do its work.
Measuring RBPEX in Action: Real-World Simulations
RBPEX might sound like an under-the-hood optimization — but you can actually see it working if you know what to look for. In this section, we’re going beyond theory to run real-world simulations that mimic repeated reporting workloads — just like production dashboards or lookup-intensive apps.
Instead of benchmarks, we’ll create UltraFatData, a massive table filled with gigabytes of wide rows. It’s designed to exceed buffer pool and RBPEX cache so that the first query must pull pages from remote page servers. Then, as we repeatedly query the same subset, RBPEX begins caching them on local SSD — and physical reads drop while performance stabilizes.
At first, the query has to do cold reads from page servers. But as we repeat it, RBPEX gradually promotes those pages into the local SSD cache, which cuts down physical reads while logical reads stay the same.
To start, I created a wide table called UltraFatData, designed to be so large that it blows past both memory and the RBPEX cache. Here’s the DDL
DROP TABLE IF EXISTS dbo.UltraFatData; GO CREATE TABLE dbo.UltraFatData ( Id INT IDENTITY(1,1) PRIMARY KEY, Region VARCHAR(50), Payload1 CHAR(100), ); GO
Next, I populated the table with about a billion rows, with region values like East, West, and Central, to mimic a realistic workload. The insert loop looked like this:”
SET NOCOUNT ON; DECLARE @i INT = 0; WHILE @i < 100000 BEGIN WITH DataGen AS ( SELECT TOP (10000) CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 7 = 0 THEN 'East' WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 5 = 0 THEN 'West' ELSE 'Central' END AS Region, REPLICATE('A', 100) AS Payload1, FROM sys.all_objects a CROSS JOIN sys.all_objects b ) INSERT INTO dbo.UltraFatData (Region, Payload1) SELECT TOP (10000) * FROM DataGen; SET @i += 1; PRINT CONCAT('Batch ', @i, ' inserted...'); END
After the data was ready, I flushed the buffer pool so the first query would start cold:”
-- Force cold read CHECKPOINT; DBCC DROPCLEANBUFFERS; GO -- Cold query SELECT SUM(TotalAmount) FROM dbo.UltraFatData WHERE Region = 'West';
On this first run, you can check the execution plan or use SET STATISTICS IO to see the number of logical reads, as well as query duration. For runtime metadata, sys.dm_exec_query_stats will also surface the details.
I then reran the same query ten times in a row. Logical reads stayed the same, but the number of physical reads began to drop sharply after the second or third run.
SELECT SUM(TotalAmount) FROM dbo.BigSales WHERE Region = 'West';
Track the reduction in logical reads from Query Store or sys.dm_exec_query_stats
over time.
To measure the effect more formally, I pulled stats from Query Store and sys.dm_exec_query_stats
. Here’s the query I used:
Query the Query Store or DMVs:
-- Track recent query stats SELECT qs.execution_count, qs.total_logical_reads/qs.execution_count AS avg_logical_reads, qs.total_physical_reads/qs.execution_count AS avg_physical_reads, qs.total_elapsed_time/qs.execution_count / 1000 AS avg_duration_ms, st.text AS query_text FROM sys.dm_exec_query_stats AS qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st WHERE st.text LIKE '%varchar%UltraFatData%' ORDER BY qs.creation_time DESC;
You should observe:
Metric | 1st Run | After 10 Runs |
---|---|---|
Logical Reads | Constant | Constant |
Physical Reads | High | ~0 (RBPEX kicks in) |
Duration (ms) | High | Drops drastically |
Please find the screenshots below showing how physical reads and durations decrease steadily over time, while logical reads remain constant across executions.
After First Execution:
After fourth Execution, there is a clear draft in physical reads from 4535 to 2702 and avg_duration of query execution from 229 to 191:
After 10th execution, there is a massive drop in both metrices
This behavior clearly demonstrates RBPEX (Read Buffer Pool Extension) in action. On the first execution, the data pages required for the query are not present in memory or SSD cache, so they’re fetched from remote page servers, resulting in high physical reads and longer query duration. However, as the same query is executed repeatedly — simulating a real-world workload like a dashboard or lookup-heavy report — RBPEX begins caching those pages on the local SSD at the compute node. This leads to a sharp drop in physical reads starting from the 2nd or 3rd execution, even though logical reads (i.e., the number of pages accessed) remain exactly the same. By the time you reach the 10th or 50th execution, most or all of the data is served from SSD, and eventually from memory, resulting in near-zero physical reads and stable, low-latency performance. This clearly highlights RBPEX’s role in bridging the gap between slow remote storage and fast in-memory access — without any manual tuning or explicit cache management.
When RBPEX Fails to Help
RBPEX won’t help in:
- Write-heavy workloads (OLTP inserts/updates)
- TempDB-intensive operations (sorts, hash joins)
- Large analytical scans with low page reuse
- First-time table scans (cold cache)
If your workload pattern changes too frequently, RBPEX may never warm up effectively.
What RBPEX Is Not
Let’s clear a few misconceptions:
- Not tunable (you can’t size it)
- Not visible like buffer pool memory
- Not persistent (cleared during failovers/restarts)
- Not designed for writes or transactional logs
It’s a read-only shadow layer, managed entirely by Azure behind the scenes.
Summary
RBPEX is what lets Hyperscale behave like a high-speed engine even with remote storage. Without it, every query would incur wire latency from page servers. With it, you get near-local read speeds, automatic caching of your hot data, and no config overhead. You don’t set it up. You don’t tune it. You just run your queries and let Hyperscale optimize silently.
So when your dashboards are fast, and your read queries fly? RBPEX is the unsung hero behind it.
In the next level, we’ll jump into the Log Service & Transaction Flow, and see how Hyperscale decouples durability, write performance, and replica consistency. That’s where things get really deep.