Interesting Aurora MySQL Feature: The Buffer Pool Survives Restarts

Brent Reading Book
“Documentation! Hey, look at that.”

Lemme start this off by saying this is probably irrelevant to you. (It’s irrelevant to me, too.)

If you’re strapped for time, just skip past this blog post.

This one’s for the curious folks.

AWS Aurora MySQL is Amazon’s flavor of MySQL with their own unique performance and reliability improvements. I’ve never used it, and I don’t see myself using it anytime soon because I just don’t need it. The database back end for the BrentOzar.com blog is MySQL, but I use a managed WordPress hosting partner, so Aurora MySQL is irrelevant there too.

Having said that, I still read the database news because it’s neat to see how companies are innovating, and this new optimization from AWS is intriguing:

The current implementation of buffer pool in Aurora MySQL employs survivable page cache where each database instance’s page cache is managed in a separate process from the database, which allows the page cache to survive independently of the database.

<record scratch> WAT

This is obviously dramatically different from Microsoft SQL Server. In SQL Server, if you restart the SQL Server process:

  • Dirty (changed) buffer pool pages are written to disk
  • The SQL Server process shuts down, releasing all memory back to the OS
  • The SQL Server process starts again, and has no memory allocated at the beginning (unless you play around with LPIM and minimum server memory settings)
  • SQL Server gradually requests memory from the OS as needed, reading data pages up from disk as needed, and caching those pages in the buffer pool

At first glance, Aurora MySQL’s optimization sounds amazing, but it has a few gotchas. It would seem to only be relevant when:

  • The MySQL writeable replica stays on the same server – meaning I would assume it’s less relevant for database patching, since you’d want to patch a passive replica first, then fail over to it. (Although as long as Amazon’s putting in this much work, they could conceivably do the patching live on the same node – I would assume that would result in longer downtime though, as opposed to failing over to an already-patched instance.)
  • The MySQL process restarts, but the OS stays up – meaning it’s not relevant for OS patching either.
  • The buffer pool is fairly stable – this doesn’t help you on underpowered servers where everything gets read from disk anyway.
  • And keep in mind that we’re only talking about the page cache, not things like execution plans, DMV metrics, etc.

This isn’t the only optimization they’ve done, of course. The whole documentation section on Aurora storage and reliability is interesting, like how storage costs automatically drop as you drop tables and indexes. You don’t have to worry about resizing the data files or resizing the underlying OS volumes like you do with Azure SQL DB or conventional SQL Servers.

I’m not saying Aurora MySQL is better than Azure SQL DB or SQL Server, by any means. (I’m not even saying the optimization works, hahaha!) I’m not even saying Microsoft should build this kind of cache persistence for SQL Server! It’s such a niche use case. I’m just saying it’s interesting to see these kinds of advancements in cloud databases.

Previous Post
[Video] Office Hours: Sunshine Edition
Next Post
[Video] Office Hours: Wrong Side of the Bed Edition

11 Comments. Leave new

  • Thanks, I did not read this post.

    Reply
  • What hosting provider do you use for WordPress?

    A constantly recurring problem I have had in my career across several jobs, have been quickly thought-out solutions implemented on WordPress that start out satisfying all needs but then as it grows, performance deteriorates down to unusable if allowed to grow large enough. This site doesn’t have that problem but is more than large enough for what I would expect a WordPress site to have deteriorated into complete lethargy and then have to endure providing the repeated explanations that more disk, CPU and RAM can’t help it because not only are none of them heavily utilized, adding more of any of the above, demonstrates it doesn’t solve the problem.

    I am curious if your provider has some sort of optimizations in place that allow word press to scale better – or if my problems have been my solutions ‘designers’ skipping WordPress enterprise to use the free version instead.

    Reply
  • “you’d want to patch a passive replica first, then fail over to it”

    Yeah, I wish that was the case… you actually have to patch the writer node first and then proceed with the reader node/s meaning “no downtime” patching is not possible without using external replication.

    Reply
  • How did you get to be so smart?

    Reply
  • Clippy, the ultimate cosmic connoisseur, would throw the galaxy’s biggest celebration if SQL Server could work such a magic on proc-cache and in-memory stats like index operational stats/usage stats. It’s the kind of party even the most advanced extraterrestrial beings would RSVP to!

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.