HA/DR for Azure SQL Database

Azure SQL Database is a relational database-as-a-service in the cloud. It uses a special version of Microsoft SQL Server as its backend that is nearly identical to SQL Server (see Azure SQL Database Transact-SQL differences). While there are many benefits to using SQL Database over SQL Server, in this blog post I’ll talk about the various types of high-availability and disaster recovery options that are much easier to setup than SQL Server.

When you use the Azure portal to create a SQL Database, the various plans under the pricing tier include three service tiers: Basic, Standard, and Premium. Here are those three plans with their high-availability (HA) and disaster recovery (DR) options:

Basic: Automatic Backups, Point In Time Restore up to 7 days, Disaster recovery (DR): Geo-Restore, restore to any Azure region

Standard: Automatic Backups, Point In Time Restore up to 14 days, DR: Standard Geo-Replication, offline secondary

Premium: Automatic Backups, Point In Time Restore up to 35 days, DR: Active Geo-Replication, up to 4 online (readable) secondary backups

Here are more details on those options:

High Availability: Each database possesses one primary and two local replica databases stored on LRS Azure Blob Storage that reside in the same datacenter, providing high availability within that datacenter. At least two of those databases are synchronous. The hardware these databases reside on are on completely physically separate sub-systems. So if the hardware fails, your database will automatically and seamlessly fail over to the synchronous copy.

Automatic Backups: All Basic, Standard, and Premium databases are protected by automatic backups. Full backups are taken every week, differential backups every day, and log backups every 5 minutes. The first full backup is scheduled immediately after a database is created. Normally this completes within 30 minutes but it can take longer. If a database is “born big”, for example if it is created as the result of database copy or restore from a large database, then the first full backup may take longer to complete. After the first full backup all further backups are scheduled automatically and managed silently in the background. Exact timing of full and differential backups is determined by the system to balance overall load. Backup files are stored locally in blob storage in the same data center as your databases with local redundancy. When you restore a database, the required backup files are retrieved and applied. The full, differential, and log backups are also copied to the blob storage in the paired secondary region in the same geo-political area for disaster recovery purpose (RA-GRS). These geo-redundant copies are used to enable geo-restore as explained shortly.

Point In Time Restore: Point In Time Restore is designed to return your database to an earlier point in time. It uses the database backups, incremental backups and transaction log backups that the service automatically maintains for every user database. See Azure SQL Database Point in Time Restore. To restore a database, see Recover an Azure SQL Database from a user error. When you perform a restore, you’ll get a new database on the same server.

Geo-Restore: When you create a SQL Database server, you choose the region you want it in (i.e. East US), and this is your primary region. If there is an incident in this region and a database is unavailable, you can restore it from the geo-redundant backup copy in the secondary region to any region, using the same technology as point in time restore, and therefore the databases are resilient to the storage outages in the primary region. Note that with this option, your data could be up to one hour behind. See Azure SQL Database Geo-Restore.

Standard Geo-Replication: This is where a copy of your data in the primary database is constantly being written asynchronously to a non-readable secondary database on a server in a different region (geo-redundancy). In the event of a disaster you can fail over to the secondary. Since the copy is asynchronous the data in the secondary database will be behind the primary, but not by more than five seconds (you can make the copy synchronous by using the system procedure sp_wait_for_database_copy_sync). See Azure SQL Database Standard Geo-Replication.

Active Geo-Replication: Similar to Standard Geo-replication, your data is being asynchronously written except it’s on up to four secondary servers in different regions, and these secondaries are readable (each continuous copy is referred to as an called online secondary database). You can also fail over to a secondary in the event of disaster in the same way as Standard Geo-Replication. In addition, Active Geo-Replication can be used to support application upgrade or relocation scenarios without downtime, as well as load balancing for read-only workloads. See Active Geo-Replication for Azure SQL Database.

A word about database failover:

If a region has an extended outage you will receive an alert in the Azure Portal and will see your SQL Database servers’ state set to Degraded. At that point an application has a choice of initiating the failover or waiting for the datacenter to recover. If your application needs to optimize for higher availability and can tolerate a data loss of 5 seconds then it should failover as soon as you receive an alert or detect database connectivity failures. If your application is sensitive to data loss you may opt to wait for the SQL Database service to recover. If this happens no data loss will occur. In case you initiate the failover the database you must reconfigure your applications appropriately to connect to the new primary databases. Once you have completed the failover you will want to ensure that the new primary is also protected as soon as possible. Since primary region recovery may take time you will have to wait for your server to change from Degraded back to Online status. This will allow you to initiate geo-replication from the new primary to protect it. Until seeding of the new secondary is completed your new primary will remain unprotected.

More info:

Creating a large data warehouse in Azure

Business Continuity Overview

Design for business continuity

SQL Database Enable Geo Replication in Azure Portal

Fault-tolerance in Windows Azure SQL Database

Distributed Storage: How SQL Azure Replicas Work

High Availability and Disaster Recovery for Azure SQL Databases