Availability Group

  • Planning to set up an AG with 4 replicas: 3 located in the same DC and a DR node in a different DC.

    Between the Primary and SR1 it would be synchronous commit, for automatic failover and HA.

    Primary Replica - Full, Diff and Log backups & maintenance jobs, etc.

    Secondary Replica 1 (Synchronous commit) - For HA and automatic failover

    Secondary Replica 2 (Asynchronous commit) - For reporting purposes only

    Secondary Replica 3 (Asynchronous commit) - For DR

    I plan on testing automatic failover between the primary replica and SR1 to ensure HA works as expected during an outage.

    However, I have a few questions. During a maintenance job run (e.g. index rebuild or DBCC CHECKDB), let's say a failover happens for whatever reason - would this cause any issues? If so, how should this be handled?

    Is there anything else I need to be aware of or test for automatic failover? An alternative option I was considering is synchronous mode with manual failover instead of automatic, but then every time there is a Windows update someone has to perform the failover manually. I'm looking for the safest option. Please advise?

  • In an AG - you need to replicate logins and agent jobs to each secondary that can be used for failover.  The logins must be created on the secondary with the same SID - so those logins will tie to the users in the database.

    The agent jobs will need to be set up so they check whether the replica is primary for the AG - if not primary, exit; if primary, execute the job.  For backups you will also need a central shared location to offload the backup files - just to keep the full set of backups together.  You don't want to try to combine files from separate systems if you need to recover, especially if one of the systems has crashed and is non-recoverable.
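    The "exit if not primary" check described above can be sketched as the first step of each agent job (assuming SQL Server 2014 or later, where `sys.fn_hadr_is_primary_replica` is available; the database name is a placeholder):

    ```sql
    -- First step of each agent job: bail out if this replica is not the
    -- primary for the AG containing the target database
    -- ('MyAGDatabase' is a placeholder name).
    IF sys.fn_hadr_is_primary_replica(N'MyAGDatabase') <> 1
    BEGIN
        PRINT 'Not the primary replica - exiting.';
        RETURN;  -- job step ends here on a secondary
    END

    -- ...actual job work (backups, index maintenance, etc.) goes below...
    ```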

    The preferred methodology for patching is to patch the secondary nodes first - then manually fail over to a secondary and patch the primary.  Once the primary has been patched and is back up, available, and synchronized - manually fail back to the primary.  For failover, the databases need to be synchronized and in synchronous mode or you could have data loss.  If you set your HA secondary to manual failover, you would then have to manually fail over in the event of a system outage - which is not an HA solution.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks. I am looking for an HA option with automatic failover in the event of a system outage, instead of manual failover. What is the preferred methodology for patching if I go the automatic failover route? Another concern with automatic failover: during a DBCC CHECKDB run, let's say CHECKDB takes a few hours to finish on a very large database and a failover happens before the job completes due to an outage - would this cause any issue for the database?

  • Patching: https://www.brentozar.com/archive/2015/02/patching-sql-server-availability-groups/

     

    For DBCC, if the primary fails, the DBCC work is lost. It runs against an internal snapshot of the db. Losing this doesn't hurt the database, but the DBCC work is gone.

     

  • The failover mode doesn't matter - what matters is the availability mode.  If the availability mode is asynchronous - you must switch to synchronous before failing over, or you force the failover, which can lead to data loss.  If the mode is synchronous then either automatic or manual failover can be done without data loss - because the secondary must be up to date prior to any data being committed on the primary.
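    Switching an asynchronous replica to synchronous ahead of a planned failover can be sketched as follows (the AG and replica names are placeholders):

    ```sql
    -- Run on the current primary: switch the target replica to synchronous
    -- commit ahead of a planned failover ('MyAG', 'DRServer' are placeholders).
    ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'DRServer'
    WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

    -- Wait until the databases report SYNCHRONIZED, then run this on the
    -- replica you are failing over TO:
    -- ALTER AVAILABILITY GROUP [MyAG] FAILOVER;  -- no data loss once synchronized
    ```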

    If you want an HA solution - then you must configure the secondary as synchronous with automatic failover.  Anything else is NOT an HA solution as it would require manual intervention to fail over - which by definition is not HA.

    Preferred methodology for patching: patch the secondary, fail over, patch the old primary - let the databases synchronize and fail back if needed.  This reduces downtime to just the amount of time it takes to fail over, instead of having to wait for the server to restart.

    Jeffrey Williams

  • sqlguru wrote:

    Thanks. I am looking for an HA option with automatic failover in the event of a system outage, instead of manual failover. What is the preferred methodology for patching if I go the automatic failover route? Another concern with automatic failover: during a DBCC CHECKDB run, let's say CHECKDB takes a few hours to finish on a very large database and a failover happens before the job completes due to an outage - would this cause any issue for the database?

    In your scenario, I think you want the three servers in the same DC set up as synchronous, with at least one of them set up for automatic failover.  Leave the DR as asynchronous, with manual failover.  You also want to set up and leverage read-only routing to take advantage of the read-only secondaries.
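    A minimal sketch of the read-only routing setup mentioned above (the AG name, server names, and routing URL are placeholders; repeat the SECONDARY_ROLE settings for each replica):

    ```sql
    -- Allow read-only connections on the secondary and publish its routing URL
    -- ('MyAG', 'Server2', and the URL are placeholder values).
    ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'Server2'
    WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

    ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'Server2'
    WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://Server2.domain.com:1433'));

    -- Routing list used while Server1 is the primary:
    ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'Server1'
    WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'Server2', N'Server3')));
    ```

    Clients then need `ApplicationIntent=ReadOnly` in their connection string, and must connect through the listener, for the routing to take effect.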

    In the event of a failover, any SQL jobs that may be running things like CHECKDB will be disrupted.  You need to plan for that.  We do regular patching; the servers get rebooted in a specific order.  We expect that there will be an interruption, and have taken steps accordingly.

    We have 3 servers: 2 in the same DC, one in a DR data center.  The 2 in the same DC are synchronous and fail over automatically; the DR is manual and async.  Read-only routing is set up.  We have Azure blob storage set up as a witness.

    Patching is on a regular schedule that we know in advance for the next million years.  Patching is done on Saturdays between 3 and 8 am.  At 2 am, every Saturday, a job runs and checks a calendar table for a "patch today" value.  If it's a patch day, it disables all of the alerts.  All of the SQL jobs check if the server is a primary, and skips each job if it is not the primary.  In non-prod, I have coded this to delay the start of the various jobs that need to run.  I haven't implemented it in production yet, but I plan on doing so within the next few weeks.
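    The 2 AM patch-day check described above could be sketched like this (the calendar table, its columns, and the alert name are all hypothetical names, not the poster's actual objects):

    ```sql
    -- Saturday 2 AM job: consult an assumed calendar table for a patch-day flag
    -- (dbo.MaintenanceCalendar, CalendarDate, IsPatchDay are hypothetical names).
    IF EXISTS (SELECT 1
               FROM dbo.MaintenanceCalendar
               WHERE CalendarDate = CONVERT(date, GETDATE())
                 AND IsPatchDay = 1)
    BEGIN
        -- Disable alerting for the patching window
        -- (alert name is a placeholder).
        EXEC msdb.dbo.sp_update_alert @name = N'Severity 17+', @enabled = 0;
    END
    ```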

    We back up only the primary server, although we back up all system databases on all servers. We run CHECKDB on all instances.  Backups on the DR data center server go to a local drive; those in the primary DC go to a share.  Normally, the DR box copies its backups to the file share, although in the event of DR, that is stopped.  All of this is automatic.  There is no manual intervention.

    In the event of a real DR, there are manual steps to force quorum and bring the listeners online.  When we failback, I set all of the servers to synchronous, and when the data is in sync, we fail over.

    In our annual DR tests, databases are ready to go in about a minute.  We had a situation in January where we had to go to the DR.  The DBA got some atta-boys because it worked perfectly and in no time.

     

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Thanks a lot! I really appreciate everyone's input; it's very helpful.

    Question regarding data synchronization. We are migrating data from a different database system to SQL Server - approx 3-4 TB in size - and that will be our primary replica. I know there are different choices for initial data synchronization, including full database and log backup, join only, or skip initial data synchronization. I am thinking of first creating an empty user database with the schema on the primary replica and then configuring the availability group, so that the data migrated onto the primary replica gets synchronized to all the secondary replicas, rather than using the full database and log backup option. This would also be a use case for testing latency when exporting a high volume of data into the primary replica using BCP, to ensure that data synchronization to all the secondary replicas is handled properly. Any thoughts?

  • sqlguru wrote:

    Thanks a lot! I really appreciate everyone's input; it's very helpful.

    Question regarding data synchronization. We are migrating data from a different database system to SQL Server - approx 3-4 TB in size - and that will be our primary replica. I know there are different choices for initial data synchronization, including full database and log backup, join only, or skip initial data synchronization. I am thinking of first creating an empty user database with the schema on the primary replica and then configuring the availability group, so that the data migrated onto the primary replica gets synchronized to all the secondary replicas, rather than using the full database and log backup option. This would also be a use case for testing latency when exporting a high volume of data into the primary replica using BCP, to ensure that data synchronization to all the secondary replicas is handled properly. Any thoughts?

    I'm not sure that will work actually.

    As far as I know, the synchronization depends on the logs and LSNs to get in sync.  The latency on the initial load will likely be higher than the normal day-to-day syncing.

    Restoring a 3-4 TB database, and applying differential and log backups, shouldn't take that long.  Are you worried about the time it takes to do the initial load?

    Michael L John

  • If you create the new database in SQL Server - set up the secondary replicas, then start loading data from your other system - that would work.  But it is going to depend on the process you use to load the data...

    If you load the data in large batches - that data has to be transferred to the replicas through the send queue and applied to the replicas in the redo queue.  The larger the batches, the longer this will take, and it could get backed up to the point where it takes longer to send/redo the transactions than a simple backup/restore would take.

    If you choose to go this route - make sure the replicas are set to asynchronous mode during the load process.  If not - then the load process will have to wait until the transactions have been hardened on the replicas before committing on the primary, which will slow down the process.

    Also make sure you have at least 4 or 5 TB of space available for the transaction log on all replicas.  The transaction log cannot be marked as reusable (truncated) until transactions have committed on all replicas - and backing up the transaction log won't do that if the send and/or redo queues are backed up.
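    Send and redo queue backlog during a load like this can be watched with a query along these lines (a sketch using the standard AG DMVs):

    ```sql
    -- Monitor send and redo queue sizes per database replica during the load
    SELECT ar.replica_server_name,
           drs.database_id,
           drs.log_send_queue_size,  -- KB of log waiting to be sent to the replica
           drs.redo_queue_size       -- KB of log received but not yet redone
    FROM sys.dm_hadr_database_replica_states AS drs
    JOIN sys.availability_replicas AS ar
      ON drs.replica_id = ar.replica_id;
    ```

    Large or steadily growing values in either column indicate the replicas are falling behind the load, and the log on the primary cannot be truncated until they catch up.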

    You will also need space in tempdb for the version store - not sure how much will be needed but if you get backed up on the send/redo queues it could be quite a lot of space.

    I would not approach it that way - I would set up the secondary replicas after building out the primary system, using backup/restore and join.
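    The backup/restore-and-join approach recommended above can be sketched as follows (the AG name, database name, and share path are placeholders):

    ```sql
    -- On the primary ('MyDB', 'MyAG', and the share path are placeholders):
    BACKUP DATABASE [MyDB] TO DISK = N'\\backupshare\MyDB.bak';
    BACKUP LOG [MyDB] TO DISK = N'\\backupshare\MyDB.trn';

    -- On each secondary: restore WITH NORECOVERY, then join the AG
    RESTORE DATABASE [MyDB] FROM DISK = N'\\backupshare\MyDB.bak' WITH NORECOVERY;
    RESTORE LOG [MyDB] FROM DISK = N'\\backupshare\MyDB.trn' WITH NORECOVERY;
    ALTER DATABASE [MyDB] SET HADR AVAILABILITY GROUP = [MyAG];
    ```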

    Jeffrey Williams

  • I have setup the availability group as below.

    Primary Replica - Full, Diff and Log backups & maintenance jobs, etc.

    Secondary Replica 1 (Synchronous commit) - For HA and automatic failover

    Secondary Replica 2 (Asynchronous commit) - For reporting purposes only

    Secondary Replica 3 (Asynchronous commit) - For DR

    I am planning to sync logins once a day between the primary and secondary replica 1 (synchronous commit for HA and automatic failover). Would that suffice? Regarding secondary replica 2 (for reporting purposes only): we have separate logins to access the reporting server which are not needed on the primary. So I would need to add a login to the primary instance and create a user in the primary database; this gets transferred to the secondary replica(s) in the AG. Then, when I create a login on secondary replica 2 (the reporting server), the user maps to the login automatically, and then I will disable the user on the primary. So do you advise creating this login on both the primary replica and secondary replica 1 and disabling it?

     

  • You need to replicate the logins to replicas 1 & 3 (DR) - those logins will be needed on the DR node if you ever fail over.  For reporting logins (note: these are SQL logins only) I would create the login on replica 2 - grab the SID from that replica and create the login on the primary, create the user in the specific databases needed, then remove the login from the primary.

    If you leave the login on the primary - then you have to filter out that login from being replicated to the other nodes.  Once the database user has been created with the appropriate SID it isn't needed on the primary.  And - if for some reason - you need to recreate that login you can always grab the SID from the secondary and recreate it...but that shouldn't be needed.
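    The SID workflow described above can be sketched like this (the login, database name, password, and SID value are placeholders):

    ```sql
    -- On replica 2: look up the SID of the reporting login
    -- ('ReportUser' is a placeholder name).
    SELECT name, sid
    FROM sys.server_principals
    WHERE name = N'ReportUser';

    -- On the primary: create the login with that SID, create the database
    -- user, then drop the login so it is not replicated to the other nodes.
    CREATE LOGIN [ReportUser]
        WITH PASSWORD = N'...',   -- placeholder
             SID = 0x...;         -- SID value copied from replica 2
    USE [MyDB];                   -- placeholder database name
    CREATE USER [ReportUser] FOR LOGIN [ReportUser];
    DROP LOGIN [ReportUser];
    ```

    Once the database user exists with the matching SID, it will map to the login on replica 2 automatically as the database synchronizes.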

    Jeffrey Williams

  • Using the listener, there are timeouts from the application. Is increasing the session timeout from the availability group properties advisable?
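    For reference, the session timeout mentioned above is changed per replica as sketched below (the AG and server names are placeholders) - but note that this setting governs replica-to-replica health detection, not client connection timeouts through the listener:

    ```sql
    -- SESSION_TIMEOUT controls how long replicas wait on each other before
    -- declaring the connection failed; it does not affect client/listener
    -- connection timeouts ('MyAG' and 'Server1' are placeholders).
    ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'Server1'
    WITH (SESSION_TIMEOUT = 15);  -- seconds; the default is 10
    ```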

  • Connecting to the server directly works, but when connecting using the listener it times out. Any thoughts? Thanks in advance!
