AlwaysOn failback after DR


jsqldba SSChampion Points: 11257 · August 17, 2016 at 3:14 am

I'm not clear on something about AlwaysOn failback. Say we fail over to our DR site after an outage at our primary site, and the outage lasts for days or weeks. When the primary site is recovered, its databases are days or weeks behind the DR site. Can the synchronization mode be set to synchronous for the DR replica, so that the transactions replicate back to the primary? Or must we do a full resync of the primary from the DR site? I'm talking about large databases here, over 1TB. What is the fastest way to get everything back to the primary site?
Thanks


Brandie Tarvin SSC Guru Points: 173119 · Answer 1

From what I understand, you will have to remove the databases from the Availability Group, restore the most recent FULL backup plus as many transaction log backups as you can to the recovered primary site, and then add the databases back to the Availability Group.

If you try to add the databases on the recovered primary site without the transaction log backups, or from an older FULL backup, the transaction logs on the current primary site will grow to an immense size while trying to commit data synchronously to the recovered primary site. It will take forever for VLDBs. So don't worry about synchronous or asynchronous yet. Just restore every backup you have on the recovered site (before flipping back), add the databases back to the Group, set the replica to synchronous after that, wait for everything to be synchronized, then flip the group back to the recovered site.
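As a rough illustration of the order of operations described above, here is a small Python sketch that only builds the T-SQL text for each step and says where it would be run. The function name `reseed_steps`, the group name, database name, and backup paths are all made up for the example. It assumes the stale copy on the recovered replica has already been taken out of the group, and that the backups are restored `WITH NORECOVERY` so the database can be joined afterwards. Treat it as a checklist generator, not a tested runbook.

```python
from typing import List, Tuple


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_path(path: str) -> str:
    """Render a backup path as an N'...' literal, doubling single quotes."""
    return "N'" + path.replace("'", "''") + "'"


def reseed_steps(ag: str, db: str, full_backup: str,
                 log_backups: List[str]) -> List[Tuple[str, str]]:
    """Return (where to run, T-SQL) pairs in the order they would be executed.

    Hypothetical helper: it only formats statements, it does not connect
    to any server.
    """
    d, g = quote_name(db), quote_name(ag)
    steps = [("recovered replica",
              f"RESTORE DATABASE {d} FROM DISK = {quote_path(full_backup)} WITH NORECOVERY;")]
    # Log backups must be applied in sequence, each one leaving the database restoring.
    for log in log_backups:
        steps.append(("recovered replica",
                      f"RESTORE LOG {d} FROM DISK = {quote_path(log)} WITH NORECOVERY;"))
    # The database has to be in the group on the primary before a secondary can join it.
    steps.append(("current primary", f"ALTER AVAILABILITY GROUP {g} ADD DATABASE {d};"))
    steps.append(("recovered replica", f"ALTER DATABASE {d} SET HADR AVAILABILITY GROUP = {g};"))
    return steps


if __name__ == "__main__":
    for where, sql in reseed_steps("AG1", "Sales", "/bak/Sales_full.bak",
                                   ["/bak/Sales_1.trn", "/bak/Sales_2.trn"]):
        print(f"-- run on {where}\n{sql}")
```

The closer the last restored log backup is to the current primary, the less the replica has to catch up through the group itself once it is joined.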

Brandie Tarvin, MCITP Database Administrator
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.
Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Perry Whittle SSC Guru Points: 234065 · Answer 2

JarJar (8/17/2016)
I'm not clear on something about AlwaysOn failback. Say we fail over to our DR site after an outage at our primary site, and the outage lasts for days or weeks.

Is the DR site set to synchronous or asynchronous?

JarJar (8/17/2016)
When the primary site is recovered, its databases are days or weeks behind the DR site. Can the synchronization mode be set to synchronous for the DR replica, so that the transactions replicate back to the primary?

Same as above: is it synchronous or asynchronous to start with?

As Brandie has said, the log will also grow rapidly if the outage lasts for any length of time. If you expect a long outage, it would be best to remove the database from the availability group. As long as the removed database is left in the restoring state and you have all log backups available, a full resync won't be needed, just log restores to bring the database up to the same recovery point as the new primary database.
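The "all log backups available" condition can be checked before starting any restores. The sketch below is one way to do that in Python, assuming you have already pulled the `first_lsn` and `last_lsn` values for each log backup (for example from `msdb.dbo.backupset`) and know the LSN the restoring database has reached. The `LogBackup` type and `find_chain_break` function are hypothetical names for this example.

```python
from typing import List, NamedTuple, Optional


class LogBackup(NamedTuple):
    path: str
    first_lsn: int
    last_lsn: int


def find_chain_break(restored_to_lsn: int,
                     backups: List[LogBackup]) -> Optional[LogBackup]:
    """Return the first log backup that cannot be applied, or None if the chain is intact.

    A log backup is usable when it starts at or before the point the
    database has been restored to; a later start means a backup is missing.
    """
    reached = restored_to_lsn
    for backup in sorted(backups, key=lambda b: b.first_lsn):
        if backup.first_lsn > reached:
            return backup  # gap: something between `reached` and this backup is missing
        # Backups that end before `reached` are simply already covered.
        reached = max(reached, backup.last_lsn)
    return None


if __name__ == "__main__":
    chain = [
        LogBackup("log_1.trn", 100, 200),
        LogBackup("log_2.trn", 200, 300),
        LogBackup("log_3.trn", 350, 400),  # starts after 300, so one backup is missing
    ]
    broken_at = find_chain_break(150, chain)
    print("chain intact" if broken_at is None else f"gap before {broken_at.path}")
```

If a gap is reported, the options are to find the missing log backup or to take a fresh full or differential backup from the current primary and start from there.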

JarJar (8/17/2016)
Or must we do a full resync of the primary from the DR site? I'm talking about large databases here, over 1TB. What is the fastest way to get everything back to the primary site?
Thanks

Again, is it synchronous or asynchronous to start with?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

jsqldba SSChampion Points: 11257 · Answer 3

The DR site is asynchronous.

I think I've got it now. This was a missing piece of the puzzle for me. If the database remains in the availability group, its transaction log continues to grow (no truncation) while waiting for the offline replica to come back. So the proper protocol in a true DR is to remove the database from the AG and then, after recovery, backup/restore back to the primary site.

But for a DR test, which will only last a few hours at most and won't generate a huge number of transactions, I could set the DR replica to synchronous, do the test, then fail back and reset it to asynchronous. We won't be performance testing the DR site, just validating that failover works and running functional tests.

Do I have that correct?

Brandie Tarvin SSC Guru Points: 173119 · Answer 4

Sorta yes, sorta no.

The AG itself will not go offline. It will move the primary to the DR site. The former primary (now secondary) will be offline for whatever reason (crash / DR exercise / maintenance).

If you're only doing a DR test, then nothing goes offline. All you need to do is manually shift the primary to the DR site (making the primary the secondary) then switch it back when you're done. I wouldn't worry about changing the synchronous mode unless you have a real need for your tests to be available ASAP on the regular primary. And if this is production, make sure your tests are using real data that you want left in the system or test accounts that are ignored in all your financial records.
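One caveat worth checking against the documentation: a planned manual failover (one without data loss) needs the target secondary to be in synchronous-commit mode and in the SYNCHRONIZED state, so with an asynchronous DR replica the temporary mode switch described in the question is the usual route. The Python sketch below just lists that sequence as T-SQL text. The function name `dr_test_steps`, the group name, and the server names are invented for the example, and it assumes the primary-site replica is already synchronous-commit and that the DR replica's failover mode stays manual throughout.

```python
from typing import List, Tuple


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def set_mode(ag: str, replica: str, mode: str) -> str:
    """Build the statement that changes one replica's availability mode."""
    server = replica.replace("'", "''")
    return (f"ALTER AVAILABILITY GROUP {quote_name(ag)} "
            f"MODIFY REPLICA ON N'{server}' WITH (AVAILABILITY_MODE = {mode});")


def dr_test_steps(ag: str, primary: str, dr: str) -> List[Tuple[str, str]]:
    """Return (where to run, action) pairs for a planned failover test and failback.

    Hypothetical helper: it only formats the steps, it does not run them.
    """
    failover = f"ALTER AVAILABILITY GROUP {quote_name(ag)} FAILOVER;"
    wait = "-- wait until synchronization_state_desc is SYNCHRONIZED for every database"
    return [
        (primary, set_mode(ag, dr, "SYNCHRONOUS_COMMIT")),
        (primary, wait),
        (dr, failover),       # planned failover is issued on the target secondary
        (dr, "-- run the DR tests"),
        (dr, wait),
        (primary, failover),  # fail back, again issued on the target
        (primary, set_mode(ag, dr, "ASYNCHRONOUS_COMMIT")),
    ]


if __name__ == "__main__":
    for where, action in dr_test_steps("AG1", "PRODSQL1", "DRSQL1"):
        print(f"[{where}] {action}")
```

The synchronization state can be read from `sys.dm_hadr_database_replica_states` on the current primary before each failover.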

EDIT: If this DR test is mainly about connectivity and the tests are few and small (data-wise), then don't bother changing the synch mode at all. There won't be much data left to commit once you change over, and most likely it will already be committed since the primary will still be up, albeit as a secondary. Unless... are you saying you plan on taking down the primary server completely as part of this test?? :END EDIT

If this was a real disaster, then the former primary would probably be down. I'm assuming the server or data center would be completely unavailable. But that doesn't mean the AG itself will be offline. It will still be functioning on the available servers. Or should be if it was set up in such a way that it could survive a data center crash (i.e., crossing multiple data centers).

Does that make sense?

jsqldba SSChampion Points: 11257 · Answer 5

Brandie Tarvin (8/18/2016)

Does that make sense?

Absolutely! Thanks a lot.

Brandie Tarvin SSC Guru Points: 173119 · Answer 6

You are welcome.