Improving Availability Groups

  • Comments posted to this topic are about the item Improving Availability Groups

  • We have an issue during failover where our production databases go into the Initializing state. It is random, not consistent. We've done a lot of investigation and debugging, and we are stumped. The systems are relatively quiet when we fail over. We have scripts that check the log send queue, redo queue, and AG health status. The only way to get the databases out of this state is to kill the SQL Server process in Task Manager. That's right, we have to kill the executable. When we restart, the databases go into the Restoring state (I think that is the state). From there, we have to drop the database and add it back into the AG. Does anyone have a similar issue? This only happens on our production servers, not on our DEV or QA AGs. It's frustrating. We've been trying to figure it out for well over a year. Even MS and a consultant we worked with couldn't figure it out.
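For anyone comparing notes on the checks mentioned above (log send queue, redo queue, AG health), here is a minimal sketch of what such a pre-failover check might look like. It is not the poster's actual script and it does not diagnose the Initializing problem. The query reads the documented DMV `sys.dm_hadr_database_replica_states`; the Python function name `failover_blockers`, the `max_queue_kb` threshold, and the sample rows are hypothetical. It also assumes synchronous-commit replicas, since an asynchronous replica normally reports SYNCHRONIZING and would always be flagged.

```python
# T-SQL a pre-failover script might run (via sqlcmd, pyodbc, or similar).
# Queue sizes in this DMV are reported in KB and can be NULL on the primary.
PRE_FAILOVER_QUERY = """
SELECT DB_NAME(drs.database_id)       AS database_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc,
       drs.log_send_queue_size,
       drs.redo_queue_size
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id;
"""


def failover_blockers(rows, max_queue_kb=0):
    """Return human-readable reasons not to fail over yet (empty list = none found).

    `rows` is a list of dicts keyed by the column names in PRE_FAILOVER_QUERY.
    """
    blockers = []
    for row in rows:
        where = f"{row['database_name']} on {row['replica_server_name']}"
        if row["synchronization_state_desc"] != "SYNCHRONIZED":
            blockers.append(f"{where}: state is {row['synchronization_state_desc']}")
        if row["synchronization_health_desc"] != "HEALTHY":
            blockers.append(f"{where}: health is {row['synchronization_health_desc']}")
        for column in ("log_send_queue_size", "redo_queue_size"):
            queue_kb = row[column] or 0  # NULL (None) is treated as an empty queue
            if queue_kb > max_queue_kb:
                blockers.append(f"{where}: {column} is {queue_kb} KB")
    return blockers


# Hypothetical rows, shaped like the query result, for illustration only.
sample = [
    {"database_name": "SalesDB", "replica_server_name": "NODE1",
     "synchronization_state_desc": "SYNCHRONIZED",
     "synchronization_health_desc": "HEALTHY",
     "log_send_queue_size": None, "redo_queue_size": None},
    {"database_name": "SalesDB", "replica_server_name": "NODE2",
     "synchronization_state_desc": "SYNCHRONIZING",
     "synchronization_health_desc": "PARTIALLY_HEALTHY",
     "log_send_queue_size": 512, "redo_queue_size": None},
]

for reason in failover_blockers(sample):
    print(reason)
```

Logging the raw rows just before each failover, rather than only a pass/fail result, may make it easier to compare the runs that end in Initializing with the ones that do not.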

  • We did a test run of AGs in our DEV environment. Getting 1.5+ TB of databases synced took a while, but it worked, and pointing our CNAME for the DEV server at the listener meant we didn't need to change the connection strings in our apps. The big issue was, as you pointed out, Steve, the jobs. We even added code to the jobs that would check whether they were running on the primary or a secondary and only run if they were on the primary. There were still lots of issues to figure out. We eventually abandoned the AG in DEV and are now looking into Azure SQL Managed Instance because of the included HA and the evergreen OS and SQL versions. We're stretched pretty thin at the moment, and if we can get HA for free and remove some patching work from our lives, that's a plus. Again, getting 1.5+ TB synced into Azure will be an interesting process. I'm thinking backup to URL into Blob Storage and then restore onto the MI.
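The "only run on the primary" check described above can be sketched roughly as follows. This is an illustration, not the poster's actual job code: it wraps a job step in a guard built on the built-in function `sys.fn_hadr_is_primary_replica` (SQL Server 2014 and later). The helper name `guarded_job_step`, the database name `SalesDB`, and the procedure `dbo.NightlyPurge` are hypothetical.

```python
def guarded_job_step(database_name: str, job_sql: str) -> str:
    """Wrap job_sql in T-SQL so it only runs where database_name is the primary.

    Any result other than 1 from sys.fn_hadr_is_primary_replica causes the step
    to be skipped; that choice is an assumption of this sketch.
    """
    safe_name = database_name.replace("'", "''")  # escape quotes for the T-SQL literal
    return (
        f"IF sys.fn_hadr_is_primary_replica(N'{safe_name}') = 1\n"
        "BEGIN\n"
        f"{job_sql.strip()}\n"
        "END"
    )


# Hypothetical database and procedure names, for illustration only.
print(guarded_job_step("SalesDB", "EXEC dbo.NightlyPurge;"))
```

The generated text would go into the job step on every replica, so the same job definition can be deployed everywhere and simply does nothing on a secondary. It does not address the other job issues mentioned, such as keeping job definitions in sync across replicas.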
