Replica stuck in resolving state during upgrade

  • Hi,

    Last week we tried to perform an upgrade on a 2 node sqlserver 2012 availability group. OS is windows server 2019.

    Cluster functional level is set to 9.

    We set secondary node to manual failover, assync commit, etc.. the secondary node was updated ok.

    On the primary, still sqlserver 2012, all was also good , all green ticks.

    I then tried a manual failover from node1 (SQL2012) to node2 (SQL2019)

    At that point both replicas went to resolving, and never came back.

    The error in the cluster managment stated that the IP of the AG was already in use.

    They were no duplication of IP on the network, we checked.

    Eventually we had to roll everything back.

    Q1) Should 'ALTER AVAILABILITY GROUP {MyAG} FORCE_FAILOVER_ALLOW_DATA_LOSS;' have enabled me to force the failover to the sql2019 node and continue upgrade, or would have made things worse?

    Q2) Would running the above command on Sql2012 node, bring applications back online?

    Q3) Has anybody any idea on how I could reproduce that error, I can test in non production environment, but previous upgrade attempts there, worked just fine.

    Thanks for any hints

     

     

     

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I am very curious why you decided to perform an in-place upgrade instead of starting with a fresh install and restore of the databases to a new server?  Especially with the complications of an AG in place.

    frederic.boistuaud wrote:

    Q1) Should 'ALTER AVAILABILITY GROUP {MyAG} FORCE_FAILOVER_ALLOW_DATA_LOSS;' have enabled me to force the failover to the sql2019 node and continue upgrade, or would have made things worse?

    Your AG was set to async.  If you did that, you may have lost data.  It would have brought the 2019 online, and the database should have been available.

    frederic.boistuaud wrote:

    Q2) Would running the above command on Sql2012 node, bring applications back online?

    Maybe.  Without a lot more information, there is no real way to determine if it would have worked.

    frederic.boistuaud wrote:

    Q3) Has anybody any idea on how I could reproduce that error, I can test in non production environment, but previous upgrade attempts there, worked just fine.

    If this worked as expected in a testing environment, then either steps were missed in the production try, or there were differences in the configuration between the environments.

    There are significant differences under the hood of SQL 2012 vs. SQL 2019.  I certainly would not do an upgrade from 2012 to 2019.  There are far too many unknowns.

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • If you like Upgrade in Place then you will love Russian Roulette. Both have the same level of randomness about when they will kill the subject.

    I would never do an upgrade in place on Production, the risks around failure are simply too high. As I would never want to do anything in Production that has not already been done in Test it means I would also never do upgrade in place in Test.

    With the ability to create VM guest instances whenever needed, there is really no justification for doing an upgrade in place.  Using a new Windows or Linux instance for a new SQL instance removes a shedload of uncertainties and risk from your upgrade process.

    I did once try out Upgrade in Place to see how it all worked and how reliable it was. I set up 4 clones of a Windows VM instance and did scripted SQL installs on each of them, to give 4 identical machines. I then did a scripted upgrade in place on each of them. Three upgrades worked, the fourth failed leaving the machine in an unusable state.  A 1 in 4 chance of hitting a problem during Upgrade in Place that would not exist in a new build is too high for me.

    Upgrade in Place may work OK in theory, but Einstein once said 'In theory, theory and practice are the same. In practice they are not.' This definitely applies to Upgrade in Place.

    I know all of this does not fix the OP problem, but the intention was to suggest an alternative approach would be safer.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply