Always on issue

  • We have two nodes in our Always On setup running SQL Server 2014 SP1 with, for now, one database set up for Always On.

    If I do a forced failover from Node 1 to Node 2 and then back again from Node 2 to Node 1, everything works perfectly.

    However, if I stop the SQL Server services on the primary node, the secondary node does not take over. The dashboard says the role on the secondary gets stuck in RESOLVING. As soon as I start the services on the primary node again, it is all fine.

    Where do I even start to debug this issue?

    These are the messages I see in the dashboard when it doesn't work:

    This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED

    The role of this availability replica is unhealthy. The replica does not have either the primary or secondary role.

    At least one availability database on this availability replica has an unhealthy data synchronization state. If this is an asynchronous-commit availability replica, all availability databases should be in the SYNCHRONIZING state. If this is a synchronous-commit availability replica, all availability databases should be in the SYNCHRONIZED state.
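
    As a possible starting point for debugging, the states the dashboard reports can also be read directly from the Always On DMVs. This is a minimal sketch, not something posted in the thread: it assumes it is run on the surviving secondary while the primary's services are stopped, and the column selection is only a suggestion.

    ```sql
    -- Role, connection state, and health of each replica as seen from this instance.
    -- On a secondary, only the row for the local replica (is_local = 1) is populated.
    SELECT ar.replica_server_name,
           ars.is_local,
           ars.role_desc,                    -- PRIMARY, SECONDARY, or RESOLVING
           ars.operational_state_desc,
           ars.connected_state_desc,         -- CONNECTED or DISCONNECTED
           ars.synchronization_health_desc,
           ars.last_connect_error_description
    FROM sys.dm_hadr_availability_replica_states AS ars
    JOIN sys.availability_replicas AS ar
      ON ar.replica_id = ars.replica_id;
    ```

    A role_desc of RESOLVING here matches the "does not have either the primary or secondary role" message in the dashboard.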

  • When you say all is fine after starting the primary what do you mean? What state is each database immediately after that point and which ends up primary?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (1/27/2016)


    When you say all is fine after starting the primary what do you mean? What state is each database immediately after that point and which ends up primary?

    On Server 1 the Always On database is the primary, and in the dashboard all is fine. I then stop the SQL Server services on Server 1; the Always On database is no longer available, and the dashboard shows RESOLVING and never switches over to the second server. Once I start SQL Server on Server 1, everything becomes available again just fine.

  • Sounds like you need to look into your failover mode settings coupled with your sync mode.

    Also, from a data loss perspective, are you running in asynchronous or synchronous mode? And if you are asynchronous, is your secondary caught up?
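
    One way to check whether the secondary is caught up is to query the database-level replica states. This is a sketch under the assumption that it is run on the primary while both nodes are up (on a secondary, only the local rows are returned); the columns chosen are a suggestion.

    ```sql
    -- Synchronization state of each availability database on each replica.
    -- Automatic failover needs the secondary database to be SYNCHRONIZED.
    SELECT DB_NAME(drs.database_id) AS database_name,
           ar.replica_server_name,
           drs.is_local,
           drs.synchronization_state_desc,   -- SYNCHRONIZED, SYNCHRONIZING, NOT SYNCHRONIZING, ...
           drs.synchronization_health_desc,
           drs.log_send_queue_size,          -- KB of log not yet sent to the secondary
           drs.redo_queue_size               -- KB of log not yet redone on the secondary
    FROM sys.dm_hadr_database_replica_states AS drs
    JOIN sys.availability_replicas AS ar
      ON ar.replica_id = drs.replica_id;
    ```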

  • It is set to Synchronous commit. There aren't any data changes as we only have a test database with a few test tables with less than 20 rows in it. Data isn't changing.

  • The replicas' being set to synchronous commit isn't sufficient. Both the primary and secondary have to have failover mode set to AUTOMATIC.

    What does this query show?

    SELECT replica_server_name,
           availability_mode_desc,
           failover_mode_desc
    FROM sys.availability_replicas;

    Cheers!
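
    If that query showed a replica with failover_mode_desc of MANUAL, the change could look roughly like the sketch below. The availability group name [MyAG] and the replica name N'NODE2' are hypothetical placeholders, not names from this thread.

    ```sql
    -- Run on the current primary replica.
    -- [MyAG] and N'NODE2' are placeholders; substitute your own names.
    -- MODIFY REPLICA accepts one option per statement, hence two statements.
    ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'NODE2'
    WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

    ALTER AVAILABILITY GROUP [MyAG]
    MODIFY REPLICA ON N'NODE2'
    WITH (FAILOVER_MODE = AUTOMATIC);
    ```

    The same pair of statements would be repeated for the other replica, since both partners need synchronous commit and automatic failover.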

  • Yes, it is set to failover mode Automatic.

    I found the issue and fixed it.

    Upon failover I changed it to the setting recommended in this link. Basically, it means it will try to bring it online 60 times in a span of 1 hour, so if it doesn't come online within 1 minute, it will try again.
