AlwaysOn failover with domain users and groups

  • Hi all, I'm new here but cerainly not new to sql server. Originally I'm a programmer (vb.net, c#, sql mainly) but more and more I am being moved by the organisation towards dba. Had a couple of sql dba courses with microsoft and I'm doing pretty okay on the basic tasks. So excuses me if this is absolute noob stuff that should've known. I searched the forum but hard to find the exact answer. So here goes:

    We have a SQL Server 2012 cluster with two nodes and my question (problem) is about failover from one node to the other. We have a certain database that had only domainusers and domaingroups as logins and works fine (fully accessible). But when the availability group fails over to the other node the database is no longer accessable.

    I know that when it comes to users and login that the SID must be the same on both nodes. But in this case (only domain users and groups) there's no SID that matters here. Is that correct?

    So, if the issue is not in the SIDs, what else could be the problem? Who can help me?

    Sorry, if I did not provide enough info. Please ask me and I will provide it.

    Thanks in advance for your help!

  • What does 'inaccessible' mean?

    After the failover, is the database online and available on the second instance?

    If the database isn't online, then there's one troubleshooting path.

    Have those domain users and groups been granted proper access to the second instance? AG isn't Windows Clustering - each instance is separate and distinct and must be configured separately. This includes creating instance logins for all database users.

    Can those users log into the second instance and access the database directly?

    If it's a login issue, that will show up in the SQL Server Error Log as error 18456, and will include an additional State value. You can look up what that means here:

    https://msdn.microsoft.com/en-us/library/cc645917%28v=sql.110%29.aspx

    What errors are users seeing?

    -Eddie

    Eddie Wuerch
    MCM: SQL

  • Thank you for your input Eddie. Tonight I have a look in the errorlog and do some more testing. I'll be back with the results asap.

  • Super_Grover (5/26/2016)


    Hi all, I'm new here but cerainly not new to sql server. Originally I'm a programmer (vb.net, c#, sql mainly) but more and more I am being moved by the organisation towards dba. Had a couple of sql dba courses with microsoft and I'm doing pretty okay on the basic tasks. So excuses me if this is absolute noob stuff that should've known. I searched the forum but hard to find the exact answer. So here goes:

    We have a SQL Server 2012 cluster with two nodes and my question (problem) is about failover from one node to the other. We have a certain database that had only domainusers and domaingroups as logins and works fine (fully accessible). But when the availability group fails over to the other node the database is no longer accessable.

    I know that when it comes to users and login that the SID must be the same on both nodes. But in this case (only domain users and groups) there's no SID that matters here. Is that correct?

    So, if the issue is not in the SIDs, what else could be the problem? Who can help me?

    Sorry, if I did not provide enough info. Please ask me and I will provide it.

    Thanks in advance for your help!

    The windows logins will need to be provisioned on all instances participating in the AG, have you don't this?

    SIDs will automatically be taken care of, no need to worry about orphan users

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Just to be clear this is an AG you are talking about, correct? You list AlwaysOn, but only state a SQL Server cluster in your original question. That could be an AlwaysOn failover cluster instance (FCI) or an AlwaysOn Availability Group (AG).

    If it is an AG I totally agree with others' comments (and that would make sense based on the symptoms listed); the Windows logins need to be added on each node that is a part of the AG.

    Joie Andrew
    "Since 1982"

  • definitely states AG

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Hi all, sorry for the delay but I did some testing and checking the logfile.

    Yes, I'm talking about Availability Groups. Sorry for the confusion.

    I, have this AG for testing and it's called 'Test_overfail' 🙂 and it contains just two databases. The DB in question here is called 'SALTO_RW'

    I manually let this AG fail over to the other node and the the DB is up and running and fully accessable in SQL Management Studio. Logins and users and permissions are the same.

    But then I try to start the client application and thus try to connect to the DB through the client the DB is not available for the client. On the other (initial) node all works fine.

    So in the error log (and on the application screen) it throws an 976 error. It says it's not enabled for read access. I looked up some properties but am not sure where to look exactly.

    I posted the logfile (from the moment of failover till failing of the application) as an attachment.

    Hope you can help me; thanks in advance!

  • Maybe it helps, maybe not but here part of the log on failover back to the working node. See attachment.

  • When you are able to successfully connect to the initial node, are you connecting to the node name, or the AG listener? Same question for the node you are having problems with.

    Can you provide the error message you are getting when you are failing to connect to the node you are having problems with?

    Joie Andrew
    "Since 1982"

  • I'm connecting to the listener name, but there's someting weird in the config there. I think it's wise for me to have a chat my sysadmin tomorrow morning to clarify and sort out some stuff.

    I'll be back here with the outcome

  • at a guess I would say the application is connecting to the primary instance, when failover occurs this is when it breaks.

    Confirm the client connection string please

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Yes, you're right Perry. Connection with DB breaks on failover.

    Turned out that the IP config of the listener was not correct. I corrected it yesterday but apparently it needed quite some time for dns to process through the network. It works fine now on failover, so my initial thoughts about this being a sid issue was not correct.

    Point is, I'm not too good on the network/IP stuff so it's not fully clear to me how the listener work/translates IP addresses upon failover. Anyone know a good link/site/video that explains this in an easy way (maybe with graphics)?

  • The IP that the listener is configured for is a clustered resource. So there will be a NIC on your cluster nodes on the same subnet as the IP the listener is using. When a node is the owner of the listener the NIC on that subnet is listening on the listener IP as well as the IP it is primarily configured for.

    When a failover occurs WSFC moves ownership of the IP from the node that was the owner to the node it fails over to. So the NIC stops responding to traffic on that IP and the node that the AG failed over to starts accepting traffic for that IP.

    I think things get more complicated with multi-subnet failovers, but that should be the gist of it.

    Joie Andrew
    "Since 1982"

  • Joie Andrew (6/2/2016)


    When a node is the owner of the listener the NIC on that subnet is listening on the listener IP as well as the IP it is primarily configured for.

    Correct, this is known as stacking IP addresses

    Joie Andrew (6/2/2016)


    I think things get more complicated with multi-subnet failovers, but that should be the gist of it.

    similar to above except that each node will have an IP in a different subnet, there will also be a unique stackable IP in each subnet.

    The cluster resource has logic applied to detect which node and subnet is owner in order to bring online the correct stackable IP

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply