AOAG endpoint won't connect

  • Hi,

    I'm not able to get the status of my Hadr_endpoint to "connected". It always shows the status "disconnected".

    I dropped and recreated the endpoint, changed the endpoint owner, and granted CONNECT permission to the user that starts the SQL Server services.

    But nothing changed.

    Does anyone have any ideas what to do? I don't want to set up the AOAG from scratch, but right now nothing is being replicated to the second node.

    Script to check endpoint status:

    SELECT r.replica_server_name AS active_AOAG_Node,
    r.endpoint_url,
    rs.connected_state_desc
    FROM sys.dm_hadr_availability_replica_states rs
    JOIN sys.availability_replicas r ON rs.replica_id=r.replica_id
    WHERE rs.is_local=1
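
    A broader version of this check may point at the cause directly. The sketch below drops the is_local filter and adds the last_connect_error_* columns of sys.dm_hadr_availability_replica_states. It assumes you run it on the primary replica, which returns a row for every replica; on a secondary the DMV only returns the local replica.

    ```sql
    -- Run on the primary replica to see every replica's connection state
    -- together with the most recent connection error (if any).
    SELECT r.replica_server_name,
           r.endpoint_url,
           rs.role_desc,
           rs.connected_state_desc,
           rs.last_connect_error_number,
           rs.last_connect_error_description,
           rs.last_connect_error_timestamp
    FROM sys.dm_hadr_availability_replica_states AS rs
    JOIN sys.availability_replicas AS r
        ON rs.replica_id = r.replica_id
    ORDER BY r.replica_server_name;
    ```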

    Script to drop and recreate the endpoint and change its permissions:

    -- drop and recreate endpoint Hadr_endpoint
    USE [master];
    GO
    DROP ENDPOINT [Hadr_endpoint];
    GO

    CREATE ENDPOINT [Hadr_endpoint]
        STATE = STARTED
        AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
        FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE,
            ENCRYPTION = REQUIRED ALGORITHM AES);
    GO

    -- change the owner of the new endpoint:
    ALTER AUTHORIZATION ON ENDPOINT::[Hadr_endpoint] TO [sa];
    GO

    -- grant the service user CONNECT permission on the endpoint
    GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [Domain1\Service_Start_User];
    GO
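
    GRANT CONNECT only succeeds if a server login for the account already exists on that instance, and each instance checks incoming connections against its own endpoint permissions. So the grant has to be in place on every replica. A minimal sketch, assuming [Domain1\Service_Start_User] is the account running the SQL Server service on the partner node(s):

    ```sql
    -- Run on EVERY replica. Assumption: [Domain1\Service_Start_User] is the
    -- account that runs the SQL Server service on the other node(s).
    USE [master];
    GO
    -- Create the Windows login only if it does not exist yet.
    IF NOT EXISTS (SELECT 1 FROM sys.server_principals
                   WHERE name = N'Domain1\Service_Start_User')
        CREATE LOGIN [Domain1\Service_Start_User] FROM WINDOWS;
    GO
    GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [Domain1\Service_Start_User];
    GO
    ```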

     

    Thanks,

    Kind regards,

    Andreas

    Did you check the SQL Server error logs on the instances for any related errors? Do all of the instances have CONNECT permission on the endpoint? Is the firewall port open (default 5022)?

    Sue
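
    To answer the permission and port questions from inside SQL Server, a query along these lines can be run on each instance and the results compared. It is a sketch: it lists the mirroring endpoint's state, port, authentication and encryption settings, and owner, then every principal with a permission on an endpoint. Both instances would be expected to show STARTED, the port opened in the firewall, and a CONNECT grant for the partner's service account.

    ```sql
    -- Endpoint configuration: state, port, auth/encryption, owner.
    SELECT e.name AS endpoint_name,
           e.state_desc,
           e.role_desc,
           e.connection_auth_desc,
           e.encryption_algorithm_desc,
           t.port,
           SUSER_NAME(e.principal_id) AS endpoint_owner
    FROM sys.database_mirroring_endpoints AS e
    JOIN sys.tcp_endpoints AS t
        ON t.endpoint_id = e.endpoint_id;

    -- Who holds permissions (e.g. CONNECT) on the mirroring endpoint.
    SELECT e.name AS endpoint_name,
           p.name AS grantee,
           sp.permission_name,
           sp.state_desc
    FROM sys.server_permissions AS sp
    JOIN sys.server_principals AS p
        ON p.principal_id = sp.grantee_principal_id
    JOIN sys.database_mirroring_endpoints AS e
        ON e.endpoint_id = sp.major_id
    WHERE sp.class_desc = 'ENDPOINT';
    ```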

  • Hi Sue,

    The ports are open; we checked this with PowerShell. Nothing in the ERRORLOG looks unusual.

    And only one of the two nodes fails to connect to the endpoint.

    We don't know what else we can do to resolve this kind of issue.

    Kind regards,

    Andreas

    One node connects but the other doesn't? I note you're not limiting the firewall port to specific IP addresses or address ranges. Restricting it is better practice, but leaving it open does eliminate one possibility.

    So... are they on the same network? If not, there may be an issue with the connection between the networks: something for your network bod to look at.

    Meanwhile:

    From the node that isn't connecting, try pinging the endpoint and see what is returned. Compare this with the ping results from the node that is connecting: if they're not the same, that'd be your problem. Check it's using the right DNS lookup (you may need to flush DNS) and check the DNS servers are in sync (assuming there are multiple DNS servers): it could be something as simple as the server picking up the wrong IP address, not having the DNS entry, or two devices on the network having the same IP address (yes, this is possible, particularly if they're both hard set!).

    If the ping results are the same, at least you've eliminated DNS/network issues. Now you're onto server config 🙂

    Good luck (I have less experience hunting problems there than with DNS/network issues when it comes to SQL Servers...)

     

  • Hi,

    I don't think the network is the problem.

    If we start the SQL Server service on Server 1 under a non-domain account, create a login on Server 2 for Server 1's computer account, and grant that login CONNECT on the Hadr_endpoint on Server 2, everything works fine.

    I can't explain why this works, but after a few seconds the AOAG is synchronized again.

    If we switch the account that starts the SQL Server service back to the domain user, the AOAG stops working.

    Both SQL Servers are on the same network, in the same domain, and should use the same domain user to start the SQL Server services.

    We have opened a case with Microsoft Support. If they find a solution for this problem, I will post it here.
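
    One plausible explanation, offered as an assumption rather than a diagnosis: if the non-domain account is a built-in or virtual account (for example NETWORK SERVICE or NT Service\MSSQLSERVER), Windows authenticates it to other machines as the computer account (DOMAIN\COMPUTERNAME$), so Server 2 sees Server 1's connection as Server 1's computer account. The steps described above would look roughly like this; Domain1 comes from the earlier script, while SERVER1 is a hypothetical computer name.

    ```sql
    -- Run on Server 2. Hypothetical names: replace Domain1 and SERVER1 with
    -- your domain and Server 1's NetBIOS computer name (note the trailing $).
    USE [master];
    GO
    CREATE LOGIN [Domain1\SERVER1$] FROM WINDOWS;
    GO
    GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [Domain1\SERVER1$];
    GO
    ```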
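
    Since the setup depends on both nodes really running under the same domain user, a quick sanity check is to ask each instance which account its services run under and whether a login for that account exists locally. A sketch, assuming VIEW SERVER STATE permission and the account name from the earlier script:

    ```sql
    -- Run on BOTH nodes: which account actually runs each SQL Server service?
    SELECT servicename,
           service_account,
           status_desc,
           startup_type_desc
    FROM sys.dm_server_services;

    -- Does this instance have a login for the (assumed) service account?
    SELECT name, type_desc, is_disabled
    FROM sys.server_principals
    WHERE name = N'Domain1\Service_Start_User';
    ```

    If the service_account values differ between the nodes, or the login is missing or disabled on one of them, that would line up with only one node failing to connect.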

    Kind regards,

    Andreas

    Just to clarify: you run SQL Server under a local admin user account and things work; you run it under a domain account and they don't.

    If I've understood the above right (and this is probably a silly question), has the domain account got local admin permissions on that server?

    I know, I'm picking at details, but if it's not the server but the account the SQL services run under... it really sounds like a permissions issue. Also, has that domain account got read/write permissions to shared areas, particularly the quorum disk? Could be a red herring, but if there's a missing permission in the mix, you might have your cause.
