Listener timeout

  • Hello folks and happy new year! 🙂 .

    I have a lab here at work and i ran into this issue regarding the Listener.

    This is an Always On Availability group lab i have and everything is going fine but there's an issue when connecting through the Listener.

    These are 2 SQL server 2016 EE with SP2 on 2 Windows server 2016 Standard.

    1. If using AD accounts, i.e: mine. i have been getting constant timeouts and its intermittent, when i try to use a local sql server instance account it connects lightning fast, i don't get timeouts when connecting directly to the servers through SSMS, only when connecting to the listeners.
    2. Sometimes the failover takes a while and sometimes it doesn't i think this might be related to No 1 issue.

    Have anyone been through this? i believe its got to be something with the domain, i haven't tested at home since my lab machine have some issues and i couldn't really make this at home yet.

    Thanks for taking your time to read this.

    Regards,

  • When you change between an AD account and a local SQL instance are you using the same Windows login on the same machine? I'm guessing you already checked but I am wondering if something is changing the in DNS server settings on the client machine?

    Are the two servers on the same subnet? Do they have more than one IP address, on different subnets? There are specifics to multisubnets with availability groups, I'd have to check the details if that might be the cause.

    Thanks

  • as_1234 wrote:

    When you change between an AD account and a local SQL instance are you using the same Windows login on the same machine? I'm guessing you already checked but I am wondering if something is changing the in DNS server settings on the client machine?

    When connecting to SQL Server using local account, yes in the machine im still connected with my user.

    as_1234 wrote:

    Are the two servers on the same subnet? Do they have more than one IP address, on different subnets? There are specifics to multisubnets with availability groups, I'd have to check the details if that might be the cause.

    Yes, they are in the same subnet, they have 2 IP Addresses each, the one we use for SQL and the WSFC heartbeat.

     

  • I am just looking at the 1st point in your original post again. I think there should be four possible combinations?:

    1. AD account connecting to the listener
    2. AD account connecting directly to the SQL instance
    3. SQL account connecting to the listener
    4. SQL account connecting directly to the SQL instance

    Can you advise which combinations work reliably and which don't?

    If the connections to the listener fail then can you ping the listener name or IP?

    Are you able to check the Windows Clustering manager to see if it indicates any issues?

  • Sure,

    Most combinations work.

    AD Account to Instance itself works fine

    AD Account to Listener timesout or takes too long to connect, or connects.

    SQL Account connecting to listener works fast and fine

    SQL Account connecting to instance itself works fast and fine

    I can ping the listener through IP and name, most of times response is =1ms, sometimes like 0.1% spikes to 150ms, which i believe its normal.

    If i Nslookup name, it would give me the IP Adress of the listener, whenever i nslookup the ip of the listener it doesn't return the name, i believe that's not issue though.

    I'll be checking the cluster to see if i can find something.

    Thanks!

  • Sorry I don't know the cause of this at the moment. Is there anything in the SQL server logs at the time of the failed connection?

  • It's cool, you are trying to help and thats more than welcome!, also i know it's hard to help when you don't have the issue in your hands.

    Actually, sql server logs doesn't show me anything, SSMS just throws the common timeout message with a little bit more information regarding the process of authentication, i'll add a screenshot.

    Thanks again!

    Attachments:
    You must be logged in to view attached files.
  • Thanks. The screenshot says that it failed during the post login stage so I think the connection attempt started. Is there anything in the Windows Event Log?

    I don't know exactly how SQL server AD authentication works but I would think that, at some point, there must be some communication between the SQL Server and the AD server. Maybe that connection is failing sometimes?

    Is it possible that, because it's a lab environment, the machines haven't got much RAM allocated and one of them is using all the memory sometimes?

     

  • Now i got to see some error of after the whole past week searching and didn't find anything on the logs..

    SSPI handshake failed with error code 0x80090304, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. The Local Security Authority cannot be contacted

    Now, this got me wondering, besides adding the SPN on both of the AG replicas, i should register the SPN for the Listener?

    I have been trying to, but i get the error "could not locate account domain/user"

    Besides registering SPNs for the replicas, is there any other service i should register SPN? AG name,Listener?

  • SPN's is outside of my knowledge unfortunately. The information here might help you https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/prereqs-restrictions-recommendations-always-on-availability?view=sql-server-ver15

    Where did you find the error message?

  • I found it on the SQL Server error log today, earlier dates don't have the message.

    Thanks for the URL.

    And... effectively, you have to register SPN for the Listener, i'll keep testing the next couple of days to see if the issue have been fixed.

  • The SPN needs to match the availability group name. Refer to this article for more info:

    Connect to an Always On availability group listener

    Sue

     

     

  • Thanks Sue!, we'll be taking a look and making some tests.

    oh lord if its the SPNs im gonna hang myselffffff.

  • Updating you guys.

    Today we've registered SPNs to the listeners.

    setspn -A MSSQLSvc/AGListener.domain.com domain\serviceaccount

    setspn -A MSSQLSvc/AGListener.domain.com:1433 domain\serviceaccount   (or whatever port you used for the listener)

    so far.. no timeouts or slow connections, after today or 2 days i'm gonna wrap everything up and say it was SPNs.

    Thanks again as1981 and Sue, any news i'll update you.

     

  • Really appreciate it when people provide updates, which you do so thanks!

    I'm guessing your fine now and that was the issue.

    Sue

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply