Windows SQL Intermittent Connectivity Issue - Error 17830

  • Connecting to a particular SQL 2012 server using Windows Authentication from a different server (client SSMS) in the same domain, the connection intermittently fails with:

    Cannot connect to XXXSQL03.
    ------------------------------
    ADDITIONAL INFORMATION:
    Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=21153; handshake=0; (Microsoft SQL Server, Error: -2)

    No error messages in SQL Error Log or Event Viewer.
    However, I can see the attempts in Extended Events System Health monitor. The error number is 17830 (states 11 & 105) with messages as follows:

    message    Network error code 0x40 occurred while establishing a connection; the connection has been closed. This may have been caused by client or server login timeout expiration. Time spent during login: total 45067 ms, enqueued 0 ms, network writes 0 ms, network reads 45066 ms, establishing SSL 0 ms, network reads during SSL 0 ms, network writes during SSL 0 ms, secure calls during SSL 0 ms, enqueued during SSL 0 ms, negotiating SSPI 0 ms, network reads during SSPI 0 ms, network writes during SSPI 0 ms, secure calls during SSPI 0 ms, enqueued during SSPI 0 ms, validating login 0 ms, including user-defined login processing 0 ms. [CLIENT: 10.1.3.74]

    message    Network error code 0x2746 occurred while establishing a connection; the connection has been closed. This may have been caused by client or server login timeout expiration. Time spent during login: total 136 ms, enqueued 0 ms, network writes 0 ms, network reads 136 ms, establishing SSL 0 ms, network reads during SSL 0 ms, network writes during SSL 0 ms, secure calls during SSL 0 ms, enqueued during SSL 0 ms, negotiating SSPI 0 ms, network reads during SSPI 0 ms, network writes during SSPI 0 ms, secure calls during SSPI 0 ms, enqueued during SSPI 0 ms, validating login 0 ms, including user-defined login processing 0 ms. [CLIENT: 10.1.3.74]

    I have seen many similar posts/articles but no resolutions (fingers generally pointing at network/DNS).

    Both network and DNS "appear" OK (but I'm no expert). In any case I need to prove it's one or the other to external teams.

    The Windows connections do use KERBEROS. Having said that, I also have the same problem with SQL connections.

    Any suggestions welcome.
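
One way to make the 17830 messages above easier to show to another team is to break the "Time spent during login" list into phases and see which one dominates. The following is only a rough sketch: the function names are made up for illustration, and the sample string is an abbreviated copy of the first message above, not the full text.

```python
import re

# Abbreviated copy of the first 17830 message quoted in this post
# (several zero-valued phases are omitted to keep the sample short).
SAMPLE = (
    "Network error code 0x40 occurred while establishing a connection; "
    "the connection has been closed. Time spent during login: total 45067 ms, "
    "enqueued 0 ms, network writes 0 ms, network reads 45066 ms, "
    "establishing SSL 0 ms, negotiating SSPI 0 ms, validating login 0 ms. "
    "[CLIENT: 10.1.3.74]"
)


def parse_login_timings(message: str) -> dict:
    """Return {phase name: milliseconds} parsed from an error 17830 message."""
    # Keep only the text between "Time spent during login:" and "[CLIENT".
    _, _, tail = message.partition("Time spent during login:")
    tail = tail.split("[CLIENT", 1)[0]
    timings = {}
    # Each entry looks like "<phase name> <number> ms".
    for phase, ms in re.findall(r"([A-Za-z\- ]+?)\s+(\d+)\s+ms", tail):
        timings[phase.strip()] = int(ms)
    return timings


def dominant_phase(timings: dict) -> str:
    """Return the phase (other than 'total') that took the longest."""
    phases = {name: ms for name, ms in timings.items() if name != "total"}
    return max(phases, key=phases.get)


if __name__ == "__main__":
    parsed = parse_login_timings(SAMPLE)
    print(parsed)
    print("Longest phase:", dominant_phase(parsed))
```

For the sample above, nearly all of the 45067 ms total falls under "network reads", meaning the server spent the login waiting for data from the client, with no time recorded in the SSL, SSPI, or login validation phases.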

  • Intermittent errors are not fun to troubleshoot, especially when they are network-related errors. A couple of thoughts:
    When you were searching on this, you may have found that error 0x40 is error 64, which means: "The specified network name is no longer available".
    That's generally not something under SQL Server's control. It's a network error, as indicated in the extended events.
    What you could do to try to elicit some help from the network side is to show that putting hex 40 in a calculator and translating it to decimal gives you 64, and that running net helpmsg 64 shows the error message "The specified network name is no longer available". This article has a list of troubleshooting steps for NICs and also lists that error:
    Advanced network adapter troubleshooting for Windows workstations

    The error can pop up when you have some larger network I/O operations as well, so you may want to look at SQL Server or Windows jobs/tasks that could be running when you have the errors and could be contributing to that.
    If you could find any pattern with particular clients that have the error, that could help as well. If one in particular experiences it more often, you could try an entry in the hosts file in case it is DNS related.
    If you are aware of any of the errors when they happen, or shortly after, you could check the Windows event logs for the client and the SQL Server and see if you can find anything related.

    Sue
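
The hex-to-decimal step described above can be sketched in a few lines of Python. This is only an illustration: the lookup table is a hand-made assumption covering the two codes seen in this thread (64 from the post above, and 10054, the standard Winsock "connection reset" code that 0x2746 converts to), and the function name is hypothetical. For any other code, net helpmsg with the decimal value remains the thing to check.

```python
# Small hand-made table for the two codes in this thread (an assumption for
# illustration, not an exhaustive list of Windows error codes).
KNOWN_CODES = {
    64: "ERROR_NETNAME_DELETED: The specified network name is no longer available.",
    10054: "WSAECONNRESET: An existing connection was forcibly closed by the remote host.",
}


def describe(hex_code: str) -> str:
    """Convert a hex error code such as '0x40' to decimal and describe it."""
    value = int(hex_code, 16)  # int() accepts the '0x' prefix when base is 16
    text = KNOWN_CODES.get(value, "not in this table; try net helpmsg with the decimal value")
    return f"{hex_code} = {value}: {text}"


if __name__ == "__main__":
    for code in ("0x40", "0x2746"):
        print(describe(code))
```

Both codes point at the connection being dropped underneath SQL Server rather than at a login being rejected, which is the argument to take to the network side.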

  • In reply to Sue_H (Thursday, June 8, 2017 12:46 PM):

    Thanks Sue ... your comments very much appreciated ... tried Host file entry but still had the problem.

    Out of 300 odd SQL Servers, I appear to have 3 servers (2 VMs, 1 Physical) with this intermittent problem (SQL Server 2008/2012 on WS 2008 R2/WS 2012 R2).

    In your opinion is this "definitely" a network/dns issue or could it in some way be SQL?

    I plan to engage with the Windows and Network teams next week, so the more information I have, the better!

  • In reply to Claudio Gatto (Thursday, June 8, 2017 11:32 PM):

    It's not necessarily a network issue so much as a network error. Another thing you may want to try, to get more information on the errors, is to query sys.dm_os_ring_buffers. You'll get some of the same information as from the extended events but also some additional information. Check the post here and try the first two queries (especially the second one), which could give you some additional information in tracking things down:
    Inside sys.dm_os_ring_buffers

    Sue

  • I have come across a pattern for my connectivity issue. Ping and Tracert from my SSMS client to the remote server as follows:

    H:\>ping XXXsql03 -t

    Pinging XXXsql03.ssssss.com [X0.X20.X3.X6] with 32 bytes of data:
    Reply from X0.X20.X3.X6: bytes=32 time=62ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=65ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=58ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=50ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=57ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=53ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=59ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=72ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=49ms TTL=124
    Reply from X0.X20.X3.X6: bytes=32 time=51ms TTL=124

    Ping statistics for X0.X20.X3.X6:
        Packets: Sent = 21, Received = 21, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 48ms, Maximum = 72ms, Average = 55ms
    Control-C
    ^C
    H:\>tracert XXXsql03

    Tracing route to XXXsql03.ssssss.com [X0.X20.X3.X6]
    over a maximum of 30 hops:

      1     7 ms    <1 ms    <1 ms  X0.X.X00.X18
      2     1 ms    <1 ms    <1 ms  X0.X00.X1.X41
      3     *       38 ms    37 ms  X0.X7.X.X20
      4    48 ms    58 ms    58 ms  X0.X7.X.X10
      5    51 ms    60 ms    50 ms  X0.X25.X5.X45
      6    63 ms    59 ms    48 ms  XXXsql03.ssssss.com [X0.X20.X3.X6]

    Trace complete.

    In summary, Ping time averages 55ms (generally <1ms for other servers) and Tracert shows up to 6 hops before reaching destination.

    Although the server is in the same domain and LAN, it is situated a couple of thousand kilometres away.

    The ping time is not great, but I wouldn't have thought that it was bad enough to cause timeout issues in the SSMS client ... and increasing the timeout only helps a bit.

    Any thoughts?

  • That's a good way to compare some of the response times. I'd probably try ping and tracert by name, by IP and also by FQDN, as differences between those can lead you to what the issues could be. And the differences can sometimes help in getting someone from the network group to help out.

    Sue
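
The name / IP / FQDN comparison suggested above can also be done at the TCP level, since ping only exercises ICMP. Below is a rough sketch, not a definitive tool: it times name resolution and a TCP connect separately for each form of the server name. The function names are hypothetical, port 1433 is assumed (the default SQL Server port; adjust for a named instance or custom port), and the target list uses the masked names from this thread plus a placeholder for the IP, which must be filled in.

```python
import socket
import time


def timed(fn, *args):
    """Run fn(*args); return (elapsed_ms, result or the OSError raised)."""
    start = time.perf_counter()
    try:
        result = fn(*args)
    except OSError as exc:  # covers socket.gaierror and timeouts
        result = exc
    return (time.perf_counter() - start) * 1000.0, result


def probe(target, port=1433, resolve=socket.getaddrinfo,
          connect=socket.create_connection, timeout=5.0):
    """Time DNS resolution and TCP connect separately for one target."""
    resolve_ms, info = timed(resolve, target, port)
    if isinstance(info, OSError):
        return {"target": target, "resolve_ms": resolve_ms,
                "connect_ms": None, "error": repr(info)}
    address = info[0][4][:2]  # (ip, port) from the first resolved address
    connect_ms, conn = timed(connect, address, timeout)
    error = None
    if isinstance(conn, OSError):
        error = repr(conn)
    else:
        conn.close()
    return {"target": target, "resolve_ms": resolve_ms,
            "connect_ms": connect_ms, "error": error}


if __name__ == "__main__":
    # Masked names from the thread; replace "<server-ip>" with the real address.
    targets = ["XXXsql03", "XXXsql03.ssssss.com", "<server-ip>"]
    for _ in range(10):  # repeat to catch intermittent slowness
        for target in targets:
            print(probe(target))
        time.sleep(1)
```

If the short name resolves slowly while the FQDN and IP are fast, that points towards name resolution; if all three show the same occasional slow or failed connect, the path to the server is the more likely suspect.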

  • As a thought, do either of these VMware KBs help:
    https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009517
    https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2008925

    These would only apply to the VMs, not the physical machine.
    The power profile settings apply to both physical and VM. I would recommend setting it to high performance if it isn't already.

    The above is all just my opinion on what you should do.
    As with all advice you find on a random internet forum, you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
