Mirroring - Intermittent "network name is no longer available"

  • Is this a network issue?

    I have a mirroring setup - synchronous with automatic failover. The databases went into Disconnected mode for about a minute last night, and everything was synchronized soon after. Looking at the error logs, the principal server could not contact the secondary server. The secondary server could not contact the witness. But I don't see a message that the principal lost contact with the witness.

    There weren't any index defrag jobs running, or heavy activity at the time. This is the first time the issue has happened around that time of day.

    Principal error log

    12:04:06 AM Database mirroring connection error 4 '64(The specified network name is no longer available.)' for 'TCP://prod02.dg.com:5023'.

    12:04:27 AM Database mirroring connection error 2 'Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.' for 'TCP://prod02.dg.com:5023'.

    Secondary error log

    12:03:46 AM Database mirroring connection error 4 'An error occurred while receiving data: '64(The specified network name is no longer available.)'.' for 'TCP://PROD-WITNESS.dg.com:5024'.

    12:03:46 AM Database mirroring connection error 4 '10054(An existing connection was forcibly closed by the remote host.)' for 'TCP://prod01.dg.com:5022'.

    I am responsible for doing a root cause analysis, and am in a DBA role. Does this look like a network issue? If so, is there anything I can suggest that my IT department look into to pinpoint the problem? We've had similar issues intermittently, and they haven't found anything. We use VMware.

    How can I tell if there was a loss of quorum in a mirroring session? There were application timeout errors, and I am trying to determine whether these were due to the mirroring issues or just the network. Any thoughts you have would be appreciated.

    Thanks!

    Dan

  • We just experienced a long period of network performance issues that resulted in a lot of those same errors in SQL backups. The root cause was a bad drive in the SAN.

  • Thanks for your response. I will keep that in mind.

  • Hi, I recently did an in-place upgrade from SQL Server 2008 R2 to SQL Server 2012 SP1 CU8, then dropped and re-added mirroring, and now we are having the same intermittent connectivity issue.

    Did you ever resolve your trouble, and if so, how?

  • There were errors in the mirror server Windows log, and I assume that it is a problem with the disk subsystem.

    Event ID: 129

    Reset to device, \Device\RaidPort3, was issued.

    Event ID: 129

    Reset to device, \Device\RaidPort2, was issued.

    Unfortunately, I don't think we resolved it, as issues happen occasionally. But it's beyond my expertise or permission as a DBA. Thankfully, it doesn't happen all the time.

  • I have had situations in the past where SQL Server reported a network problem but the Network people said nothing had gone wrong.

    Eventually we wrote a simple script that did a PING every second and saved the result. This showed intermittent network problems, and the Network people had to accept that something had gone wrong. Eventually they found and fixed the problem, but as with anything that is intermittent it took a while to sort out.

    If you can prove that the problem happens outside of SQL Server, then you have a better chance of the subject matter experts accepting a problem exists and getting it fixed.
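A minimal sketch of the once-per-second ping logger described above, assuming Python is available on one of the servers. The function names (`build_ping_command`, `ping_once`, `monitor`) are made up for this example, and the flags assume the Windows `ping` or the Linux iputils `ping`. Note that on Windows an exit code of 0 does not always mean the target itself answered, so treat the result as a rough signal rather than proof.

```python
import platform
import subprocess
import time
from datetime import datetime


def build_ping_command(host, system=None):
    """Return the argv for a single ping with a timeout of about one second."""
    system = system or platform.system()
    if system == "Windows":
        # -n 1: send one echo request; -w 1000: wait up to 1000 ms for a reply
        return ["ping", "-n", "1", "-w", "1000", host]
    # Assumes Linux iputils ping. -c 1: one echo request; -W 1: wait up to 1 s
    return ["ping", "-c", "1", "-W", "1", host]


def ping_once(host):
    """Return True if the ping command exits with status 0."""
    result = subprocess.run(
        build_ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host, count, probe=ping_once, interval=1.0, out=None, sleep=time.sleep):
    """Probe `host` `count` times, write one timestamped line per probe,
    and return the number of failed probes."""
    failures = 0
    for i in range(count):
        ok = probe(host)
        if not ok:
            failures += 1
        stamp = datetime.now().isoformat(timespec="seconds")
        line = f"{stamp}\t{host}\t{'OK' if ok else 'FAIL'}"
        if out is not None:
            out.write(line + "\n")
            out.flush()  # keep the log current if the script is interrupted
        else:
            print(line)
        if i < count - 1:
            sleep(interval)
    return failures
```

For example, running `monitor("prod02.dg.com", 86400, out=open("ping.log", "a"))` on the principal would record roughly a day of results. The `FAIL` timestamps can then be lined up against the mirroring errors in the SQL Server error log.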

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • PianoDan (9/9/2013)


    But I don't see a message that the principal lost contact with the witness.

    If it had, the principal database would have gone offline. Did this happen at all?

    You may need to adjust the partner timeout for this mirroring session. The default is 10 seconds; try raising it a little to cope with brief network outages.
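As a rough illustration, the partner timeout is changed per database with `ALTER DATABASE ... SET PARTNER TIMEOUT`, run on the principal, and the current value shows up as `mirroring_connection_timeout` in `sys.database_mirroring`. The small Python helper below only builds the statements as strings. The helper name, the database name `SalesDb`, and the choice to reject values under 5 seconds are assumptions made for this sketch.

```python
# Query listing the current timeout (in seconds) for every mirrored database.
CURRENT_TIMEOUT_SQL = (
    "SELECT DB_NAME(database_id) AS database_name, mirroring_connection_timeout "
    "FROM sys.database_mirroring "
    "WHERE mirroring_guid IS NOT NULL;"
)


def partner_timeout_sql(database, seconds):
    """Build the T-SQL that sets the mirroring partner timeout for one database.

    Rejecting values under 5 seconds is a conservative choice for this sketch.
    """
    if not isinstance(seconds, int) or seconds < 5:
        raise ValueError("timeout must be an integer of at least 5 seconds")
    # Bracket-quote the name and escape any closing bracket inside it.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET PARTNER TIMEOUT {seconds};"


# Hypothetical database name, raising the timeout from the default 10 s to 30 s.
statement = partner_timeout_sql("SalesDb", 30)
print(statement)
```

A longer timeout makes the session more tolerant of short network blips, at the cost of slower detection of a real failure before automatic failover starts.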

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Your messages look a lot like the ones we used to receive from our SAN during a backup. That may be something to check: if SAN backups are occurring, they freeze I/O and issue a reset to the device. We no longer see those messages since changing to Avamar backups. They could also indicate SAN disk pressure, which I have seen in the past as well, but I would definitely check whether they correlate with some kind of SAN backup.
