SQLServerCentral is supported by Red Gate Software Ltd.
 
DBmirroring unexpectedly failover
Posted Friday, April 19, 2013 7:30 AM
Hi All,

I had an unexpected automatic failover from the principal to the mirror server.
Our reports show a network spike from 32MB to 1117MB during that period, although a spike like this is normal during business hours.

The mirror is configured in high-safety (synchronous) mode with automatic failover and a witness server.
One task running at the time was a copy of a 1.8GB compressed backup to the principal server.
Could the network spike have been caused by this copy? We do this all the time, so I don't expect it to be the issue.


I could not find any specific errors in the log. The errors we did find are listed below.
I would like to know exactly why the failover happened. Can someone please help me analyse the root cause of this failover?

Error 1:
The command failed because the database mirror is busy. Reissue the command later.

Error 2:
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\templogfile\templog.ldf] in database [tempdb] (2). The OS file handle is 0x0000000000000514. The offset of the latest long I/O is: 0x000000000b5200

Error 3:
The mirroring connection to "TCP://XXXXXXX:5022" has timed out for database "dbname" after 10 seconds without a response. Check the service and network connections.
Post #1444389
Posted Friday, April 19, 2013 8:16 AM


In reply to muthyala_51 (4/19/2013):

The spike may have caused a delay in communication between the principal and witness servers. You may want to increase the failover timeout from 10 seconds to 30 seconds. We had to do this at a previous employer, where I had set up database mirroring, because we had issues with our network. It was not the most stable of networks, and we had periodic glitches during high-volume times.
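
As a rough sketch of the timeout change suggested above: the partner timeout is set with `ALTER DATABASE ... SET PARTNER TIMEOUT`, run on the principal, and the current value is visible in `sys.database_mirroring`. The Python helper below only builds the T-SQL text. The helper name and the `dbname` placeholder are assumptions for illustration, and actually running the statements through a client is left out.

```python
def build_partner_timeout_sql(db_name: str, seconds: int) -> str:
    """Return the T-SQL that sets the mirroring partner timeout.

    Hypothetical helper for illustration; run the result on the principal.
    """
    if seconds < 5:
        # Per the ALTER DATABASE mirroring docs, values below 5 are treated
        # as 5, so reject them here to keep the intent explicit.
        raise ValueError("partner timeout must be at least 5 seconds")
    # Bracket-quote the name and escape any closing bracket inside it.
    quoted = "[" + db_name.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET PARTNER TIMEOUT {seconds};"


# Query to confirm the value currently in effect for each mirrored database.
CHECK_TIMEOUT_SQL = (
    "SELECT DB_NAME(database_id) AS database_name, "
    "mirroring_connection_timeout "
    "FROM sys.database_mirroring "
    "WHERE mirroring_guid IS NOT NULL;"
)

if __name__ == "__main__":
    print(build_partner_timeout_sql("dbname", 30))
    print(CHECK_TIMEOUT_SQL)
```

Checking `mirroring_connection_timeout` before and after the change is a cheap way to confirm the new value actually took effect.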



Lynn Pettis

Post #1444433
Posted Friday, April 19, 2013 11:19 AM
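But increasing the timeout might not tell us the actual root cause of why it happened.
I am looking more into the I/O error we received; it looks like a disk I/O issue. I ran Perfmon and saw that Avg. Disk sec/Transfer is > 0.015 seconds during the file copy.

I also noticed that during a copy of a file of around 4GB to one of the disk drives, SQL Server hung and everything was frozen for a couple of minutes, and the databases on the mirror server showed as (Disconnected / In Recovery). They returned to a normal state after a few minutes. Can you direct me on this? Thanks.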
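
To make the Perfmon observation concrete, here is a small sketch that flags Avg. Disk sec/Transfer samples above the 0.015-second mark mentioned above. The readings are made up for illustration, not taken from this server, and the function name is hypothetical.

```python
def slow_samples(samples, threshold_sec=0.015):
    """Return (index, value) pairs for samples above the threshold."""
    return [(i, v) for i, v in enumerate(samples) if v > threshold_sec]


# Made-up Avg. Disk sec/Transfer readings, in seconds (illustrative only).
readings = [0.004, 0.006, 0.021, 0.180, 0.009]

if __name__ == "__main__":
    flagged = slow_samples(readings)
    print(f"{len(flagged)} of {len(readings)} samples exceed 15 ms: {flagged}")
```

Collecting the same counter outside the file copy gives a baseline to compare against, which helps separate the copy's effect from normal load.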
Post #1444535
Posted Friday, April 19, 2013 11:21 AM
One more thing to add: all three servers (principal, mirror and witness) are virtual.
Post #1444539
Posted Friday, April 19, 2013 11:35 AM


In reply to muthyala_51 (4/19/2013):
I also noticed that during a copy of a file of around 4GB to one of the disk drives, SQL Server hung and everything was frozen for a couple of minutes, and the databases on the mirror server showed as (Disconnected / In Recovery). They returned to a normal state after a few minutes. Can you direct me on this? Thanks.


Root cause? Your principal and witness servers were unable to communicate during the timeout period, which resulted in the witness determining that the principal server was down and initiating a failover to the mirror.

Why? Not enough network bandwidth to communicate because of the large data transfer(s) occurring.

Once again, I had this issue at a previous employer, and the resolution was to increase the timeout period before a failover occurred. This solved the issue of our somewhat unstable network causing a failover when there really wasn't a problem. Our automatic failover still worked fine when there were real problems with our servers.
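
The reasoning above can be illustrated with a toy model: a partner is treated as unreachable once the silence on the mirroring connection lasts longer than the timeout. The stall durations below are made-up examples, and the function is a simplification, not SQL Server's actual detection logic.

```python
def stalls_that_trigger(stall_seconds, timeout_seconds):
    """Return the stalls long enough to exceed the partner timeout."""
    return [s for s in stall_seconds if s > timeout_seconds]


# Made-up communication stalls in seconds (illustrative only).
stalls = [3, 8, 12, 15, 45]

if __name__ == "__main__":
    for timeout in (10, 30):
        hits = stalls_that_trigger(stalls, timeout)
        print(f"timeout={timeout}s -> stalls seen as failures: {hits}")
```

In this model a longer timeout rides out short glitches but still reacts to a long outage, at the cost of a slower failover when the principal really is down.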




Lynn Pettis

Post #1444547
Posted Friday, April 19, 2013 1:51 PM
Lynn Pettis is right. But one variable here is the virtualization of SQL Server. If vMotion is enabled and the principal or mirror VM is moved because of memory/CPU ballooning, this can happen.

I have seen this in our environment, and we have now disabled DRS for the SQL VMs for that reason alone.
Post #1444595