Database mirroring failed over unexpectedly


muth_51
Hi All,

I had an unexpected automatic failover from the principal to the mirror server.
Our reports show network traffic spiking from 32MB to 1117MB during that period, but a spike like that is normal during business hours.

Mirroring is configured in high-safety (synchronous) mode with automatic failover and a witness server.
The one task running at the time was a copy of a 1.8GB compressed backup to the principal server.
Could the network spike have been caused by this copy? We do this all the time, so I don't expect it to be the issue.

I would like to know exactly why the failover happened. Can someone please help me analyse the root cause?
I could not find anything in the log that points to a specific cause. The errors we did find are below:

Error 1:
The command failed because the database mirror is busy. Reissue the command later.

Error 2:
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\templogfile\templog.ldf] in database [tempdb] (2). The OS file handle is 0x0000000000000514. The offset of the latest long I/O is: 0x000000000b5200

Error 3:
The mirroring connection to "TCP://XXXXXXX:5022" has timed out for database "dbname" after 10 seconds without a response. Check the service and network connections.
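
When correlating messages like Error 2 and Error 3 by time, it can help to pull them out of an exported error log mechanically. The following is only a minimal sketch: the two regular expressions are written against the message texts quoted above, and the function name `classify` and the returned field names are hypothetical choices made for illustration, not part of any SQL Server tooling.

```python
import re

# Matches the "long I/O" warning text quoted in Error 2.
LONG_IO = re.compile(
    r"encountered (?P<count>\d+) occurrence\(s\) of I/O requests taking longer than "
    r"(?P<seconds>\d+) seconds to complete on file \[(?P<file>[^\]]+)\] "
    r"in database \[(?P<database>[^\]]+)\]"
)

# Matches the mirroring timeout text quoted in Error 3.
MIRROR_TIMEOUT = re.compile(
    r'The mirroring connection to "(?P<endpoint>[^"]+)" has timed out for database '
    r'"(?P<database>[^"]+)" after (?P<seconds>\d+) seconds'
)


def classify(line):
    """Return a dict describing the log line, or None if it matches neither message."""
    m = LONG_IO.search(line)
    if m:
        return {
            "kind": "long_io",
            "file": m["file"],
            "database": m["database"],
            "count": int(m["count"]),
            "seconds": int(m["seconds"]),
        }
    m = MIRROR_TIMEOUT.search(line)
    if m:
        return {
            "kind": "mirror_timeout",
            "endpoint": m["endpoint"],
            "database": m["database"],
            "seconds": int(m["seconds"]),
        }
    return None


# Raw strings keep the backslashes in the Windows path intact.
sample_lines = [
    r"SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than "
    r"15 seconds to complete on file [E:\templogfile\templog.ldf] in database [tempdb] (2).",
    r'The mirroring connection to "TCP://XXXXXXX:5022" has timed out for database '
    r'"dbname" after 10 seconds without a response.',
]

for line in sample_lines:
    print(classify(line))
```

If the long I/O warnings and the mirroring timeouts land in the same window as the file copy, that would point toward the copy starving the disk or network rather than a genuine server failure.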
Lynn Pettis
The spike may have caused a delay in communication between the principal and witness servers. You may want to increase the failover timeout from 10 seconds to 30 seconds. We had to do this at a previous employer, where I had set up database mirroring, because we had issues with our network. It was not the most stable of networks, and we had periodic glitches during high-volume times.
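
As a rough illustration of the timeout change, the mirroring partner timeout is set per database with `ALTER DATABASE ... SET PARTNER TIMEOUT`, which to my understanding is run on the principal. The sketch below only builds that statement as a string; the helper name `partner_timeout_statement` is hypothetical, `dbname` is the placeholder used in the error message, and the lower bound of 5 seconds is an assumption based on my reading of the documentation.

```python
def partner_timeout_statement(database: str, seconds: int) -> str:
    """Build the T-SQL that sets the mirroring partner timeout for one database."""
    if seconds < 5:
        # Assumption: SQL Server treats values below 5 as 5, so reject them here.
        raise ValueError("partner timeout should be at least 5 seconds")
    # Double any closing bracket so the name stays valid inside [...] quoting.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET PARTNER TIMEOUT {seconds};"


print(partner_timeout_statement("dbname", 30))
# prints: ALTER DATABASE [dbname] SET PARTNER TIMEOUT 30;
```

The current value can be checked afterwards in the `mirroring_connection_timeout` column of `sys.database_mirroring`.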

Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
muth_51
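But increasing the timeout might not tell us the actual root cause of the failover.
I am looking more closely at the I/O error we received; it looks like a disk I/O issue. I ran Perfmon and saw that the Avg. Disk sec/Transfer counter is above 0.015 seconds during the file copy.

I also noticed that during a copy of a file of around 4GB to one of the disk drives, SQL Server hung and everything was frozen for a couple of minutes. The databases on the mirror server showed a status of (Disconnected / In Recovery) and returned to a normal state after a few minutes. Can you direct me on this? Thanks.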
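
To see how often disk latency crosses the 0.015-second mark during a copy, the Perfmon samples can be exported and filtered. This is a minimal sketch under stated assumptions: the sample values below are made up for illustration, and the names `slow_samples` and `summarize` are hypothetical.

```python
THRESHOLD_SECONDS = 0.015  # the 15 ms figure mentioned in the post


def slow_samples(samples, threshold=THRESHOLD_SECONDS):
    """Return the (timestamp, seconds) pairs whose latency exceeds the threshold."""
    return [(ts, sec) for ts, sec in samples if sec > threshold]


def summarize(samples, threshold=THRESHOLD_SECONDS):
    """Count the slow samples and report the worst latency seen."""
    slow = slow_samples(samples, threshold)
    worst = max((sec for _, sec in samples), default=None)
    return {"total": len(samples), "slow": len(slow), "worst": worst}


# Hypothetical Avg. Disk sec/Transfer readings, one per collection interval.
readings = [
    ("10:00:00", 0.004),
    ("10:00:15", 0.022),
    ("10:00:30", 0.310),
    ("10:00:45", 0.009),
]

print(slow_samples(readings))
print(summarize(readings))
```

Lining up the timestamps of the slow samples with the file copy and with the mirroring timeout message would show whether the disk stall and the failover coincide.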
muth_51
One more thing to add: all three servers (principal, mirror, and witness) are virtual.
Lynn Pettis
muthyala_51 (4/19/2013)
But increasing the timeout might not tell us the actual root cause of the failover.
I am looking more closely at the I/O error we received; it looks like a disk I/O issue. I ran Perfmon and saw that the Avg. Disk sec/Transfer counter is above 0.015 seconds during the file copy.

I also noticed that during a copy of a file of around 4GB to one of the disk drives, SQL Server hung and everything was frozen for a couple of minutes. The databases on the mirror server showed a status of (Disconnected / In Recovery) and returned to a normal state after a few minutes. Can you direct me on this? Thanks.


Root cause? Your principal and witness servers were unable to communicate during the timeout period, so the witness determined that the principal server was down and a failover to the mirror was initiated.

Why? Not enough network bandwidth for them to communicate because of the large data transfer(s) occurring at the time.

Once again, I had this issue at a previous employer, and the resolution was to increase the timeout period before a failover occurred. This solved the problem of our somewhat unstable network causing a failover when there really wasn't a problem. Our automatic failover still worked fine when there were real problems with our servers.

Neeraj Dwivedi
Lynn Pettis is right. But one variable here is the virtualization of SQL Server. If vMotion is enabled and the principal or mirror VM is moved because of memory or CPU pressure (ballooning), this can happen.

I have seen this in our environment, and we have now disabled DRS for our SQL VMs for that reason alone.