Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Mirrored database becomes SUSPENDED during nightly maintenance Expand / Collapse
Author
Message
Posted Monday, April 18, 2011 9:35 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, June 26, 2014 7:19 AM
Points: 71, Visits: 489
Hi there!

I will try to keep this post as short as possible, without leaving out any crucial information.

Server Setup: SQL Server 2008 (x64) - Enterprise Edition, SP2 CU2.

Database Setup: Three principal databases which are mirrored. Each of these mirrored db's then has a database snapshot created on them, every 30 minutes.

We're running Database Mirroring in High Performance (asynchronous) mode, without a witness server. Both SQL Server instances, which we here call Instance1 (where the principal databases resides) and Instance2 (where the mirrored databases and the database snapshot resides) are part of the same two node failover cluster.

On two occasions now we've encountered the Error: 9003, Severity: 20, State: 15 (happens at 2011-04-17 00:37:35.770)
Which is followed by this information:

The log scan number (82666:12760:1) passed to log scan in database 'Principal_Database' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication.

Otherwise, restore from backup if the problem results in a failure during startup.

Which is then followed by Error: 9001, Severity: 21, State: 8.
2011-04-17 00:37:35.800 spid60 The log for database 'Principal_Database_Stats' is not available. Check the event log for related error messages. Resolve any errors and restart the database.


The last error is related to the database snapshot which is taken on the mirrored database.

When this has happend we can see that there are a lot of transactions going on, and the transaction log backups are quite large (1-4GB) during this period. The reason that the number and the size of transactions increase is because of two SQL Agent jobs that run at the same time.
One of the jobs does an INDEX REBUILD, and the other deletes data from one table which exists in two of the databases.

Look at the table below for the actual numbers gathered during this period for the mirroring session.

DatabaseName	MirrorState	[log_generation_rate (MB/sec)]	[unsent_log (MB)] [send_rate (MB/sec)]	[unrestored_log (MB)]	[transactions_per_sec]	[time_recorded]
Principal_Database SUSPENDED 6.69 9122.06 0.00 0.00 113 2011-04-17 00:49:00.713
Principal_Database SUSPENDED 5.39 8720.67 0.00 0.00 190 2011-04-17 00:48:00.690
Principal_Database SUSPENDED 23.62 8397.29 0.00 0.00 223 2011-04-17 00:47:00.667
Principal_Database SUSPENDED 26.14 6980.28 0.00 0.00 242 2011-04-17 00:46:00.627
Principal_Database SUSPENDED 16.89 5412.13 0.00 0.00 156 2011-04-17 00:45:00.620
Principal_Database SUSPENDED 13.33 4398.53 0.00 0.00 123 2011-04-17 00:44:00.617
Principal_Database SUSPENDED 15.15 3598.76 0.00 0.00 140 2011-04-17 00:43:00.603
Principal_Database SUSPENDED 10.71 2689.81 0.00 0.00 100 2011-04-17 00:42:00.593
Principal_Database SUSPENDED 10.63 2047.13 0.00 0.00 99 2011-04-17 00:41:00.580
Principal_Database SUSPENDED 6.63 1419.98 0.00 0.00 62 2011-04-17 00:40:01.617
Principal_Database SUSPENDED 11.29 1015.65 0.00 0.00 105 2011-04-17 00:39:00.847
Principal_Database SUSPENDED 9.13 338.37 3.51 0.00 85 2011-04-17 00:38:00.810
Principal_Database SYNCHRONIZED 6.51 0.12 6.51 2754.57 59 2011-04-17 00:37:00.770
Principal_Database SYNCHRONIZED 5.71 0.00 5.71 2438.10 94 2011-04-17 00:36:00.770
Principal_Database SYNCHRONIZED 2.55 0.00 2.55 2168.79 584 2011-04-17 00:35:00.707
Principal_Database SYNCHRONIZED 1.91 0.00 1.91 2125.68 436 2011-04-17 00:34:00.697
Principal_Database SYNCHRONIZED 0.98 0.00 0.98 2103.46 622 2011-04-17 00:33:00.650
Principal_Database SYNCHRONIZED 3.02 0.00 3.02 2191.67 327 2011-04-17 00:32:00.650
Principal_Database SYNCHRONIZED 5.56 0.11 5.55 2125.71 111 2011-04-17 00:31:00.637
Principal_Database SYNCHRONIZED 6.92 0.00 7.34 1877.80 122 2011-04-17 00:30:00.623
Principal_Database SYNCHRONIZED 4.42 24.67 4.01 1516.42 41 2011-04-17 00:29:00.623
Principal_Database SYNCHRONIZED 6.00 0.00 6.00 1328.79 55 2011-04-17 00:28:00.547
Principal_Database SYNCHRONIZED 6.66 0.00 6.66 1072.62 62 2011-04-17 00:27:00.520

(It now became clear that the table wasn't really easy to read. Hmm, maybe copy paste into another document? Well, not much to do ... )

Questions:

* Is it so that the number of transactions and amount of log during the minutes after 00:30:00 finally "breaks" the mirroring and turn it from SYNCHRONIZED to SUSPENDED?

* Would you consider 500-600 transactions/sec a high number, quite normal or maybe even very normal?
(I ask since I don't know. Probably depends and is related to the load you have, but this system averages 5-10 transactions/sec during normal production hours)

* Are there other questions to be asked around this?

* Is it to much to mirror three databases and then create snapshots on them, with this kind of load during maintenance?

* Should we change anything in our environment, maybe the maintenance routine?

Thank you very much for taking your time with this.
I hope anyone has any good ideas and/or answers to our problem!

Sincerely,

Gord
Post #1095084
Posted Monday, April 18, 2011 10:12 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 4:16 PM
Points: 4,389, Visits: 9,526
I can't really answer all of your questions, but just to clarify about your mirroring. The mirroring state will not go to suspended unless someone is issuing a command to suspend mirroring. My bet would be the index rebuild process is suspending the mirroring and then resuming it after it is done.

As for having 3 databases mirrorring - shouldn't be an issue at all. I mirror 14 databases from CA to TX with no real issues. After an index rebuild or a purge operation, the transactions for the mirror queue and can take several hours (sometimes days, depending on the changes) to catch up.

My guess here is that your snapshots are causing the majority of the problems. Do you keep multiple snapshots? If so, how long are you keeping them?


Jeffrey Williams
Problems are opportunites brilliantly disguised as insurmountable obstacles.

How to post questions to get better answers faster
Managing Transaction Logs
Post #1095120
Posted Monday, April 18, 2011 2:17 PM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Saturday, August 16, 2014 5:48 AM
Points: 1,862, Visits: 3,602
You may need to upgrade to Cumulative Update 3 for SQL Server 2008 Service Pack 2:

http://support.microsoft.com/kb/2403218#appliesto


__________________________________________________________________________________

Turbocharge Your Database Maintenance With Service Broker: Part 2
Turbocharge Your Database Maintenance With Service Broker: Part 1
Real-Time Tracking of Tempdb Utilization Through Reporting Services
Monitoring Database Blocking Through SCOM 2007 Custom Rules and Alerts
Preparing for the Unthinkable - a Disaster/Recovery Implementation
Post #1095294
Posted Tuesday, April 19, 2011 3:25 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, June 26, 2014 7:19 AM
Points: 71, Visits: 489
Thank you very much for your answers!

Jeffrey: That sounds logical, I agree. But what about the part "The log scan number (82666:12760:1) passed to log scan in database.." in the error message?

It's great to hear that three mirrored databases shouldn't be an issue. As you say it might be the index rebuild that is suspending the mirror and then takes time to enable again. When it comes to the database snapshots, they are created ever 30 minutes in a SQL Agent job. The job first drops the existing snapshot (for each database) and then creates a new one. So there are no duplicates. Here's how it looked in the error log a few minutes before the database got suspended:

..
Recovery of database 'Principal_Database_Stats' (9) is 0% complete (approximately 208254 seconds remain). Phase 1 of 3.
Recovery of database 'Principal_Database_Stats' (9) is 0% complete (approximately 208239 seconds remain). Phase 1 of 3.
Recovery of database 'Principal_Database_Stats' (9) is 0% complete (approximately 208682 seconds remain). Phase 2 of 3.
Recovery of database 'Principal_Database_Stats' (9) is 2% complete (approximately 6120 seconds remain). Phase 2 of 3.
Recovery of database 'Principal_Database_Stats' (9) is 14% complete (approximately 869 seconds remain). Phase 2 of 3.
Recovery of database 'Principal_Database_Stats' (9) is 15% complete (approximately 802 seconds remain). Phase 2 of 3.
..


It then came to 35% with approx 800 seconds left, just before error 9003 occured.

At daytime and normal production load, the drop and create of the three snapshots take between 5 and 10 seconds. At the time of the index rebuild and the other "delete job" running at the same time, the snapshot job takes about 15 - 25 minutes.

Marios: I'll definitely look into upgrading to the latest CU. It had some interesting info, and the title of the fix is spot on to our problem!

Thanks once again for your answers, I appreciate it!

Sincerely,

Gord
Post #1095479
Posted Monday, March 11, 2013 9:21 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 10:14 PM
Points: 40, Visits: 303
Hi,

This is just a guess based on what you've said, but how long is your optimization taking to complete? As you are using enterprise edition I'm assuming the optimization is online. It's possible that your snapshots are conflicting with the snapshot that is taken to rebuild the index. A snapshot every 30 minutes seems a little extreme anyway(are you reporting off these?), but I'd definitely recommend suspending that activity while your maintenance is being performed if possible.
Post #1429341
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse