SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


extremely slow log shipping restore in standby mode


extremely slow log shipping restore in standby mode

Author
Message
dpaproskijr
dpaproskijr
SSC Veteran
SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)

Group: General Forum Members
Points: 268 Visits: 231
I had a situation recently where restores of .trn files to a database in standby mode was taking between 8-12 hours per file.

I put a DBCC TRACEON(3004,3605,-1) to get extended information. Below is the trace information (I've removed lines that don't look too interesting indicated with "...") for a restore of a 5.5G .trn file.

I've marked the two areas in red where the bulk of the time is being spent...2 hours between log entries first and then 6 hours second. The time spent where it's actually restoring doesn't seem too bad for a 5.5G file...about 44 minutes including the zeroing.

I'm guessing that the 2 and 6 hours is related to the time being spent applying and writing to the .tuf file. In order to speed things up (otherwise the log shipping recipient would never catch up), I switched the database restore state to "No recovery mode". Once the backlog was processed I switched back to "Standby mode" and restore times once more seem reasonable.

Some additional info: this exact same database was also being log shipped to another physical server which did not suffer the same symptoms. The other server was virtually identical hardware except the disk for the databases were non-RAID...the slow server is RAID-5. I suspected file fragmentation and ran a defragger on the slow server which had no effect. System monitoring tools showed no particular stress on the slow machine...IO wasn't particularly intense and CPU usage was minimal. It was like SQL Server was just taking its sweet time. Unfortunately, I don't have trace logs for the exact same files to compare between the two servers during the slow performance. Once my server got caught up, I switched back to standby mode and restore times and trace logs are more or less identical on the two machines.

The trace:

2011-04-26 09:56:01.14 spid56 RestoreLog: Database QAI
...
2011-04-26 09:56:01.65 spid56 Restore: Undoing STANDBY for QAI
2011-04-26 11:58:10.48 spid56 SnipEndOfLog from LSN: (24837:114963:1)

2011-04-26 11:58:10.49 spid56 FixupLogTail() zeroing E:\SQLLOGS\log.ldf from 0xa67812600 to 0xa67814000.
2011-04-26 11:58:10.50 spid56 Zeroing E:\SQLLOGS\log.ldf from page 5454858 to 5455864 (0xa67814000 to 0xa67ff0000)
...
2011-04-26 12:06:11.80 spid56 Zeroing completed on E:\SQLLOGS\log.ldf
2011-04-26 12:06:18.20 spid56 Restore: Finished undoing STANDBY for QAI
2011-04-26 12:06:18.25 spid56 Restore: PreparingContainers
2011-04-26 12:06:18.26 spid56 Restore: Containers are ready
2011-04-26 12:06:18.27 spid56 Restore: Restoring backup set
2011-04-26 12:06:18.27 spid56 Restore: Transferring data to QAI
2011-04-26 12:06:18.27 spid56 Restore: Waiting for log zero on QAI
2011-04-26 12:06:18.28 spid56 Restore: LogZero complete
2011-04-26 12:35:25.55 spid56 FileHandleCache: 0 files opened. CacheSize: 10
2011-04-26 12:35:25.56 spid56 Restore: Data transfer complete on QAI
2011-04-26 12:35:25.57 spid56 Restore: Backup set restored
2011-04-26 12:35:25.58 spid56 Restore-Redo begins on database QAI
2011-04-26 12:41:32.41 spid56 Rollforward complete on database QAI
2011-04-26 12:41:32.45 spid56 Restore: Done with fixups
2011-04-26 12:41:32.48 spid56 Transitioning to STANDBY
2011-04-26 12:41:38.26 spid56 Starting up database 'QAI'.
2011-04-26 12:41:38.58 spid56 The database 'QAI' is marked RESTORING and is in a state that does not allow recovery to be run.
2011-04-26 12:42:33.79 spid56 Zeroing E:\SQLLOGS\log.ldf from page 6144833 to 6144893 (0xbb8682000 to 0xbb86fa000)
2011-04-26 12:42:33.80 spid56 Zeroing completed on E:\SQLLOGS\log.ldf
2011-04-26 19:06:20.74 spid56 Recovery is writing a checkpoint in database 'QAI' (46). This is an informational message only. No user action is required.

2011-04-26 19:12:57.97 spid56 Starting up database 'QAI'.
2011-04-26 19:13:13.09 spid56 Database is in STANDBY

Can anyone shed any light on what was going on? Thanks
adb2303
adb2303
SSCrazy
SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2900 Visits: 3238
Hi, how many VLF's in your transaction log?

from master db, run

DBCC LOGINFO (DBNAME)
dpaproskijr
dpaproskijr
SSC Veteran
SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)

Group: General Forum Members
Points: 268 Visits: 231
Thanks for the response.

There are 147 VLFs. I checked at the time and 147 didn't seem like too many. It's the same number on both the fast and slow server which makes sense I think.
adb2303
adb2303
SSCrazy
SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2900 Visits: 3238
was there any additional information written to the log after the bit you posted? I.e. something about 'log was restored.... first LSN: xxxx last LSN' ?

If so, can you work out how many VLF's are being restored from the log backup: last LSN - first LSN?
dpaproskijr
dpaproskijr
SSC Veteran
SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)

Group: General Forum Members
Points: 268 Visits: 231
2011-04-26 19:13:13.18 Backup Log was restored. Database: QAI, creation date(time): 2008/02/12(11:37:56), first LSN: 24837:114963:1, last LSN: 24922:13456:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'E:\SQLBACKUPS\QAI_logshipping\QAI_20110415214601.trn'}). This is an informational message. No user action is required.

I think the calculation would be 24922-24837 = 85
Ray Mond
Ray Mond
SSCommitted
SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)

Group: General Forum Members
Points: 1979 Visits: 1542
6 hours does look like an extremely long time to create the undo file, even if the entire 5.5 GB contains uncommitted transactions. Can you compare the restore speed against the other server? You can check in msdb..restorehistory for the time the restore started, then look in the SQL Server log for the time the restore completed.

Ray Mond
Get a SQL Size 2.0 single-instance license for FREE
Claim one here!
Ian Scarlett
Ian Scarlett
SSCertifiable
SSCertifiable (6.7K reputation)SSCertifiable (6.7K reputation)SSCertifiable (6.7K reputation)SSCertifiable (6.7K reputation)SSCertifiable (6.7K reputation)SSCertifiable (6.7K reputation)SSCertifiable (6.7K reputation)SSCertifiable (6.7K reputation)

Group: General Forum Members
Points: 6710 Visits: 7201
Ray Mond (5/6/2011)
6 hours does look like an extremely long time to create the undo file, even if the entire 5.5 GB contains uncommitted transactions.

I agree it does seem an extremely long time, but... my experience of restoring with standby has been similar, and I resorted to using standby only for the last log file of a bunch.

If the whole 5.5 GB contains uncommitted transactions, some of which were left over from the previous log file, then SQL Server has to re-apply the partial transactions from the previous undo file, then restore the log and apply the 5.5 GB of transactions, then roll back those same 5.5 GB worth of transactions and write them to a new undo file.

As we've seen from some other threads on this site, rolling back a large transaction takes an order of magnitude longer than making the changes in the first place.



dpaproskijr
dpaproskijr
SSC Veteran
SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)

Group: General Forum Members
Points: 268 Visits: 231
I've got the contents of the restorehistory records for the .trn file listed below...I tried to format it but I couldn't get a nice display to work in using this HTML format. To summarize, everything looks exactly the same between the two servers but the slow server starts at 9:56 and finishes at 19:13, the fast server starts at 18:47 and finishes at 19:48. I joined the restorehistory up with the log file based on the start and end LSN...the way the file name is offset due to the timezone caused a bit of confusion.

From the fast server log file:

2011-04-16 19:48:11.68 Backup Log was restored. Database: QAI, creation date(time): 2008/02/12(11:37:56), first LSN: 24837:114963:1, last LSN: 24922:13456:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'f:\logshipping\qai\QAI_20110415214601.trn'}). This is an informational message. No user action is required.


From msdb..restorehistory join backupset

Slow sever:

restore_history_id 1909401
restore_date 4/26/11 9:56
destination_database_name QAI
backup_set_id 1909780
restore_type L
replace 0
recovery 0
restart 0
stop_at NULL
device_count 1
stop_at_mark_name NULL
stop_before NULL
backup_set_id 1909780
backup_set_uuid 7ECD61F5-27AA-4378-A09E-6801CBC0E6E8
media_set_id 1909778
first_family_number 1
first_media_number 1
last_family_number 1
last_media_number 1
catalog_family_number 1
catalog_media_number 1
position 1
expiration_date NULL
software_vendor_id 4608
name NULL
description NULL
software_major_version 9
software_minor_version 0
software_build_version 4035
time_zone -16
mtf_minor_version 0
first_lsn 24837000011496300000
last_lsn 24922000001345600000
checkpoint_lsn 24903000010213400000
database_backup_lsn 19295000003680200000
database_creation_date 37:56.0
backup_start_date 46:01.0
backup_finish_date 52:08.0
type L
sort_order 52
code_page 0
compatibility_level 90
database_version 611
backup_size 5653515776
database_name QAI
server_name HLDSQL01
machine_name HLDSQL01
flags 512
unicode_locale 1033
unicode_compare_style 196609
collation_name SQL_Latin1_General_CP1_CI_AS
is_password_protected 0
recovery_model FULL
has_bulk_logged_data 0
is_snapshot 0
is_readonly 0
is_single_user 0
has_backup_checksums 0
is_damaged 0
begins_log_chain 0
has_incomplete_metadata 0
is_force_offline 0
is_copy_only 0
first_recovery_fork_guid B12C1E27-CFEF-4A71-98F8-94C8DACC6994
last_recovery_fork_guid B12C1E27-CFEF-4A71-98F8-94C8DACC6994
fork_point_lsn NULL
database_guid E10AB28E-3ADB-4055-AD59-F151742F8E76
family_guid B12C1E27-CFEF-4A71-98F8-94C8DACC6994
differential_base_lsn NULL
differential_base_guid NULL

Fast Server:
restore_history_id 722936
restore_date 4/16/11 18:47
destination_database_name QAI
backup_set_id 722917
restore_type L
replace 0
recovery 0
restart 0
stop_at NULL
device_count 1
stop_at_mark_name NULL
stop_before NULL
backup_set_id 722917
backup_set_uuid 7ECD61F5-27AA-4378-A09E-6801CBC0E6E8
media_set_id 722917
first_family_number 1
first_media_number 1
last_family_number 1
last_media_number 1
catalog_family_number 1
catalog_media_number 1
position 1
expiration_date NULL
software_vendor_id 4608
name NULL
description NULL
software_major_version 9
software_minor_version 0
software_build_version 4035
time_zone -16
mtf_minor_version 0
first_lsn 24837000011496300000
last_lsn 24922000001345600000
checkpoint_lsn 24903000010213400000
database_backup_lsn 19295000003680200000
database_creation_date 37:56.0
backup_start_date 46:01.0
backup_finish_date 52:08.0
type L
sort_order 52
code_page 0
compatibility_level 90
database_version 611
backup_size 5653515776
database_name QAI
server_name HLDSQL01
machine_name HLDSQL01
flags 512
unicode_locale 1033
unicode_compare_style 196609
collation_name SQL_Latin1_General_CP1_CI_AS
is_password_protected 0
recovery_model FULL
has_bulk_logged_data 0
is_snapshot 0
is_readonly 0
is_single_user 0
has_backup_checksums 0
is_damaged 0
begins_log_chain 0
has_incomplete_metadata 0
is_force_offline 0
is_copy_only 0
first_recovery_fork_guid B12C1E27-CFEF-4A71-98F8-94C8DACC6994
last_recovery_fork_guid B12C1E27-CFEF-4A71-98F8-94C8DACC6994
fork_point_lsn NULL
database_guid E10AB28E-3ADB-4055-AD59-F151742F8E76
family_guid B12C1E27-CFEF-4A71-98F8-94C8DACC6994
differential_base_lsn NULL
differential_base_guid NULL
Ray Mond
Ray Mond
SSCommitted
SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)

Group: General Forum Members
Points: 1979 Visits: 1542
That's certainly a big difference of 8 hours between the 2 servers to restore that transaction log. I know it's a long shot, but any chance the SQL Server process is running below normal priority?

Ray Mond
Get a SQL Size 2.0 single-instance license for FREE
Claim one here!
dpaproskijr
dpaproskijr
SSC Veteran
SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)SSC Veteran (268 reputation)

Group: General Forum Members
Points: 268 Visits: 231
It's running at normal priority...I did try setting it to high but it didn't make any difference.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search