https://cdn.meme.am/cache/instances/folder633/58097633.jpg |
But here is a story of a client that may have too many backups after all.
--
Our monitoring system found that the backups on a client server had suddenly started running very long. When I signed on to look I saw that the nightly backups were still running even the next morning! These were the three currently running processes related to backups:
LINE | dd hh:mm:ss.mss | session_id | sql_text | login_name |
1 | 00 15:24:36.674 | 134 | backup database [BobData] to virtual_device = N'CA_BAB_MSSQL_a786700_10002_4c59b51f_BobData' with differential, blocksize = 65536, buffercount = 1, maxtransfersize = 2097152 | DOMAIN1\sa_arcbackup |
2 | 00 11:27:32.043 | 165 | BACKUP DATABASE [BobData] TO DISK = N'B:\SQL_BAC\BobData\BobData_backup_2017_02_23_020011_9582786.bak' WITH RETAINDAYS = 10, NOFORMAT, NOINIT, NAME = N'BobData_backup_2017_02_23_020011_9582786', SKIP, REWIND, NOUNLOAD, COMPRESSION, STATS = 10 | DOMAIN1\sqlcluster |
3 | 00 02:58:20.540 | 84 | backup database [FredData] to virtual_device = N'CA_BAB_MSSQL_a786080_10001_4f06c827_FredData' with blocksize = 65536, buffercount = 1, maxtransfersize = 2097152 | DOMAIN1\sa_arcbackup |
LINE | wait_info | CPU | tempdb_allocations | tempdb_current | blocking_session_id |
1 | (55475564ms)ASYNC_IO_COMPLETION | 236 | 0 | 0 | NULL |
2 | (41252214ms)LCK_M_U | 0 | 0 | 0 | 134 |
3 | (10699785ms)ASYNC_IO_COMPLETION | 357 | 0 | 0 | NULL |
LINE | percent_complete | host_name | database_name | program_name | start_time | login_time |
1 | 79.8087 | Server5 | BobData | Arcserve Backup | 2/22/2017 22:02 | 2/22/2017 22:02 |
2 | NULL | Server1 | BobData | Microsoft SQL Server Management Studio | 2/23/2017 2:00 | 2/23/2017 2:00 |
3 | 18.4672 | Server5 | FredData | Arcserve Backup | 2/23/2017 10:30 | 2/23/2017 1:05 |
database_name | physical_device_name | type | backup_start_date | backup_finish_date |
BobData | B:\SQL_BAC\BobData\BobData_backup_2017_02_22_020015_7258925.bak | D | 02/22/2017 04:29:47 | 02/22/2017 04:40:41 |
BobData | CA_BAB_MSSQL_8ea33d0_10002_4732d2d8_BobData | I | 02/21/2017 22:01:34 | 02/22/2017 04:29:44 |
BobData | B:\SQL_BAC\BobData\BobData_backup_2017_02_21_020010_3116691.bak | D | 02/21/2017 11:54:41 | 02/21/2017 12:05:22 |
BobData | CA_BAB_MSSQL_8c6ff40_10002_420ce264_BobData | I | 02/20/2017 22:02:02 | 02/21/2017 11:54:39 |
BobData | B:\SQL_BAC\BobData\BobData_backup_2017_02_18_020007_3945711.bak | D | 02/19/2017 23:39:43 | 02/19/2017 23:50:38 |
BobData | CA_BAB_MSSQL_8d4b360_10001_329872eb_BobData | D | 02/17/2017 22:00:34 | 02/19/2017 23:39:41 |
The Arcserve backup was kicking off around 10pm local server time, and taking a widely variable amount of time to complete – the first backup (start date 02/17 22:00) was a FULL backup (type D) and took just over *2 days* to complete (02/17 22:00 to 02/19 23:39)!
https://s-media-cache-ak0.pinimg.com/736x/25/b2/82/25b2825a9b57026e1932ba48909699e9.jpg |
The catch is the next row up – the regular FULL backup to the B: drive looks like it kicks off right when the Arcserve backup completes, at 23:39 – this is most likely because it was being blocked by the Arcserve backup (just like the current activity above) and when the Arcserve backup completes, the FULL backup to the B: drive begins.
https://imgflip.com/i/1k9t97 |
Note that this is not related to Arcserve in particular - any third-party backup tool that uses a database agent to directly run backups will display this behavior.
A conversation showed that the Arcserve backups had been recently added and it was definitely a match to the problems.
I understand a desire to have backup file cleanup of 4 days, as seen in the next item in the maintenance plan, as the actual backups can take up a lot of space:
http://i0.kym-cdn.com/entries/icons/original/000/019/630/ihnmotp.jpg |
Also watch your history retentions closely - don't swing the other way - I can't count the number of servers I have found with huge msdb's because no cleanup is happening - but set it to a realistic number that won't hamper your troubleshooting. Consider the frequency of your key processes; in almost all environments there are at least monthly considerations, but often there are also bi-monthly or quarterly concerns that would require you to keep up to 100 days of history for effective troubleshooting.
--
Hope this helps!