*shrug* my post is late due to technical issues.
The second Tuesday of the month is upon us once again. That means it is time for another group blog party called TSQLTuesday. This party that was started by Adam Machanic has now been going for long enough that changes have happened (such as Steve Jones (b | t) managing it now). For a nice long read, you can find a nice roundup of all TSQLTuesdays over here.
This month, we are looking at the 10 yr anniversary of this long standing blog party / tradition. For the 10 yr party, everybody has the opportunity
to bag elaborate on something questionable they have seen being done as it relates to SQL Server.
This is a topic that I know has sparked a lot of commentary by the host Wayne Sheffield (b | t). In Wayne’s invite, he says “Not too long ago, I ran across a situation where I was scratching my head, wondering why something had been implemented the way it had been (you can read about it here). And that gave me the idea for this T-SQL Tuesday topic.”
Take a moment to read about his “aha this is odd” moment and then you can totally see the tone that was intended for this months topic.
My topic (be it ever so late) is about some rather large frustrations with phantom backup processes that were interfering with the normal backup operations, production performance, and were running at random intervals throughout the day sometimes as much as 12 times during normal business hours.
Every single time these backups kicked off, we would get I/O frozen alerts and failed logins until the I/O was unfrozen. Sometimes it would last as little as seven seconds and sometimes lasting for an uncomfortable minute or so.
In working with the sysadmins they kept insisting that it must have been something configured within SQL Server. Each time they insisted, we would have to rehash the scheduled backups in SQL Server as well as validate any windows scheduled tasks. Here is what some of those backups looked like from within SQL Server.
SELECT a.database_name,CONVERT(DATE,a.backup_start_date) AS backup_date, COUNT(a.database_name) AS BackupCount ,a.user_name FROM msdb.dbo.backupset a INNER JOIN msdb.dbo.backupmediafamily b ON a.media_set_id = b.media_set_id WHERE a.user_name NOT LIKE 'LesRoisgrenouilles%' AND a.backup_start_date < GETDATE() - 7 AND a.database_name = 'DBA' GROUP BY a.database_name, CONVERT(DATE,a.backup_start_date), a.user_name ORDER BY a.database_name,backup_date desc; SELECT a.database_name,a.backup_start_date,a.backup_finish_date ,datediff(second,a.backup_start_date,a.backup_finish_date)/60.0 as DurationMinute ,b.physical_device_name AS BackupPath ,a.position ,a.type ,a.backup_size/1024/1024 AS BackupSizeMB ,a.user_name FROM msdb.dbo.backupset a INNER JOIN msdb.dbo.backupmediafamily b ON a.media_set_id = b.media_set_id WHERE a.user_name NOT LIKE 'LesRoisgrenouilles' AND a.backup_start_date < GETDATE() - 7 AND a.database_name = 'DBA' ORDER BY a.database_name,a.backup_start_date desc;
Each of those queries produces results similar to the following.
In this preceding image we see more details about these full backups that have been occurring. We know that the backups are not coming from the SQL Service account. We also know that these are not SQL backups but rather they are coming from an outside entity that is causing the backups to be sent to a virtual device. We know a virtual device is the target due to the GUID for the BackupPath instead of an actual file path.
If I take this a step further and try to trace it via an Extended Events session, I can further confirm that the backups are from an outside entity and that it appears to be something that is using the windows shadow copy service – or something like that. Let’s look closer at that.
/* databases_backup_restore_throughput */ SELECT DB_ID('AdventureWorks2014'); GO USE master; GO -- Create the Event Session IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='BackupThroughput') DROP EVENT SESSION BackupThroughput ON SERVER; GO CREATE EVENT SESSION BackupThroughput ON SERVER ADD EVENT sqlserver.databases_backup_restore_throughput (ACTION ( sqlserver.database_name , sqlserver.nt_username , sqlserver.session_id , sqlserver.client_hostname , sqlserver.client_app_name , sqlserver.sql_text ) WHERE database_id = 8 ) , ADD EVENT sqlserver.trace_print (WHERE database_id = 8) ADD TARGET package0.event_file (SET filename = N'C:DatabaseXEBackupThroughput.xel'); /* start the session */ ALTER EVENT SESSION BackupThroughput ON SERVER STATE = START; GO
League of Shadows
Here we see the properties for the volume shadow copy service for all volumes (including those involved with SQL Server). Intriguing to see that despite the service being disabled for every volume, there appears to still be something causing there to be storage consumed towards the VSS quota. This supports that something is causing the backups to be performed via VSS somehow but it is not the actual Windows service because it is disabled for each volume.
Let’s investigate further. Time to break out some command line queries.
Get-ScheduledTask | Get-ScheduledTaskInfo | Select TaskName,TaskPath,LastRunTime,LastTaskResult
Running that command, I can compare all scheduled tasks and their run times against the actual backups that were occurring and determine if any might be related to the backups. As it turned out, none of the tasks in Task Scheduler ran at any time remotely close to these backups.
Next, let’s look at the VSS configurations more closely. I can query the providers, shadows and shadowstorage.
vssadmin list providers
Looking at the providers first to determine if there is something there, we might be introduced to something like this output.
Maybe there is an aha for you at this moment. Two services are certainly worth investigating: “CommVault VSS Hardware Provider Service” and “Galaxy VSS Provider Service” (both are related to CommVault). In this particular case, we exhausted those avenues and were able to determine that neither was related to this issue.
Let’s try the shadows.
vssadmin list shadows
A couple of things of note here. The timestamps from these vss shadows are consistent with FULL database backup time stamps in SQL Server. Next, we see that these are designed to serve as an “ApplicationRollback” and Differential backup. So, something is taking an application consistent differential backup every so often for some reason yet unknown. Despite the VSS differential backup, it is not a differential backup inside SQL Server – it is a FULL backup. This configuration can be detrimental to your recovery plan if you are unaware of it (some ramifications here).
So, what is causing this? Well, one more clue actually helped us figure it out. The issues started on August 20th. On that day, a new Antivirus software was rolled out to the servers. This software, unbeknownst to the sysadmins, was taking VSS snaps to try and protect against ransomware – supposedly. I have to say, that is a pretty dumb thing to do to a database server. A legitimate backup / recovery plan would suffice in the case of a hijacked OS and without the added burden of the sanpshots from the AV software nor the added stress of having something screw with the backup/recovery plan.
This is a prime example of using a wrecking ball to hammer in a screw. All in the guise of security and protection. Please stop the insanity.
Making matters more difficult is the fact that the GUIDS provided by the VSS storage and what we see listed as the virtual device in SQL Server do not correlate directly to each other. You have to base your assertions off of timestamps.
If your sysadmins insist on having a tool (whether it be Antivirus or a backup solution) that causes VSS snaps to occur which could foul up the database recovery plan, then I recommend you insist on having access to that tool.
Wrapping it Up
I am not a fan of tools that interfere with the database recovery plans that may be requisite to properly meet RTO and RPO. At best these phantom backups cause undue headache in troubleshooting. At worst, they make it impossible to recover in the event of a database related disaster. Sadly, the database is regarded as the red-headed step child and these tools ignore best practices for the database world. Forcing a database server to fit in the same mold as a file or print server is flat out absurd and these tools make no effort to educate the sysadmins of the piss-poor practices they are forcing down the throat of your database server.
In this article I showed various tools in the command line as well as within SQL Server to troubleshoot these problematic backups. One of the tools was an XEvent Session. If you are in need of a little tune-up for your XE skills, I recommend reading a bit on Extended Events to get up to date. For other basics related articles, feel free to read here.