Windows Server 2012 Deduplication – Should you use it with SQL Server Backups?

Joe Sack SQLskills Old Hand Points: 314
February 7, 2013 at 11:30 pm

Comments posted to this topic are about the item Windows Server 2012 Deduplication – Should you use it with SQL Server Backups?

Phil Factor SSC-Insane Points: 20244 · Answer 1

Yes, a very interesting article which I enjoyed, and which has been quite an eye-opener for me.

You found that a deduplicated drive was still able to compress a compressed backup. To me this most likely suggests either that Microsoft's compression algorithm was surprisingly weak (roughly 33% in your case) or that the PowerShell cmdlet was being optimistic, since you would normally expect zero or even negative gain from compressing something that has already been compressed to the max, whatever the algorithm. The total saving was close to what I'd expect from a decently compressed backup made with a third-party tool such as SQL Backup.
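Phil's expectation that a second compression pass gains nothing can be checked with Python's standard `zlib` module. This is a minimal sketch, assuming zlib is a fair stand-in for backup compression (it is not the algorithm SQL Server or Windows deduplication uses), with a seeded pseudo-random text buffer as a hypothetical "backup".

```python
import random
import zlib


def sizes_after_double_compression(data: bytes) -> tuple[int, int, int]:
    """Return (raw, compressed once, compressed twice) sizes in bytes."""
    once = zlib.compress(data, 9)   # first pass removes the redundancy
    twice = zlib.compress(once, 9)  # second pass sees near-random input
    return len(data), len(once), len(twice)


# Hypothetical stand-in for a backup: pseudo-random words drawn from a small
# vocabulary, so the first pass has plenty to squeeze out. Seeded for repeatability.
rng = random.Random(42)
vocab = [f"row{i:03d}" for i in range(50)]
data = " ".join(rng.choice(vocab) for _ in range(50_000)).encode()

raw, once, twice = sizes_after_double_compression(data)
print(f"raw={raw} once={once} twice={twice}")
```

On data like this the second pass typically comes out about the same size as the first, or a few bytes larger. That suggests any saving reported on already-compressed backups has to come from something other than recompressing a single file, such as identical chunks repeated across several backup files.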

There has been a debate about the use of deduplication in Backup-as-a-Service (BaaS). The one core problem with deduplication is that 'deduplication is about backup. It's not about recovery.' That doesn't apply to a local SQL Server 2012 solution, though. Unless you're doing something very clever with deduplication, it doesn't help to minimise network traffic (source vs. target rehydration). As I understand it, the act of reading a file from the deduplicated drive rehydrates it, so your archived backups will have to be copied across the network in their uncompressed state. Try doing that to and from cloud backup! Deduplication is not supported or recommended by Microsoft for live data, but is OK for local backups. However, I don't see an advantage over properly compressed backups and, like you, I can see the theoretical weakness of a single point of failure in the deduplication algorithm, though Scott M. Johnson and Microsoft Research seem to have minimized the risks.

I'm going to rush off and try a dedup drive, but your article leaves me thinking that this technology is great for document stores, logs, and other reasonably static data, but of less immediate interest to the DBA who already has compressed backups!

Best wishes,
Phil Factor

Grant Fritchey SSC Guru Points: 398694 · Answer 2

I'm surprised you were getting dedup through the encrypted databases. Really surprised. I figured that would work more like trying to compress them (where the size sometimes goes UP). Any guesses as to why that might be?

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Joe Sack SQLskills Old Hand Points: 314 · Answer 3

Hi Grant,

My test used an identical TDE database backup file each time - so I think another interesting test would include backups with X percent of changes to the data. But yes - the results were unexpected.

Thanks,

Joe

Joe Sack SQLskills Old Hand Points: 314 · Answer 4

Thanks Phil!

Agreed. And I anticipate seeing this feature applied inappropriately (when it comes to SQL Server). I'll be keeping an eye on this area.

There are promising aspects, but when backups are involved I tread carefully.

Best Regards,

Joe

Ron Klimaszewski SSC Eights! Points: 911 · Answer 5

Nice article, but a better test would have been to create ten databases, each with slightly different data. I suspect the dedupe ratio would be drastically reduced for the compressed backups as the files would be significantly different.

Joe Sack SQLskills Old Hand Points: 314 · Answer 6

Thanks Ron.

My thought behind this first article was to see how well it could do with the "blue sky" scenario. The next round of testing should include data modification changes by "X" percent - seeing what impact that has.
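The "X percent of changes" test Joe and Ron describe can be roughed out as a simulation before running it for real. This is a sketch under stated assumptions, not a measurement: random bytes stand in for high-entropy (for example TDE-encrypted) 8 KB pages that change independently, the dedup chunk is a hypothetical fixed 64 KB, and `simulate_savings` and `unique_bytes` are illustrative names. It does not model compressed backups, where a single change may shift every byte that follows, which could make deduplication worse still, as Ron suspects.

```python
import hashlib
import random

PAGE_SIZE = 8 * 1024    # SQL Server data page size
CHUNK_SIZE = 64 * 1024  # hypothetical fixed dedup chunk size (8 pages per chunk)


def unique_bytes(files: list[bytes]) -> int:
    """Bytes a chunk store would keep: each distinct chunk is counted once."""
    seen: dict[bytes, int] = {}
    for data in files:
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            seen.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return sum(seen.values())


def simulate_savings(change_pct: float, copies: int = 10,
                     pages: int = 512, seed: int = 1) -> float:
    """Fraction of space saved for `copies` full backups, where change_pct
    percent of pages are rewritten with new random content between backups."""
    rng = random.Random(seed)
    db = bytearray(rng.randbytes(pages * PAGE_SIZE))
    backups = []
    for _ in range(copies):
        backups.append(bytes(db))  # snapshot = one full backup
        for p in rng.sample(range(pages), int(pages * change_pct / 100)):
            db[p * PAGE_SIZE:(p + 1) * PAGE_SIZE] = rng.randbytes(PAGE_SIZE)
    logical = sum(len(b) for b in backups)
    return 1 - unique_bytes(backups) / logical


for pct in (0, 1, 10, 50):
    print(f"{pct:>2}% of pages changed between backups: {simulate_savings(pct):.0%} saved")
```

With 0% changed this reproduces the "blue sky" case: ten identical backups, of which nine cost nothing, so 90% is saved. Because a single changed page dirties its whole 64 KB chunk, savings in this model fall off much faster than the percentage of changed pages would suggest.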

jasona.work SSC Guru Points: 50083 · Answer 7

Grant Fritchey (2/8/2013)
I'm surprised you were getting dedup through the encrypted databases. Really surprised. I figured that would work more like trying to compress them (where the size sometimes goes UP). Any guesses as to why that might be?

My understanding of how the dedupe works in Server 2012 is that it is looking for matching "blocks" within files, not "compressing" the files.

So even though the dedupe process can't read the actual data in a TDE (or other encrypted) file, it can "see" the block on disk and check if it matches another block. If they match, it sets up a pointer to one of the blocks for both files and calls it done.

Here we go, the TechNet overview: the process reads "chunks" of files and looks for duplicates.

And another TechNet blog posting about it.
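Jason's description of block matching can be illustrated with a toy chunk store. This is a minimal sketch under simplifying assumptions, not how Windows Server 2012 implements it: it uses fixed 64 KB chunks and SHA-256 digests (the real feature uses variable-size chunks), and `ChunkStore` and its methods are hypothetical names. Bytes from `os.urandom` stand in for a TDE-encrypted backup; since ten copies of the same file are byte-for-byte identical, every chunk after the first copy is a duplicate, whether or not the content is readable.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # hypothetical fixed chunk size


class ChunkStore:
    """Toy dedup store: each unique chunk is kept once; files are lists of chunk digests."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.files: dict[str, list[str]] = {}

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the bytes only if unseen
            refs.append(digest)                    # the file keeps a pointer
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        # Rehydration: reassemble the full file from its chunk pointers.
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_size(self) -> int:
        return sum(len(self.read(n)) for n in self.files)

    def physical_size(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# Random bytes look like encrypted data, but every copy is identical.
backup = os.urandom(1024 * 1024)
store = ChunkStore()
for n in range(10):
    store.write(f"backup_{n}.bak", backup)
print(store.logical_size(), store.physical_size())
```

`read` also shows the rehydration Phil mentions: getting a file back means reassembling every chunk, so what leaves the volume is the full logical size, not the deduplicated footprint.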

What's also nice about it is that you can exclude file types from deduplication, so you could enable it on your SQL data drive, exclude mdf / ldf / ndf files, and go...
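On Windows Server 2012 the exclusion list is, as we understand it, a per-volume setting (for example the `-ExcludeFileType` parameter of the `Set-DedupVolume` cmdlet). The filtering idea itself is simple; here is a minimal sketch of it in Python, where `EXCLUDED_EXTENSIONS`, `eligible_for_dedup`, and the file paths are all hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical exclusion list: SQL Server data and log file extensions.
EXCLUDED_EXTENSIONS = {".mdf", ".ndf", ".ldf"}


def eligible_for_dedup(path: str) -> bool:
    """True if a dedup job honouring the exclusion list would process this file."""
    # Compare case-insensitively, as Windows file names are.
    return PureWindowsPath(path).suffix.lower() not in EXCLUDED_EXTENSIONS


files = [
    r"D:\Data\Sales.mdf",
    r"D:\Data\Sales_2.ndf",
    r"D:\Logs\Sales_log.LDF",
    r"D:\Backup\Sales_full.bak",
]
print([f for f in files if eligible_for_dedup(f)])
```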

Not that I'd try that...

Jason

Grant Fritchey SSC Guru Points: 398694 · Answer 8

Sure, but I wasn't aware that the encryption process would produce output pages consistent enough to be deduped. It implies that reverse engineering the encryption process is just a matter of time, not effort. I expected more of a "random" output. Yes, I know it's not actually random at all, but I didn't expect it to be that utterly consistent. Just a surprise.

jasona.work SSC Guru Points: 50083 · Answer 9

True, having sufficient duplicate chunks to dedupe an encrypted file isn't reassuring. But Joe was just using the same DB and backing it up multiple times with no activity.

I would think that in a "real-world" situation, the space savings would be less with encrypted backups. You'd still get some, simply because the odds are that there will be duplicate chunks, even if the chunks are for completely different parts of a file or completely different files.

After all, if you've got two files, let's say a Word document and a SQL backup file, and they both contain the same sequence of bytes, deduplication could work with that.
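Jason's point about a shared sequence in two unrelated files is the reason many deduplication systems cut chunks based on content rather than at fixed offsets: with fixed offsets, the same bytes sitting at different positions in two files are split differently and never match. Here is a minimal content-defined chunking sketch using a polynomial rolling hash. The window, mask, and size limits are illustrative assumptions, this is not Microsoft's chunking algorithm, and the two "files" are hypothetical buffers that share 256 KB of identical random bytes at different offsets.

```python
import hashlib
import random

WINDOW = 48                  # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1          # prime modulus for the polynomial hash
MASK = 4096 - 1              # cut when (hash & MASK) == 0: roughly 4 KB average chunks
MIN_CHUNK, MAX_CHUNK = 512, 64 * 1024


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Split data where the hash of the last WINDOW bytes hits a boundary pattern."""
    chunks, start, h = [], 0, 0
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # remove byte leaving the window
        h = (h * BASE + byte) % MOD                  # add the incoming byte
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> set[str]:
    return {hashlib.sha256(c).hexdigest() for c in content_defined_chunks(data)}


rng = random.Random(7)
shared = rng.randbytes(256 * 1024)  # identical bytes present in both files
doc_file = rng.randbytes(3_000) + shared + rng.randbytes(5_000)
bak_file = rng.randbytes(10_001) + shared + rng.randbytes(2_000)
common = chunk_ids(doc_file) & chunk_ids(bak_file)
print(f"{len(common)} chunks shared between the two files")
```

Because boundaries depend only on the bytes inside the window, both files start cutting at the same places shortly after the shared region begins, so most of its chunks match even though it sits at a different offset in each file.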

Jason