Do you know much about data compression? It seems that quite a few people I meet don’t really understand how it works, or what the benefits are. I heard about compression first hand from Sunil Agarwal a few years back at TechEd. I visited one of his sessions and then he was gracious enough to spend about 15-20 minutes with me afterwards answering more questions about additional details. At the time, I was just another schmuck in the audience, but Sunil was very pleasant and patient with me.
I was lucky that day,but Sunil has a video where he explains some of the ways in which compression works for Microsoft customers.
The summary is that data compression works in a few ways. One is, obviously, to save space. The second, however, is that fewer I/Os’s are required to move data on or off a disk, which can result in a performance increase. The SQL Server team tried to strike a balance in how they implemented compression to allow for both of these benefits. The tradeoff is that CPU is needed to compress or decompress data, so while more compression is possible, it can negatively impact performance.
Compression is a cool feature, and consists of a few parts. You can read more about each of them if you are interested:
- Row compression
- Prefix compression (part of page compression)
- Dictionary compression (also part of page compression)
- Unicode Compression (SQL Server 2008 R2 only)
The DBA has complete control over what is compressed, but it is also some work to analyze the space savings and then make a determination about whether or not it is worth you compressing your table. If the space savings is too small, you might end up negating any storage gains with additional CPU costs.
How Do I Decide?
There is a stored procedure (sp_estimate_data_compression_savings) that can be used on tables or indexes to help you determine what the savings are. Numerous scripts are available, like this one from Paul Neilsen – Whole Database – Data Compression Procs.
There’s also one easy decision. Only Enterprise and Data Center editions of SQL Server support this feature, so if you aren’t running those, you don’t have to worry about this.
Alternatives
There are a couple other alternatives to implementing data compression. If you want to get compression savings in other editions of SQL Server, my company, Red Gate Software, makes a product called SQL Storage Compress, which implements compression at the file level.
This is based on Hyperbac technology and is completely transparent to SQL Server. We are working on some extensive case studies right now and this product is part of the SQL Server I/O Reliability Program. This can help performance on your system by reducing I/O.
If space is an issue in test, customer service, and development environments, you can use Virtual Restore to mount a backup file as a database that you can read, or write to. The writes occur in a sparse file, so your backup file is untouched. However you get a quicker “restore” that is transparent to SQL Server.
As I mentioned, I work for Red Gate software, so take these as mentions of products my company builds and sells. I’ve heard great things about them, and use them in a few places for work at SQLServerCentral. I’d recommend you try them and see if they are a fit in your environment and have a good price/value ratio for your company.
Filed under: Blog Tagged: sql server, syndicated
![]()



Subscribe to this blog
Briefcase
Print
Posted by Jason Brimhall on 22 June 2011
I agree. Compression is a cool feature. Thanks for covering this topic and types of compression.
Posted by Steve Jones on 22 June 2011
Thanks, it's an interesting feature. I'll do a few more posts on the details.
Posted by Jack Vamvas on 23 June 2011
In your opinion - are the compression features more suited to Datawarehouse or OLTP type databases?I know this is a broad question.
Posted by Steve Jones on 23 June 2011
That is a broad question. Compression is most suited to repetitive data, but on the same row or page. Data warehouses often have this kind of data, with lots of values stored on the same page in things like fact tables. Wild distributions of values lower the value of compression, which is typically what you might find in OLTP. That depends on your app and data, however.
The compression helps with space, and with I/O, so if you are reading/writing the same data a lot, it can help, but if you are moving around the disk a lot to new places, with low kbs of reads at each stop, the CPU used to decompress/compress might be more of an issue than the I/O savings.