Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Data Compression Double Take

By Tony Davis,

Most people have a broad idea of how data compression works; to oversimplify, a lossless compression algorithm analyses the binary-encoded data for repeating units, and their frequency, and each repeating unit is represented by a single, highly-efficient encoding, using a very small number of bits, so reducing redundant data storage and the size of the file.

A few years back, it was unthinkable to send a Word .doc file over email without first zipping it up. With the advent of .docx, this practice is redundant; since it is already stored in a compressed format there is no benefit in applying double compression (same argument for PDF and other compressed file formats). In fact, it's even common in such cases to see compressed file sizes slightly bigger than the original, since additional header information and so on is added to the compressed file.

The appliance of compression technology to database files is a relative innovation in the SQL Server world, with SQL 2008 Enterprise. Here, we don't compress the whole file, but individual tables and indexes within. Row compression frees up empty space mainly by compressing variable length data types. Page compression adds in prefix and dictionary compression which essentially, as described above, eliminates space by removing redundant storage of repeating units of data.

Clearly, the process of compressing (and decompressing) data will burn additional CPU cycles, but the benefits for query performance lie in the potential for greatly reduced I/O, due to the smaller footprint both on disk and in memory, since the data remains compressed in the buffer cache. Having said this, I confess to being somewhat surprised by the relative lack of "real", published performance data that has emerged over four years, to back this up.

The ever-reliable Linchi Shea took an early look, and proved both the huge potential performance boost it could offer for table scans, the size of the negative impact on data modifications, and the vastly differing compression ratios that could be seen for the same data, depending on its distribution. The latter case, unless you have CPU cycles to burn (which some modern, multi-core processors do!), proves that you should choose carefully which objects to compress; if your index happens to organize data in such a way that there is little repetition on many of the pages for that object, then there really isn't much to compress.

However, in particular, I still find myself wondering about the impact of this data compression on, for example, multi-table JOIN queries. Research exists to suggest that most types of join will proceed quite happily on compressed data and even with some performance benefit, but it would be reassuring to have people share some real world data.

Finally, with a Red Gate hat on for a moment, I wonder if databases could be one exception to the rule that "double compression does no good"; in other words, whether we might benefit from compressing at the file as well as the object level. A tool like SQL Storage Compress (based on Hyperbac technology) works at the file driver level and compresses the whole database file; it compresses and decompresses the data as it passes data to and from SQL Server (so the data is not compressed in the cache in this case). In his initial study, Brad McGehee saw a huge space saving, over native page compression, mainly due to the fact that Storage Compress can also remove all unused space in the data file. It will also compress LOB data (varchar(max) and such), which native data compression will not.

I want to hear from anyone who has direct experience with any of these data compression technologies, good or bad; if you have performance data to share, even better!

Cheers,

Tony.

Total article views: 144 | Views in the last 30 days: 3
 
Related Articles
ARTICLE

SQL Server Central Webinar Series #20: Using SQL Storage Compress in the Real World

Join us on July 10 at 4:00pm GMT/11:00am EDT for an in-depth look at SQL Storage Compress and learn ...

ARTICLE

Improving Database Design and Performance Using Compression

This article attempts to show a practical scenario on improving DB design and performance through ro...

ARTICLE

Webinar: Using SQL Storage Compress by Steve Jones and Brad McGehee

Steve Jones and Brad McGehee lead you through real-world examples of working with SQL Storage Compre...

BLOG

Decompressing on Compression in SQL Server 2008....

Compression is one of the major features introduced in SQL Server 2008, and one that can significant...

FORUM

restore failed: 1130(Not Enoughserver storage space)

restore failed: 1130(Not Enoughserver storage space)

Tags
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones