
A Collaborative Approach To Table Compression
Posted Thursday, January 13, 2011 2:23 PM
SSC Rookie

Compression does not work well on any data stream shorter than roughly 150 bytes. For most techniques the first 100 bytes or so are used to store the header information needed to reverse the compression.

Given this, ROW compression should be reserved for tables with long row datalengths. But before considering this approach there are several ways to reduce the byte count without compression: using varchar instead of nvarchar, compressing 'blob' data before it enters the database, and so on. As always, it's about getting extra rows onto an individual page or reducing the number of pages taken by a single row.
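If you want to check that on your own system, a rough sketch like the one below (the 'SAMPLED' scan mode and the sort are my own choices, not anything from the post) lists the average leaf-row size per index so you can spot short-row tables where ROW compression probably won't pay off:

-- Sketch: average leaf-row size per index; rows much shorter than the
-- poster's ~150-byte rule of thumb are poor ROW compression candidates.
SELECT  OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
        OBJECT_NAME(ips.object_id)        AS table_name,
        ips.index_id,
        ips.avg_record_size_in_bytes,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
WHERE   ips.index_level = 0               -- leaf level only
ORDER BY ips.avg_record_size_in_bytes DESC;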

The biggest drawback here occurs during a page split. The page needs to be reconstituted - every row expanded, the split made, then every row re-compressed. The re-compression doesn't happen until the next write, so there can be some delay before you see the performance hit.

PAGE compression, on the other hand, is a 'what have we got to lose' option. The 8K page is compressed when it is written (and there are an awful lot of them), and may never need to be expanded again.

The page split is still an issue, but an 8K decompress is a minor (and known) performance hit. You will run into performance problems if your tables/indexes have high page-split metrics.
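To gauge that before compressing, something along these lines (my own sketch, not a definitive test) surfaces the allocation counts per index; note the numbers reset when the instance restarts:

-- Sketch: leaf-level allocation counts since the last instance restart.
-- leaf_allocation_count includes page splits as well as ordinary new-page
-- allocations, so treat it as an upper bound on split activity.
SELECT  OBJECT_NAME(ios.object_id) AS table_name,
        ios.index_id,
        ios.leaf_allocation_count,
        ios.nonleaf_allocation_count
FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
ORDER BY ios.leaf_allocation_count DESC;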

I've touched on it already, but the important thing to note is that reading compressed data does not rebuild the 8K pages (per the Microsoft documentation). So you don't just save space on disk, you also get more pages into memory, improving the performance of your buffer cache. Unless you write to a compressed page, you're already ahead.
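One rough way (again my own sketch, not anything from the documentation) to see how many buffer-pool pages each object currently occupies, and so how much that in-memory saving might matter to you:

-- Sketch: buffer-pool pages held per object in the current database.
SELECT  OBJECT_NAME(p.object_id)  AS table_name,
        COUNT(*)                  AS buffer_pages,
        COUNT(*) * 8 / 1024       AS buffer_mb
FROM    sys.dm_os_buffer_descriptors AS bd
JOIN    sys.allocation_units AS au
        ON au.allocation_unit_id = bd.allocation_unit_id
JOIN    sys.partitions AS p
        ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
        OR (au.type = 2       AND au.container_id = p.partition_id)
WHERE   bd.database_id = DB_ID()
GROUP BY p.object_id
ORDER BY buffer_pages DESC;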

A final note.
The compression itself doesn't occur until the write to disk. So finding out whether this is going to cause grief can only be tested under (heavy) data-update load. Take any performance advice with a grain of salt - your situation will be different, and there is only one way for you to find out.
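For reference, turning compression on is itself a rebuild, so it belongs in a maintenance window and should be re-tested under your own write-heavy workload; the object names below are placeholders, not anything from a real system:

-- Hypothetical object names; enabling compression rebuilds the table/index.
ALTER TABLE dbo.SalesOrderDetail
    REBUILD WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX IX_SalesOrderDetail_ProductID
    ON dbo.SalesOrderDetail
    REBUILD WITH (DATA_COMPRESSION = ROW);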
Post #1047583
Posted Thursday, January 13, 2011 3:05 PM


SSC Journeyman

Just to make sure we are all on the same page: the way I've heard the Microsoft CAT engineers explain row and page compression, the data is compressed both on the hard drive and in the buffer pool. Only when the actual row is moved to the CPU is the row (or page) decompressed (and recompressed if updated or inserted) - hence the additional CPU load. Reviewing my PASS 2009 notes from the DBA409 briefing (Customer Experiences With Data Compression) by SQLCAT engineers, their recommendation was that if row compression results in space savings, and a 10% increase in CPU usage can be accommodated, all data should use row compression. They go on to urge careful consideration for page compression.
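A quick way to sanity-check that advice against your own data is the built-in estimator (object names below are placeholders, and a later post in this thread questions how reliable the estimate is, so treat it as a starting point rather than a guarantee):

-- Hypothetical object names; estimates ROW compression savings per index/partition.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'SalesOrderDetail',
     @index_id         = NULL,    -- all indexes
     @partition_number = NULL,    -- all partitions
     @data_compression = 'ROW';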

Mike Byrd
Post #1047601
Posted Thursday, January 13, 2011 3:29 PM
SSC Rookie

The opening of your article refers to increased performance, but I don't see any reference to performance measures.

You should find that, depending on your hardware and performance requirements, the benefits of row compression vary considerably. Systems with a smaller number of cores could actually suffer reduced query efficiency with row compression on small to medium-sized tables.
Post #1047624
Posted Thursday, January 13, 2011 3:59 PM
SSC Eights!

Gary has an excellent point; does anyone have the time to run a stable, known workload under uncompressed, page compressed, and row compressed conditions with a varying number of cores (varying CPU affinity settings)?

I'd suggest collecting SQL:BatchCompleted Profiler event statistics for CPU, Reads, Writes, and Duration, so we can see the tradeoffs that were made internally.
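A minimal single-session version of that comparison could look like the sketch below (the three table names are hypothetical copies of the same data, one per compression setting); for the full test you would capture SQL:BatchCompleted in Profiler or a server-side trace while replaying the workload:

-- Hypothetical tables: three copies of the same data under each setting.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT_BIG(*) FROM dbo.Orders_Uncompressed   WHERE OrderDate >= '20100101';
SELECT COUNT_BIG(*) FROM dbo.Orders_RowCompressed  WHERE OrderDate >= '20100101';
SELECT COUNT_BIG(*) FROM dbo.Orders_PageCompressed WHERE OrderDate >= '20100101';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;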
Post #1047642
Posted Friday, January 14, 2011 1:34 AM
SSC Rookie

Excellent use of the phrase "trade-offs", because effectively all you are doing is trading resources, in a similar way to how a financial portfolio may be managed. You're buying disk space and selling CPU (and probably a bit of RAM too, to hold the compressed and uncompressed versions).

It's great to have the ability to do this resource trading, but it's important to realize that nothing comes free. The only way to justify the resource trade is to determine and measure the benefits against your business requirements. It may seem like great news to have released so much disk space, but disk is relatively cheap compared to other hardware resources.

If performance is your goal then compression on very large tables can help considerably. If you dig deep into MPP appliances like Netezza you will find that part of the way they achieve staggering performance is by using compression algorithms close to disk.
Post #1047765
Posted Tuesday, January 18, 2011 9:48 AM
Grasshopper

I am currently conducting some tests that compare uncompressed vs. page vs. row compression. The results are not final yet, but only a handful of tables compressed less than 50% out of my 54-table sample. I am impressed overall with the results at this level. It is important to point out two things about this preliminary test: 1) the overall savings at the database level was not as much as I expected, and 2) when I compared before and after, I was careful to compare only data pages consumed [not reserved]. So my numbers looked great at the table level, but the overall savings was not what I had hoped for.
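For anyone repeating this, one way (not necessarily the exact method used above) to compare used rather than reserved pages before and after enabling compression:

-- Sketch: used vs. reserved pages per user table; run before and after compressing.
SELECT  OBJECT_NAME(ps.object_id)            AS table_name,
        SUM(ps.used_page_count)              AS used_pages,
        SUM(ps.reserved_page_count)          AS reserved_pages,
        SUM(ps.used_page_count) * 8 / 1024   AS used_mb
FROM    sys.dm_db_partition_stats AS ps
WHERE   OBJECTPROPERTY(ps.object_id, 'IsMsShipped') = 0
GROUP BY ps.object_id
ORDER BY used_pages DESC;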

A few other observations...

My earlier testing indicates that the proc for estimating compression is not terribly reliable.

I believe that compression need only be applied to: 1) data warehouses, and 2) tables larger than one sector read [typically 64 to 128 8K pages].

Post #1049490