A Collaborative Approach To Table Compression

Question

A Collaborative Approach To Table Compression

Mike Byrd

Default port

Points: 1458
More actions
January 12, 2011 at 8:11 pm

#232193

Comments posted to this topic are about the item A Collaborative Approach To Table Compression
Mike Byrd

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Craig Lucas SSC Veteran Points: 242 More actions · Answer 1

did you do any performance comparison between row/page compression? I, like you, thought about automating compression and I'm trying it out on a reporting environment. I put a ddl trigger in place that automatically compresses tables with page compression. Granted that might not be the "ideal" compression for a table, but its a start. Also, this server has 32 cpus and 256gb of ram and I am not strapped for CPU. I am more concerned with conserving space.

The server has thousands of very wide/large replicated tables so the compression helps a ton with disk space and performance. I never did a benchmark on row vs page though. And I dont compress indexes

Christoph D Say Hey Kid Points: 694 More actions · Answer 2

I can recommend this document

http://msdn.microsoft.com/en-us/library/dd894051(v=sql.100).aspx

because there are a few good hints about when to use which compressiontype. I am sure you can write a script to do it automatically but if you don't have a few hundred tables up and running 24-7 you should do them manually. You won't do it that often to change the compression and you can keep an eye on the growth of the logfile, especially if you have mirroring and replication in place.

I haven't used many compressed tables in a live-system at the moment but my researches are that far that page-compression is pretty nice but much more expensive so I would only use it on archive-tables or tables where there is a big difference between page-compression and row-compression, as well as fact-tables in a BI-environment.

David.Poole SSC Guru Points: 75939 More actions · Answer 3

I don't think it is worth compressing small lookup tables. In fact I noticed that a database containing nothing but small (<100 record) lookup tables actually grew.

Does the concept of an SGAM page exist for compressed data?

LinkedIn Profile

Mike Byrd Default port Points: 1458 More actions · Answer 4

Yes I did look at page compression, but the community consensus is that you should only do page compression for fairly static tables. If you have to update or insert a row in a page that is page compressed, then the CPU has to uncompress all the rows in that page and recompress the new page -- in an active system that would probably cost you performance. Page compression appears to be best utilized in lookup tables or tables with minimal write activity. Page compression would also be a player for a partitioned table where the "older" partitions could be page compressed (assuming low write activity) and the "newer" partitions could be row compressed. I've looked at this for several months and have yet to come up with a robust (read not fragile) means of maintaining the differentiation between older and newer partitions in a sliding window environment.

Mike Byrd

Mike Byrd Default port Points: 1458 More actions · Answer 5

You are correct; compressing small lookup tables actually may increase the overall size. That is why I exclude any tables less than 8 pages in my script. I am not aware of an SGAM page for compressed data -- as far as I can tell all the compression algorithm data is embedded in the row (or page) itself.

Mike Byrd

SQLRNNR SSC Guru Points: 281334 More actions · Answer 6

Interesting article. I had seen a couple of articles that showed little to no difference in CPU even for highly active tables and that overall page was faster. Now I will have to go back and test some more.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Slick84 SSCertifiable Points: 5602 More actions · Answer 7

Mike,

Once the tables(rows) were compressed on your tables, how much space were you able to conserve? Do you have any estimates? Numbers would help on the usability of this functionality.

Thanks,

S

--
:hehe:

Nadrek SSC-Insane Points: 20039 More actions · Answer 8

I've done some compression experiments, and performance impact varies significantly.

As far as size, 8.1GB of data and 6.2GB of indexes (with plenty of VARCHAR fields, none of which had trailing whitespace) PAGE compressed down to 2.9GB of data and 2.5GB of indexes.

On a procedure that read from nested views leading to PAGE compressed data and indexes, and generated reporting tables with quite a few newly generated indexes based on those nested views, turning on PAGE compression with 20GB of RAM and 16 cores resulted in a somewhat longer duration, somewhat lower reads, significantly lower writes, and, of course, more CPU usage.

FYI: I've definitely seen REBUILDing a PAGE compressed index become CPU-bound on a 16-core box doing nothing else.

Mike Byrd Default port Points: 1458 More actions · Answer 9

Unfortunately I do not have before and after data size numbers fro the row compression on our production OLTP database. But I did use (as documented in the script) sp_estimate_data_compression_savings stored procedure to identity potential tables elgible for row compression (you can also use it for page compression). Examples are indicated below (table names changed to protect the innocent). Of course, the amount of compression is very dependent on the table architecture and also the data within it. However, I was not looking so much for saving disk space as I was for decreasing disk IO (and the attendant performance gains associated with less IO). I guess I'll have to do a future article on potential performance gains from data compression; thanks for pointing this out.

Table IndexID PartitionNumber CurrentSize(kb) CompressedSize(kb) %Compression

AuditLog1 1 68432 47240 31.0

Barcode 1 1 30112 20576 31.7

Barcode 1 2 656592 453528 30.9

Barcode 1 3 707480 488936 30.9

Barcode 1 4 683672 473000 30.8

Barcode 1 5 693344 481232 30.6

Barcode 1 6 755952 525368 30.5

Barcode 1 7 848928 589064 30.6

Barcode 1 8 231976 160808 30.7

Batch 1 1 65184 43872 32.7

USPSPackage 1 1 2705768 2059016 23.9

Mike Byrd

rob.lobbe-964963 Old Hand Points: 307 More actions · Answer 10

Compression does not work well on any data stream < 150 bytes (+/-)

For most techniques the first 100 bytes, or so, are used to store header information to reverse the compression.

Given this ROW compression should be reserved for tables with long row datalengths.

But before considering this approach there are several ways to reduce the 'byte-count' without compression (varchar instead of nvarchar etc). Compressing 'blob' data before it enters the database.

As always, it's about getting extra row(s) into an individual page or reducing the number of pages taken by an single row.

The biggest drawback here occurs during a page-split. The page needs to be reconstitued - every row expanded, the split made, then every row re-compressed The re-compression doesn't happen until the next write, so there could be some delay before you see your performance hit.

PAGE compression on the other hand is a 'what have we got to loose' option. The 8K page is compressed (an awful lot of them) when it is written, and may never need to be expanded again.

The page-split is still an issue, but an 8K decompress is a minor (and known) performance hit. You will run into performace problems if your tables/indexes have high page-split metrics.

I've touched on it, but the important thing to note is that reading compressed data does not cause it to rebuild the 8K pages (MS doco). So you don't just save space on disk, you get more pages into memory, improving the performance of your buffer caching. So unless you write to a compressed page, your already ahead.

A final note.

The compression itself doesn't occur until the write-to-disk. So finding out if this is going cause grief can only be tested under (heavy) data-update load. Take any performance advice with a grain of salt, your situation will be different - and there is only one way for you to find out.

Mike Byrd Default port Points: 1458 More actions · Answer 11

Just to make sure we are all on same page, the way I've heard the Microsoft CAT engineers explain for row and page compression that the data is compressed both on the hard drive and in the data buffer. Only when the actual row is moved to the CPU is the row (or page) uncompressed (and recompressed if updated or inserted) -- hence the additional CPU load. Reviewing my PASS2009 notes, from the DBA409 (Customer Experiences With Data Compression) briefing by SQLCAT engineers, their recommendation was that if row compression results in space savings, and 10% increase in CPU usage can be accommodated, all data should use row. They go on to urge careful consideration for Page compression.

Mike Byrd

gary.strange-sqlconsumer SSCommitted Points: 1816 More actions · Answer 12

The opening of your article refers to increased performance, but I don't see any reference to performance measures?

You should find that depending on your hardware and performance requirement that the benefits of row compression will vary considerably. Systems with a smaller number of cores could actually suffer reduced query efficiency with row compression on small to medium size tables.

Nadrek SSC-Insane Points: 20039 More actions · Answer 13

Gary has an excellent point; does anyone have the time to run a stable, known workload under uncompressed, page compressed, and row compressed conditions with a varying number of cores (varying CPU affinity settings)?

I'd suggest collecting SQL:BatchCompleted Profiler event statistics for CPU, Reads, Writes, and Duration, so we can see the tradeoffs that were made internally.

gary.strange-sqlconsumer SSCommitted Points: 1816 More actions · Answer 14

Excellent use of the phrase "trade offs". Because effectively all you are doing is trading resources in a similar way to how a financial portfolio maybe managed. You're buying disk space and selling CPU (and probably a bit of RAM too to hold the compress and uncompress versions).

It's great to have the ability to do this resource trading but it's important to realize that nothing comes free. The only way to justify the resource trade is to determine and measure the benefits against you're business requirements. It may seem like great news to have released so much disk space, but disk is relatively cheap compared to other hardware resources.

If performance is your goal then compression on very large tables can help considerably. If you dig deep into MPP appliances like Netezza you will find that part of the way they achieve staggering performance is by using compression algorithms close to disk.