Data Compression

We have come to that time of month again – blog party.  This time, the party is hosted by Jorge Segarra (aka SQLChicken) (Blog | Twitter).  The rules have also changed this month, though ever so slightly.  Now, we must insert a pic into our blog post rather than include the TSQL Tuesday in our blog title.  BTW, this month we are discussing features in SQL 2008 (supposed to be R2, but I am only discussing something that came out with 2008).

Pick your Poison…err Feature.

The feature I have chosen is compression.  In SQL 2008 we have the option to compress backups as well as the data itself.  Data compression, in turn, comes in two flavors: row level and page level.  I will mostly be discussing my experience with page level compression, touching only lightly on the differences between the two.
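For reference, here is the basic syntax for each option.  This is just a sketch; the database, path, and table names are placeholders.  Page compression builds on row compression by adding prefix and dictionary compression within each page.

-- Backup compression (placeholder database name and path)
BACKUP DATABASE MyWarehouseDB
TO DISK = N'D:\Backups\MyWarehouseDB.bak'
WITH COMPRESSION;

-- Row level compression: stores fixed-length types in a variable-length format
ALTER TABLE dbo.MyLargeTable REBUILD WITH (DATA_COMPRESSION = ROW);

-- Page level compression: row compression plus prefix and dictionary compression per page
ALTER TABLE dbo.MyLargeTable REBUILD WITH (DATA_COMPRESSION = PAGE);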

History

Do you recall a nifty utility that Microsoft gave us back in the glory days called DoubleSpace, which was later renamed to DriveSpace?  I do!!  Oh, the memories are painful still.  That little compression agent was renamed due to the inaccuracy of the name; you didn't truly get double the space on your hard drive by using it.  I remember numerous support calls related to compression, and all of them turned out ugly.  Something about compressing your drive, then losing everything because you had used a drive overlay to access a larger disk than the BIOS supported and then used DoubleSpace to compress it.  Another good one was losing the DoubleSpace .bin files from a compressed drive.  You could also see heavy fragmentation issues.  All of this created a heavy bias in me against mass compression utilities.

Ch Ch Ch Changes

Despite my heavy bias against compression, I have always liked the ability to selectively compress certain files or folders.  For me, the reason for compressing in this fashion has largely been archival.  There are several file compression utilities on the market for this purpose.

How does that relate to database compression?  I see database compression, as offered in SQL 2008, as more like these file compression utilities than DriveSpace.  Data compression in SQL 2008 is not an all-or-nothing implementation; you get to pick and choose what gets compressed.  That is a big-time bonus for me.
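Because it is selective, SQL 2008 also ships a stored procedure, sp_estimate_data_compression_savings, that estimates how much a single object would shrink before you commit to anything.  A minimal sketch, where dbo.MyLargeTable is a placeholder name:

-- Estimate page compression savings for one table before actually rebuilding it.
-- The result set compares the current size with the size under the requested setting.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'MyLargeTable',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE';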

The Setup

After some research and having learned a bit about compression, I decided to test it out.  I have not yet tested performance; so far I have only tested the disk savings that compression could generate.  There is a good demonstration on performance by Jason Shadonix here, for those who are interested.  I will be baselining and testing performance at a later time; that is just a bit beyond the scope of this article.

I decided to use a database from our warehouse that currently resides on SQL 2000, on Windows 2003 32 bit.  I created a backup of that database and restored it to a test box running SQL 2008 on Windows 2008 R2 64 bit.  The starting database size was 164GB, and the database was left in SQL 2000 compatibility mode.  The selection criterion for tables to compress was simple: any table larger than 1GB.  I used a script I showed in the table space series to determine which tables to target.  The script follows.

DECLARE @dbsize  DECIMAL(19,2)
      , @logsize DECIMAL(19,2)

SET NOCOUNT ON

/*
**  Summary data: total data and log file sizes in MB.
*/
BEGIN
    SELECT @dbsize  = SUM(CONVERT(DECIMAL(19,2), CASE WHEN type = 0 THEN size ELSE 0 END)) * 8 / 1024
         , @logsize = SUM(CONVERT(DECIMAL(19,2), CASE WHEN type = 1 THEN size ELSE 0 END)) * 8 / 1024
    FROM sys.database_files
END

/*
**  We want all objects.
*/
BEGIN
    -- FirstPass: per-object reserved, used, and data page totals plus row counts
    ;WITH FirstPass AS (
        SELECT OBJECT_ID,
            ReservedPage = CONVERT(DECIMAL(19,2), SUM(reserved_page_count)) * 8 / 1024,
            UsedPage     = CONVERT(DECIMAL(19,2), SUM(used_page_count)) * 8 / 1024,
            PageCnt      = SUM(CONVERT(DECIMAL(19,2),
                               CASE WHEN (index_id < 2)
                                    THEN (used_page_count)
                                    ELSE lob_used_page_count + row_overflow_used_page_count
                               END)) * 8 / 1024,
            RowCnt       = SUM(CASE WHEN (index_id < 2) THEN row_count ELSE 0 END)
        FROM sys.dm_db_partition_stats
        --WHERE OBJECTPROPERTY(OBJECT_ID, 'IsMSShipped') = 0
        GROUP BY OBJECT_ID
    )
    -- InternalTables: space consumed by internal tables tied to each parent object
    , InternalTables AS (
        SELECT ps.OBJECT_ID,
            ReservedPage = CONVERT(DECIMAL(19,2), SUM(reserved_page_count)) * 8 / 1024,
            UsedPage     = CONVERT(DECIMAL(19,2), SUM(used_page_count)) * 8 / 1024
        FROM sys.dm_db_partition_stats ps
        INNER JOIN sys.internal_tables it
            ON it.OBJECT_ID = ps.OBJECT_ID
            AND it.internal_type IN (202,204,211,212,213,214,215,216)
        WHERE it.parent_id = ps.OBJECT_ID
        --AND OBJECTPROPERTY(ps.OBJECT_ID, 'IsMSShipped') = 0
        GROUP BY ps.OBJECT_ID
    )
    -- Summary: roll everything up per object alongside the database totals
    , Summary AS (
        SELECT
            ObjName        = OBJECT_NAME(f.OBJECT_ID),
            NumRows        = MAX(f.RowCnt),
            ReservedPageMB = SUM(ISNULL(f.ReservedPage, 0) + ISNULL(i.ReservedPage, 0)),
            DataSizeMB     = SUM(f.PageCnt),
            IndexSizeMB    = SUM(CASE WHEN (f.UsedPage + ISNULL(i.UsedPage, 0)) > f.PageCnt
                                      THEN ((f.UsedPage + ISNULL(i.UsedPage, 0)) - f.PageCnt)
                                      ELSE 0 END), -- Equivalent of max_record_size from sys.dm_db_index_physical_stats
            UnusedSpace    = SUM(CASE WHEN (f.ReservedPage + ISNULL(i.ReservedPage, 0)) > (f.UsedPage + ISNULL(i.UsedPage, 0))
                                      THEN ((f.ReservedPage + ISNULL(i.ReservedPage, 0)) - (f.UsedPage + ISNULL(i.UsedPage, 0)))
                                      ELSE 0 END),
            DBSizeMB       = @dbsize,
            LogSizeMB      = @logsize
        FROM FirstPass f
        LEFT OUTER JOIN InternalTables i
            ON i.OBJECT_ID = f.OBJECT_ID
        GROUP BY f.OBJECT_ID
    )
    SELECT ObjName, NumRows, ReservedPageMB, DataSizeMB, IndexSizeMB, UnusedSpace, DBSizeMB, LogSizeMB,
        PercentofDB = ((IndexSizeMB + DataSizeMB) / @dbsize) * 100
    INTO #LargeTables2Compress
    FROM Summary
    ORDER BY PercentofDB DESC
END

In this script, you will note that I am dumping the results into a temp table.  I wanted to be able to run a few different cases against the data and check results quickly without running the entire query over again.  Running this particular query would not take that long in most cases, though; consider the temp table a fail-safe against repeated long executions.
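For example, once the temp table is populated, each follow-up case becomes a quick query against it rather than another pass over the DMVs (the thresholds here are arbitrary examples):

-- Different cutoffs against the same temp table, no need to re-run the full query
SELECT ObjName, DataSizeMB, PercentofDB
FROM #LargeTables2Compress
WHERE DataSizeMB > 5000
ORDER BY DataSizeMB DESC;

SELECT COUNT(*) AS TablesOver1GB
FROM #LargeTables2Compress
WHERE DataSizeMB > 1000;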

To determine which tables from the previous query I would target, I employed a script similar to the following.

SELECT 'ALTER TABLE ' + ObjName + ' REBUILD WITH (DATA_COMPRESSION = PAGE);'
FROM #LargeTables2Compress
WHERE DataSizeMB > 1000

This is pretty straightforward.  I am selecting only those tables from the temp table that are larger than 1GB, and concatenating each into an ALTER statement for manual execution.  I can now see which tables are “large” and decide whether to compress them with page level compression (note: the same sort of script works for row level compression).  Once I was happy with the result set and which tables would be compressed, I proceeded with compressing them.  Recall that the database started at 164GB.  After compressing that handful of tables (30 of the 460 tables in the database), the used space was down to 84GB.  That is a reduction of ~49% in space requirements for this database.  For me, that is a pleasing result.
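If you want to double-check the outcome, something like the following (a quick sketch, not part of the original process) shows which tables actually picked up page compression, and sp_spaceused provides the database-level numbers to compare before and after:

-- Which tables now carry page compression
SELECT DISTINCT OBJECT_NAME(p.OBJECT_ID) AS TableName,
    p.data_compression_desc AS CompressionType
FROM sys.partitions p
WHERE p.data_compression = 2;  -- 0 = NONE, 1 = ROW, 2 = PAGE

-- Database-level size and unallocated space, for the before/after comparison
EXEC sp_spaceused;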

What if…

Should you decide to run this process again, the tables you have already compressed will still be included in the result set.  Running the ALTER scripts against them again will not yield any better compression; the existing compression simply remains in effect.  Alternatively, you can eliminate the already-compressed tables by altering the script to also read from sys.partitions, as in the following script.

SELECT 'ALTER TABLE ' + ObjName + ' REBUILD WITH (DATA_COMPRESSION = PAGE);'
FROM #LargeTables2Compress
WHERE DataSizeMB > 1000
    AND ObjName IN (
        SELECT OBJECT_NAME(p.OBJECT_ID)
        FROM sys.partitions p
        WHERE p.data_compression = 0
            AND OBJECTPROPERTY(p.OBJECT_ID, 'IsMSShipped') = 0
    )

Conclusion

Without hard and fast numbers on performance, I cannot yet make a final recommendation on whether or not to compress.  Having run a few dry runs to confirm that our processes still worked, I am pleased so far (everything felt faster, though I do not yet have the data to back that up).  Given the disk savings, and even assuming a stalemate on performance, I like the compression offered in SQL 2008.
