I would like this article to be a collaborative effort toward one end result: improved query performance. I am going to propose an approach for identifying tables that are candidates for compression, along with the SQL statements to implement that compression. What I would like from the rest of the community are your thoughts and ideas about this approach. I freely admit that I do not know all the ins and outs, but I offer this approach as a starting point for a very interactive discussion.
Obviously, if you are not working with SQL Server 2008 Enterprise or Developer edition (or later), table compression is not an option for you. The approach below may be simplistic in its design and implementation, but it does offer a starting point.
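If you are not sure which edition you are running, a quick check with the built-in SERVERPROPERTY function might look something like this:

-- Data compression requires Enterprise, Developer, or Evaluation edition;
-- SERVERPROPERTY('EngineEdition') returns 3 for all of those.
SELECT SERVERPROPERTY('Edition') AS [Edition],
       SERVERPROPERTY('EngineEdition') AS [EngineEdition];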
While at the 2010 PASS Summit in Seattle, I attended a couple of sessions on table compression and also talked with several Microsoft engineers. My "take" from this was that you should initially look only at tables as compression candidates and then afterwards attempt to identify indexes as possible candidates. The approach below considers only tables larger than 8 pages (64 KB) with a potential compression saving of 20% or better. One very handy stored procedure built into SQL Server 2008 (and R2) is sp_estimate_data_compression_savings. You can use Google or Bing to find out more about this procedure, but in brief it samples a database object and returns the estimated space savings for row or page compression.
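For example, to sample a single table by hand (dbo.MyTable below is just a placeholder name), the call might look something like this:

-- Estimate ROW compression savings for one table, all indexes and all partitions
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'MyTable',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'ROW';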
Consider the script below (yes, I know cursors are bad):
IF OBJECT_ID(N'tempdb..#Results') IS NOT NULL
DROP TABLE #Results
GO
DECLARE @SQL NVARCHAR(4000);
DECLARE @Schema SYSNAME;
DECLARE @Table SYSNAME;
DECLARE @PartitionNumber INT;
CREATE TABLE #Results (
[Table] SYSNAME,
[Schema] SYSNAME,
IndexID INT,
PartitionNumber INT,
[CurrentSize(kb)] INT,
[CompressedSize(kb)] INT,
[SampleCurrentSize(kb)] INT,
[SampleCompressedSize(kb)] INT);

-- Cursor over every user table in the current database
DECLARE TableCursor CURSOR FOR
select s.name, o.name
FROM sys.objects o
JOIN sys.schemas s
ON s.schema_id = o.schema_id
WHERE o.[type] = 'U'
ORDER BY 1,2;

SET NOCOUNT ON;
OPEN TableCursor;
FETCH NEXT FROM TableCursor INTO @Schema,@Table
WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQL = '
INSERT INTO #Results
([Table],[Schema],IndexID,PartitionNumber,[CurrentSize(kb)],[CompressedSize(kb)],[SampleCurrentSize(kb)],[SampleCompressedSize(kb)])
EXEC sp_estimate_data_compression_savings ''' + @Schema + ''',''' + @Table + ''',NULL,NULL,''ROW'';';
--PRINT @SQL;
EXEC sp_executesql @SQL;
FETCH NEXT FROM TableCursor INTO @Schema,@Table;
END;
CLOSE TableCursor;
DEALLOCATE TableCursor;

--SELECT [Schema], [Table], IndexID, PartitionNumber, [CurrentSize(kb)],
--    CONVERT(NUMERIC(12,2), ROUND(CASE WHEN [CurrentSize(kb)] = 0 THEN 100 ELSE [CompressedSize(kb)] * 100. / [CurrentSize(kb)] END,2)) AS [Compression],
--    [CompressedSize(kb)]

-- Cursor over the tables that are larger than 8 pages (64 KB) and are estimated
-- to compress to 80% or less of their current size
DECLARE CompressionCursor CURSOR FOR
SELECT [Schema], [Table], MAX(PartitionNumber) PartitionNumber
FROM #Results r
WHERE (IndexID = 0
OR IndexID = 1 AND NOT EXISTS (Select 'x' FROM #Results r2 where r.[Schema] = r2.[Schema] and r.[Table] = r2.[Table] and r2.IndexID = 0))
AND (ROUND(CASE WHEN [CurrentSize(kb)] = 0 THEN 100 ELSE [CompressedSize(kb)] * 100. / [CurrentSize(kb)] END,2) BETWEEN 0.01 AND 80.00)
AND [CurrentSize(kb)] > 64
GROUP BY [Schema], [Table]
ORDER BY 1,2;

OPEN CompressionCursor;
FETCH NEXT FROM CompressionCursor INTO @Schema, @Table, @PartitionNumber;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQL = CASE WHEN EXISTS (SELECT * FROM #Results WHERE [Schema] = @Schema AND [Table] = @Table AND PartitionNumber > 1) THEN 'ALTER TABLE ' + @Schema + '.' + @Table + ' REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW,ONLINE = ON);'
ELSE 'ALTER TABLE ' + @Schema + '.' + @Table + ' REBUILD WITH (DATA_COMPRESSION = ROW,ONLINE = ON);' END;
PRINT @SQL;
-- EXEC sp_executesql @SQL;
FETCH NEXT FROM CompressionCursor INTO @Schema,@Table,@PartitionNumber;
END;
CLOSE CompressionCursor;
DEALLOCATE CompressionCursor;
GO
The script first uses a cursor to collect all of the user tables in the current database. That cursor then drives a call to the sp_estimate_data_compression_savings stored procedure for each table, and the results are stored in the temp table #Results. Note that there is logic to identify partitioned tables.
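If you want to eyeball the raw estimates yourself (the commented-out SELECT in the script was heading in this direction), a review query along these lines can be run against #Results in the same session:

-- Estimated compressed size as a percentage of current size (lower is better)
SELECT [Schema], [Table], IndexID, PartitionNumber,
       [CurrentSize(kb)], [CompressedSize(kb)],
       CONVERT(NUMERIC(12,2), ROUND(CASE WHEN [CurrentSize(kb)] = 0 THEN 100
              ELSE [CompressedSize(kb)] * 100. / [CurrentSize(kb)] END, 2)) AS [Compression]
FROM #Results
ORDER BY [Compression];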
The data in #Results is then used to derive the ALTER TABLE statements needed to compress the appropriate tables. I wrote the script above to apply only row compression; that seemed to be the general consensus at the PASS Summit for OLTP tables where the data is changing rapidly. Lookup tables (and potentially data warehouse tables) are candidates for page compression, and that logic can be implemented in an evolution of this script.
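As a rough sketch of where that evolution might go, a mostly static lookup table (dbo.Country is an invented name here, not something produced by the script) could simply be rebuilt with page compression instead:

-- Hypothetical example only: page compression for a static lookup table
ALTER TABLE dbo.Country
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);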
I also commented out the actual execution of the ALTER TABLE statements. After a fair amount of testing I decided that a human eye and brain were needed to review the results before automating the entire process. A partial listing of the script's output is shown below:
...
ALTER TABLE dbo.AuditLog REBUILD WITH (DATA_COMPRESSION = ROW,ONLINE = ON);
ALTER TABLE dbo.Barcode REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW,ONLINE = ON);
ALTER TABLE dbo.Batch REBUILD WITH (DATA_COMPRESSION = ROW,ONLINE = ON);
...
The syntax for partitioned tables is slightly different from that for non-partitioned tables. In this first version of the script I elected to use row compression for all partitions. Since the tables are partitioned roughly by row-creation date, I could have added logic to use page compression on the "older" partitions, but I will save that for a later date.
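If I do pursue that later, the ALTER TABLE syntax allows different compression settings per partition in a single rebuild. A rough sketch, where the partition numbers are illustrative only:

-- Illustrative only: page-compress the older partitions, keep the newest on ROW
ALTER TABLE dbo.Barcode
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
      DATA_COMPRESSION = ROW ON PARTITIONS (4));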
My current employer is very cautious about changes to the database, and I think the above approach is about all I can sell for an initial implementation. Your comments and ideas are most welcome.