SQLServerCentral Article

Using CLR integration to compress BLOBs/CLOBs in SQL Server 2005

,

Introduction

Manipulating Binary Large Objects (BLOBs) and Character Large Objects (CLOBs) has always been difficult in SQL Server. The new SQL Server 2005 provides new data types (NVARCHAR(MAX), VARCHAR(MAX) and VARBINARY(MAX)) for large object storage (up to 2 GB) allowing better manipulation as well as the ability to process these data types using CLR procedures and functions.

This article shows how to create CLR functions to seamlessly compress and decompress large data objects with minimum performance impact using compression algorithms included in.NET Framework 2.0. Code samples included in this article can be used in any database implementation but do not cover all possible scenarios. For large implementations or mission critical applications consider using third party products like SQLCompres.NET (it is free).

The new data types

SQL Server 2005 provides three new data types to store and manipulate large objects. VARCHAR(MAX) can store CLOBs, NVARCHAR(MAX) does the same but allows Unicode characters and VARBINARY(MAX) can store BLOBs. Books Online states that max indicates that the maximum storage size is 2^31-1 bytes. These new data types can be used with a wide range of T-SQL commands and behave much like traditional VARBINARY(n), VARCHAR(n) and NVARCHAR(N).

The new data types should replace the TEXT and IMAGE types from previous versions. As per SQL Server Books Online TEXT and IMAGE columns should not be used in new development and legacy applications should be changed to use the new types.

These new types, as opposed to TEXT and IMAGE, can be used as variables and function parameters and can be returned by CLR (or T-SQL) scalar-value functions. These new characteristics make them great candidates for compression. Previous attempts to add CLOB and BLOB compression to SQL Server involved using extended procedures, a difficult and risky business. Using the CLR integration capabilities introduced with SQL Server 2005 makes such implementation more secure and stable (and sometimes faster than their extended procedure counterparts).

CLRs procedures and functions parameters can receive and process these data types as SQLChars and SQLBytes. SQLChars can be used to pass VARCHAR(MAX) and NVARCHAR(MAX) parameter while SQLBytes is used to pass VARBINARY(MAX) types. CLR uses only Unicode characters and passing VARCHAR(MAX) as a parameter implies converting it to Unicode before parameters are passed.

Get the code here:

Compressing a BLOB

SQL Server 2005 allows CLR code to be registered as functions and stored procedure. Developers can now extend SQL Servers functionality using a broad array of programming languages from managed C++ to Visual Basic. How the CLR is hosted inside SQL Server goes beyond the scope of this article. For those who worry about enabling CLR integration, suffice to say that Microsoft has made a conscious effort to keep this integration as safe and secure as possible.

Lets use VARBINARY(MAX) for simplicity sake, since data can be converted between BLOB and CLOB types this articles code can be extended using T-SQL functions. Listing 1 contains a function to compress BLOBs in SQL server, the function receives a VARBINARY(MAX) or IMAGE as a SQLBytes and compresses it using the DeflateStream class provided in .NET Framework 2.0. SQLBytes represents a mutable type that wraps either an array or a stream. We are going to assume it wraps an array to avoid complicated code, and get the data from the Buffer property. Using this property might fail with larger BLOBs causing the CLR to throw an out of memory exception (but dont worry, unlike extended procedures errors, CLR exceptions should not crash your SQL Server).

Listing 1: Compression function

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

using System.IO;
using System.IO.Compression;

public partial class UserDefinedFunctions
{
    // Setting function characteristics
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true,DataAccess=DataAccessKind.None)]
    public static SqlBytes fn_compress(SqlBytes blob)
    {
        if (blob.IsNull)
            return blob;
        // Retrieving BLOB data
        byte[] blobData = blob.Buffer;
 
        // Preparing for compression
        MemoryStream compressedData = new MemoryStream();
        DeflateStream compressor = new DeflateStream(compressedData, CompressionMode.Compress, true);
        // Writting uncompressed data using a DeflateStream compressor
        compressor.Write(blobData, 0, blobData.Length);
        // Clossing compressor to allow ALL compressed bytes to be written
        compressor.Flush();
        compressor.Close();
        compressor = null;
        // Returning compressed blob
        return new SqlBytes(compressedData);
    }
};

Compressing a BLOB in SQL Server 2005 is as easy as passing a SQLBytes parameter, reading its content and writing it to a compression stream. The compression stream writes to a MemoryStream that is later used to create a new SQLBytes object that can be returned to SQL Server applications or directly to the client. There is only one caveat: Microsofts implementation of DeflateStream requires the stream to be closed before it writes the last compressed bytes, flushing is not enough.

Listing 2 loads an assembly in SQL Server (a process called cataloging where assemblies are verified for security and reliability) and creates the compression function. Listing 3 shows how to use the function in a T-SQL update. This usage makes compressing columns a seamless process that would require only server side adjustments.

Listing 2: Loading assembly and creating CLR function

CREATE ASSEMBLY [BlobCompression]
                  FROM 'D:\Development\BlobCompression.Dll'
                  WITH PERMISSION_SET = SAFE
CREATE FUNCTION [fn_decompress]           (
                 @compressedBlob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME [BlobCompression].[UserDefinedFunctions].[fn_decompress];

Listing 3: Compressing data

create table #temp (
      blob_col    varbinary(max));
insert into #temp
values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in your project. This file is located in the Test Scripts folder in the Solution Explorer.'));
drop table #temp;

Decompressing a BLOB

Listing 4 contains a function to decompress a BLOB. This function follows the same principles used in compression but now reads from a stream returning a decompressed block that can be used to create and return a decompressed SQLBytes object.

Listing 4: Decompression function

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.IO.Compression;
public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlBytes fn_decompress(SqlBytes compressedBlob)
    {
        if (compressedBlob.IsNull)
            return compressedBlob;
        // Preparing to read data from compressed stream
        DeflateStream decompressor = new DeflateStream(compressedBlob.Stream, CompressionMode.Decompress, true);
        // Initializing variables
        int bytesRead = 1;
        int chunkSize = 10000;
        byte[] chunk = new byte[chunkSize];
        // Preparing destination stream to hold decompressed data
        MemoryStream decompressedData = new MemoryStream();
        try
        {
            // Reading from compressed stream 
            while ((bytesRead = decompressor.Read(chunk, 0, chunkSize)) > 0)
            {
                // Writting decompressed data
                decompressedData.Write(chunk, 0, bytesRead); 
            }
        }
        catch (Exception)
        {
            throw; // Nothing to do...
        }
        finally
        {
            // Cleaning up
            decompressor.Close();
            decompressor = null;
        }
        // Returning a decompressed BLOB
        return new SqlBytes(decompressedData);
    }
};

Listing 5 loads an assembly and creates a compression and decompression CLR function. Compression and decompression can be tested using listing 6, it creates a table and add some values to it, a compression update is run followed by a select statement that returns uncompressed data.

Listing 5: Loading assembly and creating CLR functions

CREATE ASSEMBLY [BlobCompression]
                  FROM 'D:\Development\BlobCompression.Dll'
                  WITH PERMISSION_SET = SAFE
CREATE FUNCTION [fn_decompress]           (
                  @compressedBlob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME [BlobCompression].[UserDefinedFunctions].[fn_decompress];
CREATE FUNCTION [fn_compress](
                  @blob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME [BlobCompression].[UserDefinedFunctions].[fn_compress];

Listing 6: Testing functionality

create table #temp (
      blob_col    varbinary(max));
    
insert into #temp
values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in your project. This file is located in the Test Scripts folder in the Solution Explorer.'));
update #temp
set blob_col = master.dbo.fn_compress(blob_col);
select convert(varchar(1000), master.dbo.fn_decompress(blob_col))
from #temp;
drop table #temp;

Limitations

The code included in this article allows column level compression in SQL Server 2005 but it lacks functions for consistency check and will not work very well with large objects (5 MB or more depending on configuration). It is intended to show how to use CLR integration in SQL Server to extend the engines functionality and provides an overview on what can be done with the new BLOB/CLOB data types.

Rate

4.67 (3)

You rated this post out of 5. Change rating

Share

Share

Rate

4.67 (3)

You rated this post out of 5. Change rating