Using CLR integration to compress BLOBs/CLOBs in SQL Server 2005

  • I have not tried SQLCompress.NET.  We are less inclined to use smaller third-party add-ons.  Not that they're bad, but in the end, the corporate red tape involved in getting them in-house and supportable exceeds the cost of building some of those capabilities ourselves... not to mention the general hassle of dealing with "the man".

    Thanks!

  • JoeA (3/5/2007)


    Hi Yoel - I enjoyed your article on blob compression - but I believe there is a bug in the compression function...

    .....I determined that the cause of the problem is that the blobData.Length value (the length of the byte array) is not reliable - it didn't actually match the length of the blob data. It was often 6-8K when the actual data was much smaller, even zero length. When I used blob.Length (the length of the SqlBytes parameter) while writing to the compression stream, I got the results I expected.

    Just wanted to let you and the readers know what I found... please validate as you see fit

    Oh, BTW - We are thinking about using blob compression to reduce our SAN chargeback costs (which are insanely high) - the blob data we're compressing needs to be online with the active data in our application, but isn't updated after it's inserted, and isn't selected very often relative to the rest of the data; so the compute costs are manageable.

    MANY MANY THANKS, JoeA!!!

    I was also hit by this exact bug.

    Compress/Decompress simply corrupted the data... Decompression actually blew it up even bigger.

    So this one change fixed my issue (see the sketch at the end of this post).

    (SQL 2005 SP2 build 2005.090.3073.00, btw.)

    Also THANKS to Yoel!!

    This solution should actually work well for document archiving (my exact case). The BLOB data is still available to the users, but actual usage of it is low. Thus, performance is a non-issue. But the space savings and availability are a huge benefit.
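
    For anyone hitting the same thing, here is a minimal sketch of the corrected write, assuming the article's fn_compress(SqlBytes blob) deflates into a MemoryStream (as the stack trace later in the thread suggests); apart from JoeA's fix, the details are illustrative rather than the article's exact code:

    using System.Data.SqlTypes;
    using System.IO;
    using System.IO.Compression;
    using Microsoft.SqlServer.Server;

    public partial class SqlCLRTools
    {
        [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
        public static SqlBytes fn_compress(SqlBytes blob)
        {
            if (blob.IsNull)
                return SqlBytes.Null;

            using (MemoryStream compressed = new MemoryStream())
            {
                using (DeflateStream deflate =
                       new DeflateStream(compressed, CompressionMode.Compress, true))
                {
                    // JoeA's fix: write only blob.Length bytes. blob.Buffer is the
                    // internal buffer and can be larger than the actual blob data.
                    deflate.Write(blob.Buffer, 0, (int)blob.Length);
                }
                return new SqlBytes(compressed.ToArray());
            }
        }
    }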

  • An hour later I have some bad news.

    As soon as I attempted to put a bit of a real load on the SQLCLR compression and push through some real data, things exploded:

    Msg 6532, Level 16, State 49, Line 1

    .NET Framework execution was aborted by escalation policy because of out of memory.

    System.Threading.ThreadAbortException: Thread was being aborted.

    System.Threading.ThreadAbortException:

    at System.IO.MemoryStream.set_Capacity(Int32 value)

    at System.IO.MemoryStream.EnsureCapacity(Int32 value)

    at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)

    at System.IO.Compression.DeflateStream.InternalWrite(Byte[] array, Int32 offset, Int32 count, Boolean isAsync)

    at System.IO.Compression.DeflateStream.Write(Byte[] array, Int32 offset, Int32 count)

    at SqlClrTools.SqlCLRTools.fn_compress(SqlBytes blob)

    The statement has been terminated.

  • A bit of background about what I need to do:

    1) need to compress some BLOBs that store binary copies of real documents (sample attached)

    2) in this test run I tried to push through 56 BLOBs where the biggest is about 40MB:

    --uncompressed BLOBs stored here:

    create table _Files (FileName varchar(200), FileSize int, FileContent varbinary(max))

    --load the data here.... (irrelevant)

    --compressed BLOBs will be stored here:

    create table _FilesCompressed (FileName varchar(200), FileSize int, FileContent varbinary(max))

    --an attempt to compress 56 documents blows up.

    insert Into _FilesCompressed (FileName, FileSize, FileContent)

    select f.FileName,

    null,

    dbo.dba_clrCompress(f.FileContent)

    from _Files f

    I will try to compress the data a few documents at a time, or use a cursor altogether, and see....

    Any other ideas?

    PS: The ".NET Framework execution was aborted by escalation policy because of out of memory." error seems to be all too common according to Google... No good.

  • Grasshopper,

    There was a limitation on how much memory you can allocate in a single new byte[] call in SQL 2005 pre-SP2. I don't know if that is still the case after SP2, but it was one problem with this solution (sorry, but CLR integration usually died around 2 MB). Please let me know which CLR SP you are using and which SQL Server SP you are running. If you don't have the red-tape issues with free software, you could try SQLCompress.NET; it is free and tested with images around 120 MB. If you have any issues, drop a line to support@sqlcompress.net... I think they are about to release the SQL 2005/2008 version as open source.

  • I am on SP2.

    Exact build number posted above.

    Really wanted to avoid 3rd party. But if they post the code, it could be useful.

    Shame that M$ could not just do it right to begin with. These kinds of use cases were easy to predict.

    Thanks.

  • We implemented this technique successfully, but our blobs were typically in the 10-50K range - never more than 5 MB - and most access was via a single row (2 or 3 blobs per row). We did test this across several million rows, including multi-row statements, and never saw any issues, though our testing was limited to the blob sizes I mentioned earlier.

  • To be fair, we can also limit the size of the BLOBs we push through compression.

    The average size of a document in our business case is about 100-300K; those make up 80-90% of the data to handle. There is also a question of what to compress vs. what NOT to compress. PDFs and BMPs, for example, just don't like to be compressed. So, the compressible text documents we have tend to be smaller. This still might work.

    I just wish we did not have to code in a bunch of exceptions....

    And also did not have to code a bunch of loops here and there as a workaround.

    This should have been a simple one-liner: INSERT.... SELECT.... fn_compress(BLOB).... (see the sketch at the end of this post).

    Geez.

    Anyway, THANKS, fellows. I am getting good practical ideas out of this discussion. It still might work without resorting to third-party complications.
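
    For what it's worth, here is a rough sketch of the kind of filtered one-liner I have in mind, reusing the _Files / _FilesCompressed tables and dbo.dba_clrCompress from my earlier post; the 2 MB cut-off and the extension list are just placeholders for whatever rules you settle on, and in practice you would also need some way to record which rows actually got compressed:

    insert into _FilesCompressed (FileName, FileSize, FileContent)
    select f.FileName,
           f.FileSize,
           case
               -- skip documents that are too big to push through the CLR function
               when datalength(f.FileContent) > 2 * 1024 * 1024 then f.FileContent
               -- skip formats that do not compress well anyway
               when f.FileName like '%.pdf' or f.FileName like '%.bmp' then f.FileContent
               else dbo.dba_clrCompress(f.FileContent)
           end
    from _Files f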

  • Our content was text and XML, so compression rates were very high.

    Good luck and keep us posted on your progress!

  • Make your compression function deterministic by decorating it with [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true)], wrap the function in a T-SQL user-defined function, and check the BLOB length with DATALENGTH() before calling the CLR compressor. If it is > 1 MB (or whatever the upper limit is), convert it to varbinary(MAX) and prepend a header byte of 0 (do not call the compression function); otherwise compress the BLOB and prepend a header byte indicating compression.

    For decompression, check the header byte and either call the decompression function or just convert back to text (see the companion sketch at the end of this post)...

    CREATE FUNCTION dbo.udf_compress(@blob NVARCHAR(MAX))
    RETURNS VARBINARY(MAX)
    AS
    BEGIN
        -- Too big to compress: prepend a 0x00 header byte and store as-is.
        IF (DATALENGTH(@blob) > (2 * 1024 * 1024))
            RETURN CONVERT(VARBINARY(MAX), 0x00) + CONVERT(VARBINARY(MAX), @blob);
        -- Otherwise prepend a 0x01 header byte and store the compressed bytes.
        RETURN CONVERT(VARBINARY(MAX), 0x01) + dbo.fn_Compress(@blob);
    END

    To determine if it was compressed:

    SELECT CASE WHEN SUBSTRING(@compressed, 1, 1) > 0 THEN 'Decompress'
                ELSE 'Convert'
           END

    Hope this helps
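
    And for completeness, a companion sketch for the decompression side under the same one-byte header convention; dbo.fn_Compress and dbo.fn_Decompress are placeholders for whatever the CLR compress/decompress functions are registered as in your database:

    CREATE FUNCTION dbo.udf_decompress(@compressed VARBINARY(MAX))
    RETURNS NVARCHAR(MAX)
    AS
    BEGIN
        -- First byte is the header written by dbo.udf_compress:
        -- 0x01 = payload was compressed, 0x00 = payload was stored as-is.
        IF (SUBSTRING(@compressed, 1, 1) > 0)
            RETURN CONVERT(NVARCHAR(MAX),
                           dbo.fn_Decompress(SUBSTRING(@compressed, 2, DATALENGTH(@compressed) - 1)));
        RETURN CONVERT(NVARCHAR(MAX), SUBSTRING(@compressed, 2, DATALENGTH(@compressed) - 1));
    END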

  • OK, I found that compressing the records in a loop (I used a cursor) gets the job done with no "out of memory" issues. It is still quick enough, and also restartable if needed (with a little extra management coded in, that is). Worked for me.
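
    In case it helps anyone else, here is a rough sketch of the row-by-row approach, assuming the _Files / _FilesCompressed tables and dbo.dba_clrCompress function from my earlier posts (the cursor options and the restart check are illustrative):

    -- Compress one document per statement so only one blob at a time
    -- goes through the CLR function.
    declare @FileName varchar(200)

    declare file_cursor cursor local fast_forward for
        select f.FileName
        from _Files f
        -- skipping already-compressed rows makes the loop restartable
        where not exists (select 1 from _FilesCompressed c where c.FileName = f.FileName)

    open file_cursor
    fetch next from file_cursor into @FileName

    while @@FETCH_STATUS = 0
    begin
        insert into _FilesCompressed (FileName, FileSize, FileContent)
        select f.FileName, f.FileSize, dbo.dba_clrCompress(f.FileContent)
        from _Files f
        where f.FileName = @FileName

        fetch next from file_cursor into @FileName
    end

    close file_cursor
    deallocate file_cursor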

  • Hi,

    I have a problem: I have a SQL 2005 server and it throws the following error.

    .NET Framework execution was aborted by escalation policy because of out of memory.

    In the computer's event log, SQL is indicated, and then the process was blocked.

    Can someone back me up? Because I believe the problem is not memory as such, but SQL.

  • Please provide the SQL Server version and SP level for better support. I have seen that while working with VERY large BLOBs on pre-SP1 servers (I didn't try on SP2 and SP3). The problem is in your CLR integration sandbox while trying to allocate space for a VERY large array (i.e. Stream.Buffer). SQLCompress.NET solves that problem by streaming data out of the BLOB into another SqlBytes, which uses tempdb instead of memory. If the problem persists and you really want compression in your database, you could download SQLCompress.NET (which works with SQL 2008 as well).

  • Thanks very much, a very useful article!

    There look to be some tiny issues with the code, however.

    1. Buffer is an internal buffer, which may or may not correspond exactly to the bytes in the blob.

    byte[] blobData = blob.Buffer;

    I think it should be this:

    byte[] blobData = blob.Value;

    Or alternatively you could read chunks from the blob and loop, writing them into the compressor (sketched at the end of this post).

    2. The decompressedData needs to be flushed before closing.

    decompressedData.Flush()

    Without these changes my blobs had issues where the decompressed data was not identical to the original data:

    CHECKSUM(myBlob) <> CHECKSUM(Decompress(Compress(myBlob)))

    and

    myBlob <> Decompress(Compress(myBlob))
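
    On point 1, here is a minimal sketch of the chunked alternative, assuming the same DeflateStream-over-MemoryStream shape as the article's function; the 64 KB buffer size and the function name are illustrative. Note that this only avoids materializing the whole input array - the compressed output still accumulates in the MemoryStream:

    using System;
    using System.Data.SqlTypes;
    using System.IO;
    using System.IO.Compression;
    using Microsoft.SqlServer.Server;

    public partial class SqlCLRTools
    {
        [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
        public static SqlBytes fn_compress_chunked(SqlBytes blob)
        {
            if (blob.IsNull)
                return SqlBytes.Null;

            byte[] chunk = new byte[64 * 1024];  // 64 KB working buffer
            using (MemoryStream compressed = new MemoryStream())
            {
                using (DeflateStream deflate =
                       new DeflateStream(compressed, CompressionMode.Compress, true))
                {
                    long offset = 0;
                    while (offset < blob.Length)
                    {
                        // SqlBytes.Read copies a slice of the blob into the chunk buffer
                        int toRead = (int)Math.Min(chunk.Length, blob.Length - offset);
                        int read = (int)blob.Read(offset, chunk, 0, toRead);
                        deflate.Write(chunk, 0, read);
                        offset += read;
                    }
                }   // disposing the DeflateStream flushes the remaining compressed bytes

                return new SqlBytes(compressed.ToArray());
            }
        }
    }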

  • It would be nice to extend the same function to proper zip/unzip archival. Wikipedia has published the ZIP archive format specification. Here is the link:

    http://en.wikipedia.org/wiki/ZIP_(file_format)

    I did so in our internal project, and I don't feel like sharing the code, but given a good description it is accessible to practically any developer.

    Note the 16-byte data descriptor field; it will come in handy if you don't feel like saving the buffer(s) prior to writing to the stream.

    It became somewhat important to our team due to a third party accepting compressed data. My feeling is that sharing code (to enable consistent decompression) does not always work. However, writing to a common interface documented as a standard is nice. Having the ability to drop the file into Windows Explorer, double-click on it, and see the content comes as a bonus 🙂

    Legal disclaimer:

    Please do not treat this entry as anything other than a personal note.
