Using CLR integration to compress BLOBs/CLOBs in SQL Server 2005


voron999
SSC-Enthusiastic

Group: General Forum Members
Points: 164 Visits: 359
I am on SP2.
Exact build number posted above.

I really wanted to avoid third-party tools, but if they post the code, it could be useful.
Shame that M$ could not just do it right to begin with. These kinds of use cases were easy to predict.

Thanks.
JoeA
SSC-Enthusiastic

Group: General Forum Members
Points: 168 Visits: 873
We implemented this technique successfully, but our BLOBs were typically in the 10-50 KB range and never more than 5 MB, and most access was single-row (2 or 3 BLOBs per row). We did test this across several million rows, including multi-row statements, and never saw any issues, though our testing was limited to the BLOB sizes mentioned above.
voron999
SSC-Enthusiastic

Group: General Forum Members
Points: 164 Visits: 359
To be fair, we can also limit the size of the BLOBs we push through compression.

The average document size in our business case is about 100-300 KB; those make up 80-90% of the data we have to handle. There is also the question of what to compress vs. what NOT to compress. PDFs and BMPs, for example, just don't like to be compressed. So the compressible text documents we have tend to be smaller. This still might work.

I just wish we did not have to code in a bunch of exceptions, and did not have to code a bunch of loops here and there as a workaround.

This should have been a simple one-liner: INSERT ... SELECT ... fn_compress(BLOB) ...
Geez.

Anyway, THANKS, fellows. I am getting good practical ideas out of this discussion. It still might work without resorting to third-party complications.
JoeA
SSC-Enthusiastic

Group: General Forum Members
Points: 168 Visits: 873
Our content was text and XML, so compression rates were very high.
Good luck and keep us posted on your progress!
yolousa
Valued Member

Group: General Forum Members
Points: 70 Visits: 89
Make your compression function deterministic by decorating it with [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true)]. Then wrap it in a T-SQL user-defined function that checks the BLOB length with DATALENGTH() before calling the CLR compressor. If the BLOB is larger than your chosen upper limit (the example below uses 2 MB), just convert it to VARBINARY(MAX) and prepend a flag byte of 0 (do not call the compression function); otherwise compress the BLOB and prepend a flag byte of 1.
For decompression, check the flag byte and either call the decompression function or just convert back to text...
CREATE FUNCTION dbo.udf_compress(
    @blob NVARCHAR(MAX))
RETURNS VARBINARY(MAX)
AS
BEGIN
    -- Too large to compress: prepend flag byte 0x00 and store the raw bytes
    IF (DATALENGTH(@blob) > (2 * 1024 * 1024))
        RETURN CONVERT(VARBINARY(MAX), 0x00) + CONVERT(VARBINARY(MAX), @blob);

    -- Otherwise prepend flag byte 0x01 and store the compressed bytes
    RETURN CONVERT(VARBINARY(MAX), 0x01) + dbo.fn_Compress(CONVERT(VARBINARY(MAX), @blob));
END

To determine whether a value was compressed, check the flag byte:

SELECT CASE WHEN SUBSTRING(@compressed, 1, 1) = 0x01 THEN 'Decompress'
            ELSE 'Convert'
       END;
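
In case it helps, here is a rough sketch of what the CLR side could look like, assuming a GZip-based compressor. This is not the article's original code, and the class and method names are just illustrative, but it shows where the IsDeterministic decoration goes:

using System.Data.SqlTypes;
using System.IO;
using System.IO.Compression;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Hypothetical GZip-based compressor; IsDeterministic=true lets it be used
    // in persisted computed columns and indexed views.
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlBytes fn_Compress(SqlBytes blob)
    {
        if (blob.IsNull)
            return SqlBytes.Null;

        using (MemoryStream compressed = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress, true))
            {
                byte[] data = blob.Value;   // full copy of the BLOB; fine for small/medium values
                gzip.Write(data, 0, data.Length);
            }                               // disposing the GZipStream flushes and writes the trailer
            return new SqlBytes(compressed.ToArray());
        }
    }
}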

Hope this helps
voron999
SSC-Enthusiastic

Group: General Forum Members
Points: 164 Visits: 359
OK, I found that compressing the records in a loop (I used a cursor) gets the job done with no "out of memory" issues. It is still quick enough, and it is also restartable if needed (with a little extra management coded in, that is). Worked for me.
ovelasco
Grasshopper

Group: General Forum Members
Points: 10 Visits: 39
Hi,

I have a problem. I have a SQL Server 2005 server and it throws the following error:

.NET Framework execution was aborted by escalation policy because of out of memory.

In the event log, SQL Server then indicates that it was blocked.

Can someone help me out? I believe the problem is not really SQL Server memory as such.
yolousa
Valued Member

Group: General Forum Members
Points: 70 Visits: 89
Please provide the SQL Server version and SP level for better support. I have seen that while working with VERY large BLOBs on pre-SP1 servers (I didn't try it on SP2 and SP3). The problem is in your CLR integration sandbox while it tries to allocate space for a VERY large array (i.e. Stream.Buffer). SQLCompress.NET solves that problem by streaming data out of the BLOB into another SqlBytes, which uses tempdb instead of memory. If the problem persists and you really want compression on your database, you could download SQLCompress.NET (which works with SQL 2008 as well).
ourcookiemonster
Forum Newbie

Group: General Forum Members
Points: 3 Visits: 25
Thanks very much, a very useful article!

There appear to be some small issues with the code, however.


1. Buffer is an internal buffer which may or may not contain the whole set of bytes in the blob.

byte[] blobData = blob.Buffer;

I think it should be this:

byte[] blobData = blob.Value;

Alternatively, you could read chunks from the blob in a loop, writing them into the compressor (see the sketch at the end of this post).


2. The decompressedData needs to be flushed before closing.

decompressedData.Flush()



Without these changes my blobs had issues where the decompressed data was not identical to the original data:

CHECKSUM(myBlob) <> CHECKSUM(Decompress(Compress(myBlob)))
and
myBlob <> Decompress(Compress(myBlob))
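
In case it is useful to anyone else, here is a rough sketch of a decompressor with both fixes applied. GZipStream and SqlBytes are the standard .NET Framework types; the method name and chunk size are just illustrative, and this is not the article's original code:

using System.Data.SqlTypes;
using System.IO;
using System.IO.Compression;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlBytes Decompress(SqlBytes blob)
    {
        if (blob.IsNull)
            return SqlBytes.Null;

        using (MemoryStream decompressedData = new MemoryStream())
        {
            // Read through the blob's stream instead of blob.Buffer, which is only
            // an internal buffer and may not hold the complete value.
            using (GZipStream gzip = new GZipStream(blob.Stream, CompressionMode.Decompress))
            {
                byte[] chunk = new byte[8192];
                int read;
                while ((read = gzip.Read(chunk, 0, chunk.Length)) > 0)
                    decompressedData.Write(chunk, 0, read);
            }
            decompressedData.Flush();       // flush before handing the bytes back
            return new SqlBytes(decompressedData.ToArray());
        }
    }
}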
Miron Berlin
SSC Veteran

Group: General Forum Members
Points: 238 Visits: 210
It would be nice to extend the same function to proper zip/unzip archiving. Wikipedia has published the ZIP archive format specification. Here is the link.

http://en.wikipedia.org/wiki/ZIP_(file_format)

I did so in an internal project of ours, and I don't feel like sharing the code, but given a good description it is accessible to practically any developer.
Note the 16-byte data descriptor field; it will come in handy if you don't feel like saving the buffer(s) before writing to the stream.

This became somewhat important to our team because a third party accepts compressed data from us. My feeling is that sharing code (to enable consistent decompression) does not always work. However, writing to a common interface documented as a standard is nice. Being able to drop the file into Windows Explorer, double-click it, and see the content comes as a bonus :-)


Legal disclaimer:

Please do not treat this entry as anything other than a personal note.