
Using CLR integration to compress BLOBs/CLOBs in SQL Server 2005
Posted Friday, February 02, 2007 6:41 AM
SSCertifiable


Group: Moderators
Last Login: Friday, April 18, 2014 9:01 AM
Points: 6,705, Visits: 1,680

I'm a little late reading this, but have a couple of comments to add.

My first thought was to agree with Ian, that I think this would be better done in the middle tier. It would reduce the bandwidth used, reduce the db server load, and still give you the space savings. Since it would be almost the same chunk of code, it's a 'where should the code go' question. The only downside I see to middle tier placement is that if someone wants to do some type of bulk load or update, they'll have to be sure to use the same compression method/code. Actually, I think I still agree with Ian!

I don't see this as a bad use of the CLR. If you've decided to do it on the server, clearly CLR provides the best way to accomplish it.

I'm not sure I agree with Peter that it would kill server performance. I'm not sure what the tradeoff would be; I'd guess CPU and memory usage would go up, while disk I/O and space usage would decrease. If you needed to optimize, the compression could be done asynchronously, with a flag on the table indicating whether a row is compressed and a job running behind to handle the compression. There's not much you could do about decompression, though, unless you went with a split solution where you handled compression on the server and decompression on a different tier. It might not be the best fit for things like serving images, but it would probably do well for something like document storage.

I think it parallels how we do compression in the file system. Sometimes we just mark the folder as compressed, sometimes we compress the file manually. Which is better or more correct?

I also think that if SQL offered it as a native option you'd probably see a lot of people use it.



Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #342007
Posted Friday, February 02, 2007 8:51 AM
Ten Centuries


Group: General Forum Members
Last Login: Thursday, January 30, 2014 10:08 PM
Points: 1,038, Visits: 444

The NTFS file system comparison is great!
Sometimes you compress by using file/folder attributes because you get seamless access to the file; it's convenient, and you get some compression benefit for an acceptable overhead.

Likewise, sometimes explicit compression is better - you offload the compression overhead to another one-off task, but you lose the ease of access.

As always, it really depends. I do agree with others, though; it was a great article from the point of view of seeing just what sort of things you can do in the DB engine with the new CLR integration - good or bad - and it's a welcome distraction from some of the more mundane DBA task articles.




Post #342071
Posted Tuesday, February 06, 2007 5:38 PM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Wednesday, March 05, 2014 6:38 PM
Points: 501, Visits: 251

Nice article...

Got a question...

So what is happening here is that you have created a .NET application and then ran some code that generated an assembly that contains one type with functions that compress and decompress binary data, and that assembly is in some way registered with SQL 2005.

In order to utilize this functionality, one would need to store something like an image in the database. So since .NET binary data is wider ( NEED SOME help here ), the functions compress that binary data so that it can be stored in SQL..
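If I'm following, the .NET side is basically one static class with a couple of functions marked for SQL - something shaped like this (just my rough sketch to check my understanding; names are assumed, not the article's actual code):

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class BlobCompression
{
    // takes the varbinary(max)/image value in, hands the compressed bytes back
    [SqlFunction]
    public static SqlBytes Compress(SqlBytes blob)
    {
        // ... deflate the incoming bytes here ...
        return blob; // placeholder so the sketch compiles
    }

    // reverse trip: expand previously compressed bytes back to the original
    [SqlFunction]
    public static SqlBytes Decompress(SqlBytes blob)
    {
        // ... inflate here ...
        return blob; // placeholder
    }
}

...and then on the SQL side you run CREATE ASSEMBLY and CREATE FUNCTION so T-SQL can call the pair like ordinary scalar functions against a varbinary(max) column. Is that about right?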

Looking for some straightening here... I have one foot in and one out.. HELP!

Erik..............



Dam again!
Post #342970
Posted Monday, March 05, 2007 9:55 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, March 12, 2014 8:47 AM
Points: 76, Visits: 756

Hi Yoel - I enjoyed your article on blob compression - but I believe there is a bug in the compression function...

I was starting to work on the same concept (as well as blob encryption) when I came across your article.  I started playing with your code using the sort of data that I have been intending to compress.  I noticed that for most of my data I wasn't getting the kind of compression ratios I was expecting, and in many cases the resulting compressed data was actually 2x larger.  This seemed to be the case for our smaller blobs (around 2K to 6K bytes) - which is the majority of the data that we are looking to compress (some rows contain much larger data, too).  I compared this with the same data compressed from a file using the same deflate method in a standalone .NET program, and got 80% compression or so.

I determined that the cause of the problem is that the blobData.Length (the length of the byte array) value is not reliable - it didn't actually match the length of the blob data.  It was often 6-8K when the actual data was much smaller, even zero length.  If I use blob.Length (the length of the SqlBytes parameter) when writing to the compression stream, I got the results I expected.
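To make it concrete, the working version of the write looks roughly like this (my sketch of the idea, not Yoel's exact code - the key point is only that the count handed to the compression stream comes from blob.Length rather than from the byte array itself):

using System.Data.SqlTypes;
using System.IO;
using System.IO.Compression;

public static class BlobCompressionFixed
{
    public static SqlBytes Compress(SqlBytes blob)
    {
        if (blob.IsNull)
            return SqlBytes.Null;

        // the backing byte array can be a reused, oversized buffer, so its
        // .Length may report 6-8K of stale bytes even for a tiny value
        byte[] blobData = blob.Buffer;

        using (MemoryStream output = new MemoryStream())
        {
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                // write only the actual BLOB length reported by SqlBytes
                deflate.Write(blobData, 0, (int)blob.Length);
            }
            return new SqlBytes(output.ToArray());
        }
    }
}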

Just wanted to let you and the readers know what I found... please validate as you see fit

Oh, BTW - we are thinking about using blob compression to reduce our SAN chargeback costs (which are insanely high). The blob data we're compressing needs to be online with the active data in our application, but isn't updated after it's inserted and isn't selected very often relative to the rest of the data, so the compute costs are manageable.


Post #349188
Posted Monday, March 05, 2007 10:19 AM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, January 21, 2014 11:24 AM
Points: 12, Visits: 76

JoeA:

Thanks for pointing this out. You are right. I actually responded yesterday to an issue someone had with this code in SQL Server SP1: he was setting the same varbinary(max) variable twice, from larger to smaller, and was getting garbage after reducing the size. I looked into it and found a problem with .Buffer. That property was not refreshed by SQL every time, and it kept all the garbage from previous calls. I don't know if Microsoft solved that issue in SQL Server SP2, but I changed the code to use the stream version of SqlBytes. By the way, MS has resolved the size limitation in SP2.

There is a problem with the Buffer property that I couldn't fix, because it is "an undocumented feature". I looked into SQLCompress.NET's behavior and it does not have this problem, since they use streaming methods to manipulate the data. Which made me wonder: have you tried that product? Whatever the case, I love to see my code put to good use and to see people committed to trying it out and improving it. Thanks again for your help and comments.
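For anyone curious, the streaming flavor looks roughly like this (a simplified sketch of the approach, not SQLCompress.NET's actual code): read from SqlBytes.Stream in small chunks so the function never touches .Buffer and never needs one huge byte[] for the input.

using System.Data.SqlTypes;
using System.IO;
using System.IO.Compression;

public static class BlobCompressionStreamed
{
    public static SqlBytes Compress(SqlBytes blob)
    {
        if (blob.IsNull)
            return SqlBytes.Null;

        MemoryStream output = new MemoryStream();
        using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
        {
            byte[] chunk = new byte[8192];
            int read;
            // pull the BLOB through an 8 KB window instead of materializing it
            while ((read = blob.Stream.Read(chunk, 0, chunk.Length)) > 0)
                deflate.Write(chunk, 0, read);
        }

        output.Position = 0;
        return new SqlBytes(output); // SqlBytes accepts a Stream directly
    }
}

The output side of this sketch still buffers the compressed result in memory, so it only avoids the large allocation on the input side.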

Post #349198
Posted Wednesday, March 07, 2007 4:02 PM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, March 12, 2014 8:47 AM
Points: 76, Visits: 756

I have not tried SQLCompress.NET.  We are less inclined to use smaller third-party add-ons.  Not that they're bad, but in the end the corporate red tape involved in getting them in-house and supportable exceeds the cost of building some of those capabilities ourselves... not to mention the general hassle of dealing with "the man".

Thanks!

Post #349871
Posted Thursday, July 16, 2009 11:31 AM
Grasshopper


Group: General Forum Members
Last Login: Thursday, February 13, 2014 11:29 AM
Points: 23, Visits: 290
JoeA (3/5/2007)


Hi Yoel - I enjoyed your article on blob compression - but I believe there is a bug in the compression function...

.....I determined that the cause of the problem is that the blobData.Length (the length of the byte array) value is not reliable - it didn't actually match the length of the blob data. It was often 6-8K when the actual data was much smaller, even zero length. If I use blob.Length (the length of the SqlBytes parameter) when writing to the compression stream, I got the results I expected.

Just wanted to let you and the readers know what I found... please validate as you see fit

Oh, BTW - We are thinking about using blob compression to reduce our SAN chargeback costs (which are insanely high) - the blob data we're compressing needs to be online with the active data in our application, but isn't updated after it's inserted, and isn't selected very often relative to the rest of the data; so the compute costs are manageble.


MANY MANY THANKS, JoeA!!!

I was also hit by this exact bug.
Compress/Decompress simply corrupted the data... decompression actually blew it up even bigger.

So this one change fixed my issue.

(SQL 2005 SP2 build 2005.090.3073.00, btw.)

Also THANKS to Yoel!!

This solution should actually work well for document archiving (my exact case). The BLOB data is still available to the users, but actual usage of it is low, so performance is a non-issue. But the space savings and availability are a huge benefit.
Post #754382
Posted Thursday, July 16, 2009 3:25 PM
Grasshopper


Group: General Forum Members
Last Login: Thursday, February 13, 2014 11:29 AM
Points: 23, Visits: 290
An hour later I have some bad news.

As soon as I attempted to put a bit of a real load on the SQLCLR compression and push through some real data, things exploded:

Msg 6532, Level 16, State 49, Line 1
.NET Framework execution was aborted by escalation policy because of out of memory.
System.Threading.ThreadAbortException: Thread was being aborted.
System.Threading.ThreadAbortException:
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Compression.DeflateStream.InternalWrite(Byte[] array, Int32 offset, Int32 count, Boolean isAsync)
at System.IO.Compression.DeflateStream.Write(Byte[] array, Int32 offset, Int32 count)
at SqlClrTools.SqlCLRTools.fn_compress(SqlBytes blob)
The statement has been terminated.
Post #754513
Posted Thursday, July 16, 2009 3:38 PM
Grasshopper


Group: General Forum Members
Last Login: Thursday, February 13, 2014 11:29 AM
Points: 23, Visits: 290
A bit of a background about what I need to do:

1) need to compress some BLOBs that store binary copies of real documents (sample attached)

2) in this test run I tried to push through 56 BLOBs where the biggest is about 40MB:

-- uncompressed BLOBs stored here:
create table _Files (FileName varchar(200), FileSize int, FileContent varbinary(max))

-- load the data here.... (irrelevant)

-- compressed BLOBs will be stored here:
create table _FilesCompressed (FileName varchar(200), FileSize int, FileContent varbinary(max))

-- an attempt to compress 56 documents blows up:
insert into _FilesCompressed (FileName, FileSize, FileContent)
select f.FileName,
       null,
       dbo.dba_clrCompress(f.FileContent)
from _Files f

I will try to compress the data a few documents at a time, or use a cursor altogether, and see....

Any other ideas?

PS: The ".NET Framework execution was aborted by escalation policy because of out of memory. " seems to be way too common according to the Google... No good.


Post Attachments: DataSample.JPG (123.26 KB)
Post #754520
Posted Thursday, July 16, 2009 6:15 PM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, January 21, 2014 11:24 AM
Points: 12, Visits: 76
Grasshopper,

There was a limitation on how much memory you can allocate in a single new byte[] call in SQL 2005 pre-SP2. I don't know if that is still the case after SP2, but it was one problem with this solution (sorry, but CLR integration usually died around 2 MB). Please let me know which CLR SP and which SQL Server SP you are running. If you don't have red-tape issues with free software, you could try SQLCompress.NET; it is free and tested with images around 120 MB. If you have any issues, drop a line to support@sqlcompress.net... I think they are about to release the SQL 2005/2008 version as open source.
Post #754567