It’s time again for T-SQL Tuesday and this month the topic is LOB, or BLOB data. You can read more about T-QSL Tuesday in the entry above, and check out tSQL2sDay.com for the redirect.
For you older DBAs, this refers to text, ntext, and image data. For the youngsters, you know there are newer, more complete data types in SQL Server 2005 and above that are the (n)varchar(max), the FILESTREAM types (SQL 2008), the XML, spatial (2008) and CLR data types. Basically the data types that hold data exceeding the size of a page (8kb) up to the maximum size of 2^32 bytes.
When I started working with SQL Server, it was with two third party products that my company had purchased. One was an imaging piece of software designed to store scanned documents and the other was a piece of fax software that received faxes electronically and stored them in SQL Server. One of these stored the actual image in the database as the image data type and the other stored the image in the file system with a “pointer” or path to the file in the database.
This used to be a constant debate, and indeed, even with the advent of the FILESTREAM data type, there still is a debate about which way to manage data such as image files, audio files, etc. There’s a great white paper from Microsoft Research you should read if you are considering storing LOB data in your database. It basically notes that if the files are 256kb or smaller, on overage, then keep them in the database. If they average greater than 1MB, use the file system and Filestream. In between, well, you might pick the one that works best for you.
Personally I like both methods. Keeping stuff in the database simplifies backup and restore procedures. I once had to restore a piece of softawre that used the store-paths-in-the-database and we had different paths on the DR server. That was not fun trying to sort out. You also have a clearer idea of how much space you are using with everything in the database.
However in the file system you have other advantages. You can have the web server just send images from its cache instead of requiring processing from the database server. I have always been concerned over load, so this appeals to me. It also lets someone that’s not a database person easily view or work with images in things like Paint if they need to.
If you have images, I’d use the MS Research guidelines, but my real recommendation is that you learn how to work with FILESTREAM, learn the basic API and code needed, and integrate it if you can. One big rason to consider FILESTREAM and storage outside of the database is that many of us look to put our data and log files on RAID 1 or RAID 10 storage arrays. Those are expensive, and much of this data could easily be served from RAID 5 volumes, at a much lower storage cost. When you have a lot of data, this can be a significant cost.
If you have large volume of binary data, like audio or video files, you might consider getting a dedicated appliance at some point as well. I’ve worked with one of these before FILESTREAM, and it worked pretty well. These are Remote Blob Storage (RBS) features built into Sharepoint 2010, and I am guessing that as we accumulate more of this data, and as Sharepoint is deployed more, it makes sense to use these features.
So what should you do? It depends on what will work best in your environment. Learn a bit more about Filestream and make the best decision that you can.