Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Revenge of the BLOB

By Andrew Sears,

Binary objects and unstructured data

In the past, my opinion was that a BLOB or Binary Large Object really had no place in a relational database. My experiences with these image & binary data types are that they are slow performing, and a pain to work with in T-SQL. Any operations working with these BLOBs were usually performed in a separate business layer outside of SQL. This suggested to me it made more sense to put things like files or images where they were designed to be placed - in the file system.

BLOBs tax relational database performance, and though there are many ways to mitigate this I still believe it is better to have a pointer to the data rather than storing the data itself. However, sometimes you cannot get away from using a BLOB, or inheriting it from another architecture. The interesting thing about BLOBs is that they are slowly taking over the database world.The offset of this is that more tools are being provided to allow easier manipulation of this data.

This article will describe some of the new features of SQL Server 2005 and SQL Server 2008, and provide an example of pulling a BLOB out of an image column in SQL Server 2005.

Bacronyms and marketing speak

In my last article, Why SQL Server is a 4-letter word, I discussed acronyms. The term blob is a bacronym. This is an acronym that was constructed after-the-fact to cover up that it really was named after "The Blob that ate Cincinnati." The person who first applied the name Blob to a database object was Jim Starkey. He tells a hilarious Dilbert-ish story about the origins of the term here, and how marketers like acronyms.

Acronyms are professional and great for marketing buzz, yet they feel a bit impersonal to me. Would you rather read about the rich-media framework from Microsoft called WPF/E or a cool new technology called Microsoft Silverlight? Which letters makes you feel warmer and fuzzier when they are put together,  I-B-M or A-p-p-l-e?

That's a lot of rows

A problem I had recently was with a 3rd-party financial system. The vendor provides a proprietary database with a table containing over 300,000 records, 2.8GB of data, and one tiny clustered index on a unique identifier column. Of the 16 columns, it is a column of IMAGE type which stores the most data. Digging further into the column, it is storing a binary representation of 5000 4-byte float or money data type values. New tables are created after each execution of the tool with a unique identifier appended to the table name. The customer would like have all of the data available for querying at any time after the table is loaded. Since this is 1,665,000,000 unfiltered rows per day there could be a problem with scalability and performance if the data is staged.

Not a lot for a cube

Developing an Analysis Services 2005 cube to aggregate the data would be the recommended approach in this situation, since it could handle this scale of data and the combinations of queries given to it.However, the customer is most comfortable querying subsets of the data in SQL Server, and the timeframe does not allow for an OLAP-based approach.The interim solution is to build a parameterized stored procedure and table function in SQL Server 2005 to de-BLOB the BLOB and return a subset of the rows.

The T-SQL approach

In SQL Server 2000, there is an SDK utility called TextCopy that can manipulate a BLOB within an image column. However this is outside the SQL Server environment. In SQL Server 2005 you can use the Import Column data flow task in Integration Services. Again, this is outside of the Query Analyzer window.  Within SQL Server, you can use the new VARBINARY(MAX) data type to convert the IMAGE data type to something workable.

Size up the Data

The first item of business is determining the size of the data column.  Since I know based on legacy code that the byte size is 4, I would like to see the number of records stored in each column.  The function to use here is DATALENGTH.

SELECT convert(varbinary(MAX),value) as ColumnValue
 ,DATALENGTH(value) as Length
 , DATALENGTH(value) /4as 
RecordCount
FROM [BigBlobTable]

Dividing the number by 4 gives us 5000 records in the column.

The Deblobber Table Function

I am still looking for a good way to remove the WHILE function, and the Table Function too, however this does the trick for now. 

The Deblobber Code

CREATE FUNCTION [dbo].[Deblobber] ( @blob AS VARBINARY(MAX) )
RETURNS @Values TABLE ( Value MONEY )
AS 
BEGIN
DECLARE @length AS INT
 
DECLARE @i AS INT
SET @i = 1
SET @length = DATALENGTH(@blob)
WHILE @i < @length
BEGIN
INSERT  INTO @Values
SELECT  CAST(SUBSTRING(@blob, @i, 4) AS MONEY)
SET @i = @i + 4
END
 
 RETURN 
END

The Deblobber table function is used in the stored procedure below. Names have been changed to protect the innocent. In no way should these names be used in a professional environment.

The Select Statement Code

CREATE PROCEDURE [dbo].[GetBigBlob] (@TableID AS VARCHAR(MAX) = '17627', @PickListID 
AS VARCHAR(MAX) = '12021855,12233944')

AS

-- Create a link to the table BigBlobTable_17627
IF EXISTS ( SELECT  name
FROM    sys.synonyms
WHERE   Name = N'BigBlob' ) 
 DROP SYNONYM N'BigBlob'

EXEC ( '
create synonym N'BigBlob for N'BigBlobTable_' + @TableID + ']' )


-- Turn off row counts for performance
SET nocount ON;

-- Define a Common Table Expression (CTE)
WITH BigBlobCTE as ( 

-- Define an auto-incrementing row identifier
SELECT    ROW_NUMBER() OVER ( ORDER BY [UniqueID] ASC ) AS 
CounterID,
                    
CONVERT(VARBINARY(MAX), value) AS ColumnValue           
FROM      BigBlob

-- Use COALESCE function to ignore NULL parameter values

-- Use Split function to parse comma-delimited list of parameters
          
WHERE     (CHARINDEX(',',@PickListID ) = 0 AND COALESCE(@PickListID 
, PickListID ) = UniqueID)  OR (CHARINDEX(',',@PickListID ) > 0 AND 
UniqueID IN (SELECT value FROM dbo.fnSplit(@PickListID,',')))

)

-- Return values from the CTE

SELECT  resultValues.rowid,
      
resultValues.PnLValue,

     BigBlobCTE.CounterID         [positionUID],  FROM   BigBlobCTE -- JOIN to the User-Defined Table Function CROSS APPLY dbo.Deblobber(BigBlobCTE.ColumnValue) AS resultValues

The Results

The procedure above should, for each blob, return a table, union these tables together, and return them as a result set with the original data row.

The key SQL Server 2005 features demonstrated in this stored procedure are:

  • Synonyms – Friendly aliases for objects
  • Common Table Expressions (CTEs) – WITH syntax presents a table expression above the SQL SELECT statement rather than nested below in a derived table. Used for easier readability and reuse of the expression.
  • User-Defined Table (UDT) function Joins - CROSS APPLY syntax allows UDTs to be joined to other tables.
  • Dynamic Row Numbering - ROW_NUMBER() syntax provides a dynamic row count over a given sort order.
  • Large Binary data type - VARBINARY(MAX) removes 8000 character limitation of SQL 2000 and allows for manipulating Large Binary Objects (LOBs) within SELECT statements.

Problems

There are a few potential issues with this solution.

  •  Concurrency issues with the synonym could allow this procedure to run only once at a time
  • Pulling the data for a large number of records still poses a performance issue

 

Dynamic SQL could solve the concurrency issue, however it was determined that it will be run by a single user on an ad-hoc basis so it is not a priority.

 

Performance issues should be mitigated as there is no longer a need to pull all data.Instead, data can be joined and filtered.

 

Looking forward to SQL Server 2008

Microsoft is offering some potentially great enhancements to working with BLOB data in SQL Server 2008, including the following:

  • Integrated Full Text Search - no more service, it's built-in.
  • Sparse Columns - providing the ability to create millions of BLOB columns.
  • New Index Types - Increasing the performance of querying.
  • Data compression - Shrinking the size and providing various storage alternatives.
  • And last but not least, a filestream datatype in SQL 2008.

For more information on these new enhancements, take a look at the white paper released in August, 2007, entitled Managing Unstructured Data with SQL Server 2008. 

Could it be that the SQL Storage Team borrowed some unstructured data ideas from someone on the OneNote Team?

Author's Profile

Andrew Sears is a Senior Solutions Consultant and Microsoft Certified Technology Specialist: Business Intelligence Developer (MCITBID) at T4G Limited, with over 10 years experience in the Analytics group implementing Reporting, Data Warehousing and Business Intelligence Solutions. T4G Limited is a leading full-service, project-based technology services company and a member of the Microsoft Gold Certified Partner Excellence Program.

Andrew currently maintains various blogs on topics including Silverlight,PerformancePoint, and SQL Server Business Intelligence Solutions.

References

Total article views: 7767 | Views in the last 30 days: 130
 
Related Articles
FORUM

Function in Joining column degrades performance in 2008

When a function is called in the column used for the join degrades the performance of the query in 2...

FORUM

Function with Select * that references a View

Select * in the Function does not reference new columns added to a View

ARTICLE

Performance Test of New Date Manipulation Functions (SQL Spackle)

Performance test of new SQL 2012 functions DateTimeFromParts and Format

FORUM

Performance between union of table valued functions

Performance between union of table valued functions

ARTICLE

The Cost of Function Use In A Where Clause

Discusses Index Selection impact when functions are wrapped around WHERE clause filtering columns

Tags
blob manipulation    
sql server 2005    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones