SQLServerCentral Article

Exploring the Varbinary Type


Introduction

It occurred to me recently that I had never actually used a binary or varbinary field in SQL Server for anything.  I suspect that part of the reason is that there are few occasions where a varbinary field is the best approach.  In most cases, I think binary data is best stored outside of the database, with the database holding a simple file location.  Of course, in SQL Server 2008 and later it is possible to use the FILESTREAM option to store varbinary(max) data in the filesystem.

Since I have never really used it and I want to have a thorough understanding of SQL Server, I decided to create some scripts to work with it and see how varbinary data is used.  Varbinary data needs to be handled differently than some other types of data to use it effectively.  Most other data types used by SQL Server, like int or varchar, are human readable, whereas varbinary often needs to go through some form of conversion to be used.

In this article I take a very brief look at the ways that varbinary can be used, and I include a couple of examples to show how the data can be inserted, read, and converted into a usable format.  This article is geared towards beginners, or at least people who have limited experience with varbinary even if they have a lot of experience with SQL.  The examples below are mostly independent of each other, and you can easily look at just the one for the language you prefer.  The examples are also somewhat artificial, since they were made up entirely to show a fairly simple usage.

Some Background

The varbinary(max) data type can store up to 2GB of arbitrary binary data.  Since it can hold arbitrary binary data, it can be extremely versatile with the correct conversion routines.  One common use is to replace the now deprecated Image datatype.  It can also store complex datatypes that come up in object-oriented programming in a relational database in a simple and straightforward way.  Similarly, it can directly hold the contents of virtually any file.  It also plays a significant role if you want to encrypt the data in a column in SQL Server, as K. Brian Kelley explained in his tip.

However, as seen in the examples below, most data needs to go through some conversion before it can be inserted into a varbinary column or before it can be used after retrieval from varbinary.  SQL Server will implicitly convert some datatypes, such as int and bigint, when inserting the data into a varbinary column, but varchar data needs to be explicitly converted.  Depending on the libraries being used, applications working with a varbinary column may need to do additional conversions of their own.
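The client-side half of this can be sketched in a few lines of Python.  This is only an analogy to SQL Server's conversion rules, not a demonstration of them: it shows that string data has no byte representation until you explicitly encode it, while an integer has a natural fixed-width byte pattern.  (The sketch uses Python 3 syntax, unlike the Python 2.7 examples later in the article.)

```python
# A hypothetical client-side analog of the conversions described above:
# a string is not bytes, so it must be explicitly encoded before it can
# be bound to a varbinary parameter, much as varchar needs an explicit
# CONVERT on the server side.
text = "We will store this string as varbinary"
text_bytes = text.encode('ascii')         # explicit conversion to bytes

# An integer has a natural byte representation; here we pick a
# bigint-sized (8 byte) big-endian pattern.
number = 1234567890
number_bytes = number.to_bytes(8, 'big')

# Both conversions round-trip losslessly.
round_tripped_text = text_bytes.decode('ascii')
round_tripped_number = int.from_bytes(number_bytes, 'big')
```

Whatever library sits between your program and SQL Server, the data ultimately crosses the wire as bytes like these.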

A Few Quick Examples

Before moving on to the actual examples, we will need a test table.  Here we will use a very simple version:

create table blobTest (
            id int identity primary key,
            dataDesc varchar(200),
            binData varbinary(max)
            )

All of the following examples use this table.  They were all designed to be fairly simple, but are meant to show ways to get different types of data into and out of a varbinary column in SQL.

A C# Byte Array (VS 2008)

Since C# is one of the more common languages used in conjunction with SQL Server, it seemed like a natural place to start.  This example is almost trivially simple in that it starts with just string data and inserts it into a varbinary column.  But it does give a basic example of how to encode data into a byte array, which can then be inserted into a varbinary column, and how to decode it afterwards.  Anything which can be encoded into a byte array can be stored in a varbinary column, including data which has gone through client-side encryption or complex objects which can be serialized to a byte array.

using System;
using System.Text;
using System.Data.Sql;
using System.Data.SqlClient;
namespace Csharp_blob_test
{
    class Program
    {
        static void Main(string[] args)
        {
            using (SqlConnection conn = new SqlConnection("YourConnectionString"))
            {
                conn.Open();
                byte[] testArray = Encoding.ASCII.GetBytes("We will store this string as varbinary");
                string sqlCom = "insert into blobTest (dataDesc, binData) values (@dataDesc, @binData)";
                SqlCommand cmd = new SqlCommand(sqlCom, conn);
                cmd.Parameters.Add("@dataDesc", System.Data.SqlDbType.VarChar);
                cmd.Parameters["@dataDesc"].Value = "Array From VS2008";
                cmd.Parameters.Add("@binData", System.Data.SqlDbType.VarBinary);
                cmd.Parameters["@binData"].Value = testArray;
                cmd.ExecuteNonQuery();
                //Now read it back in to verify
                sqlCom = "select binData from blobTest where dataDesc = 'Array From VS2008'";
                cmd = new SqlCommand(sqlCom, conn);
                SqlDataReader reader = cmd.ExecuteReader();
                while (reader.Read())
                {
                    //This one will show System.Byte[]
                    Console.WriteLine("Raw Value = {0}", reader[0]);
                    //Explicitly cast to byte[] and convert back to a string.
                    Console.WriteLine("Converted Value = {0}", ASCIIEncoding.ASCII.GetString((byte[])reader[0]));
                }
                reader.Close();
            }
        }
    }
}

A user-supplied file from PowerShell

PowerShell is another tool that makes interacting with SQL Server relatively straightforward.  It has access to .Net libraries, including SMO, which makes certain types of scripting simple.  Loading data through PowerShell is similar to doing it with C# or other .Net languages, and this example shows how to load an arbitrary file from the filesystem.  With a little modification it could load a directory of files, with appropriate metadata, into SQL Server.  For instance, with a little tweaking, something like this could load numerous images to supply to other programs.

#Connect to the database
$connStr = "YourConnStr"
$conn = new-object System.Data.SqlClient.SqlConnection($connStr)
$conn.Open()
#Load a file
$filename = Read-Host 'What file should be loaded?'
$desc = Read-Host 'What description should it have?'
[Byte[]]$binData = get-content -Encoding Byte $filename
#prepare
$cmd = new-object System.Data.SqlClient.SqlCommand("Insert into blobTest (dataDesc, binData) values(@desc, @bin)", $conn)
$cmd.CommandType = [System.Data.CommandType]'Text'
$cmd.Parameters.Add("@desc", [System.Data.SqlDbType]'VarChar')
$cmd.Parameters["@desc"].Value = $desc
$cmd.Parameters.Add("@bin", [System.Data.SqlDbType]'VarBinary')
$cmd.Parameters["@bin"].Value = $binData
#execute
$cmd.ExecuteNonQuery()
$conn.Close()

Michael Otey demonstrated a somewhat similar technique using openrowset to insert data from files into a varbinary column in an older article.

Pickle and arbitrary objects in Python 2.7

Python is a language I use frequently with SQL Server and can be used in an object oriented fashion.  It can be useful to be able to store complete objects with minimal translation in the database where they can be quickly retrieved in an organized fashion.  With help from the Pickle library, we can store a wide variety of complete Python objects in the database.  This particular example simply uses a Python list with a small variety of data types in it to show that a relatively complex object can be stored simply.  However, this technique would work equally well for any object which was compatible with pickle, including complex user defined types. 

Generally, I think it is better to break an object down into its component parts and store those parts in SQL Server, rather than trying to store complex objects in a varbinary format.  Breaking it down into native SQL types where possible makes it easier to examine those values and perhaps run analysis on them later.  It can also make it easier to bring parts of that data into a different program without bringing the entire object in.  Object-relational mapping tools like SQLAlchemy can help with the conversion between program objects and native datatypes in SQL.  However, there are certainly times when it makes sense to quickly place complex objects into a table with only minimal conversion.
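The trade-off can be sketched with a small hypothetical class (the Reading class and its attributes are invented for illustration).  Decomposing the object yields native values that map directly onto varchar and float columns, while pickling yields one opaque blob destined for a varbinary column:

```python
import pickle

class Reading(object):
    """A tiny hypothetical domain object standing in for something
    you might want to persist."""
    def __init__(self, sensor, value):
        self.sensor = sensor
        self.value = value

r = Reading('temp-01', 21.5)

# Option 1: decompose into native types, one value per column.
# A varchar and a float column can be queried and aggregated in SQL.
native_params = (r.sensor, r.value)

# Option 2: pickle the whole object into a single varbinary blob.
# Opaque to SQL, but trivially reconstituted on the way back out.
blob = pickle.dumps(r)
restored = pickle.loads(blob)
```

With option 1, a WHERE clause or an aggregate can see the values; with option 2, only the Python program can.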

import pyodbc, pickle
#Create the SQL connection
sqlConnStr = ('DRIVER={SQL Server};Server=(local);Database=TestDb;'+
           'Trusted_Connection=YES')
sqlConn = pyodbc.connect(sqlConnStr)
#Create a list with int, float, complex, string, T/F data, and a tuple
testList = [1, 2.1, 3+4j, 'a', True, (5, 6)]
llPickle = pickle.dumps(testList) #Then pickle it
#Store the list in the database
curs = sqlConn.cursor()
sql = """
    insert into blobTest (dataDesc, binData)
    values (?, ?)"""
curs.execute(sql, ('TestList', pyodbc.Binary(llPickle)))
sqlConn.commit()
#retrieve the Data
sql = """
    select top 1 --top 1 in case this script gets run repeatedly
        dataDesc, binData
    from
        blobTest
    where
        dataDesc = ?
        """
curs.execute(sql, ('TestList', ))
values = curs.fetchall()
newLL = pickle.loads(values[0][1])
print newLL

One significant thing to note is the call to pyodbc.Binary rather than just providing the variable as a parameter to the query.  Omitting that step will result in an implicit conversion error.  The call to pyodbc.Binary could be replaced with a call to the built-in bytearray, though.  The pickling process is also necessary.  Trying to pass testList directly as a parameter to the query results in an “Invalid parameter type” error, and trying to pass it to bytearray gives a TypeError.

An Image in Python 2.7

While the Pickle Library can make it easy to store many complex objects, some types of objects cannot be immediately pickled.  So, other approaches are needed to put those into a varbinary field in SQL.

To take a concrete example, here is a short Python script that makes a simple image file, in this case a plain blue square.

[Image: blue square]

The script then stores it in the table, retrieves it, and displays it, using pyodbc to interface with SQL Server, PIL to create a simple image, and StringIO to provide a buffer to hold the BMP version.  It looks like:[1]

from PIL import Image #create the image
from StringIO import StringIO #stores the bmp for the image temporarily
import pyodbc
#Create the SQL connection
sqlConnStr = ('DRIVER={SQL Server};Server=(local);Database=TestDb;'+
            'Trusted_Connection=YES')
sqlConn = pyodbc.connect(sqlConnStr)
#Create a image to store and prepare it for storage
img = Image.new('RGB', #mode
                (320, 240), #size
                (0, 0, 255) #color, this is dark blue
                )
imgBuffer = StringIO()
img.save(imgBuffer, "BMP")
#Store the image in the database
curs = sqlConn.cursor()
sql = """
    insert into blobTest (dataDesc, binData)
    values (?, ?)"""
curs.execute(sql, ('Blue Image', pyodbc.Binary(imgBuffer.getvalue())))
sqlConn.commit()
#release the memory
img = None
imgBuffer.close()
#retrieve the image
sql = """
    select top 1 --top 1 in case this script gets run repeatedly
        dataDesc,
        binData
    from
        blobTest
    where
        dataDesc = ?
        """
curs.execute(sql, ('Blue Image', ))
values = curs.fetchall()
#Bring it back as an image
imgBuffer2 = StringIO(values[0][1])
img2 = Image.open(imgBuffer2)
#Image.show is meant for debugging and testing.
#It may generate messages about 'sleep' not being recognized
#but it works nicely for a test script like this one.
img2.show()

Essentially, since a PIL image cannot be pickled directly, we are reducing it to a BMP which we place in the buffer provided by StringIO.  While this is one convenient option, it is hardly the only one.  Instead, we could decompose the image into a tuple of its mode, size, and data using the appropriate properties, and that tuple would be fully compatible with pickle.  To return it to an image, we would use the Image.fromstring method with that data.
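The decomposition approach can be sketched without PIL at all.  The three values below are hypothetical stand-ins for what the real script's image would provide; the point is that a plain tuple of native types pickles cleanly even though the Image object itself would not:

```python
import pickle

# Hypothetical stand-ins for the image's properties: in the real
# script these would come from img.mode, img.size, and the image's
# raw pixel data.
mode = 'RGB'
size = (320, 240)
data = b'\x00\x00\xff' * (320 * 240)   # one solid-blue RGB pixel per position

# The plain tuple pickles cleanly, unlike the Image object itself.
payload = pickle.dumps((mode, size, data))
mode2, size2, data2 = pickle.loads(payload)
# Image.fromstring(mode2, size2, data2) would then rebuild the image.
```

This stores raw, uncompressed pixel data, so for large images the BMP-in-a-buffer route above will usually be the more compact choice.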

Just like the last example, this script uses an explicit conversion with pyodbc.Binary to convert the data to a bytearray before it is passed to SQL Server.  Omitting that will generate an “Operand type clash” error.  Alternatively, in Python 2.6 or later, a direct call to the built-in bytearray conversion also works well.

Trying to use pyodbc.Binary (or bytearray) directly on the PIL Image generates a TypeError, which is why this script uses the StringIO[2] library to reduce the PIL Image to a buffer which is then handed off.

Conclusions

Generally, I think it is better to store binary data, especially large binary data, outside of the database and instead include a pointer within the database.  But it is certainly good to know how to store things like images or complete object-oriented entities in the database in a binary fashion for the cases where it is beneficial.  When it is, varbinary(max) provides a straightforward and flexible way to store arbitrary binary objects inside the database, whether those are complex objects to be handed to an object-oriented program, graphics to be displayed on a website, or sound files to be consumed by a player.  Although varbinary almost always requires some sort of conversion to be useful, and thus can require a bit more work than other datatypes, there are certainly cases where that work is well worth it.


[1] When creating the connection string for pyodbc, pay attention to the DRIVER portion.  In some cases, using the wrong driver may work for many transactions but introduce errors on others.  For instance, using the older {SQL Server} driver instead of {SQL Server Native Client 10.0} with later versions of SQL Server like SQL Server 2008 R2 can result in a “[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The text, ntext, or image pointer value conflicts with the column name specified. (7125)” error.

[2] cStringIO provides a similar interface that is generally faster.
