
Actual Emails: TL;DR: Stop using varchar(max)


Wrote this email recently to a crew of developers who were shooting themselves in the foot with a database rich in varchar(max) data types.

Hey folks-

TL;DR: Stop using varchar(max). We’re not storing books.

We need to review and avoid the varchar(max) data type in our tables. Here’s a short treatise as to why.
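As a starting point for that review, here is a minimal sketch of a query that lists every (max) column in the current database. It relies on the sys.columns catalog view reporting max_length = -1 for (max) types; the column aliases are my own, and it does not resolve user-defined alias types back to their base types.

```sql
-- List every varchar(max), nvarchar(max), and varbinary(max) column
-- in user tables of the current database.
SELECT
    s.name  AS SchemaName,
    t.name  AS TableName,
    c.name  AS ColumnName,
    ty.name AS DataType
FROM sys.columns AS c
JOIN sys.tables  AS t  ON t.object_id = c.object_id
JOIN sys.schemas AS s  ON s.schema_id = t.schema_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
WHERE c.max_length = -1   -- -1 marks a (max) length
  AND ty.name IN ('varchar', 'nvarchar', 'varbinary')
ORDER BY s.name, t.name, c.column_id;
```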

In SQL Server, varchar(max) is intended to replace the old text data type, which was different from varchar(n) because it was designed to store massive amounts of data. Massive being greater than 8000 characters, and all the way up to 2 GB worth of data. That’s what the varchar(max), varbinary(max), and nvarchar(max) data types are optimized for: HUGE blocks of data in a single cell. We should only use it if we're actually intending to store massive text and use the fulltext indexing engine (a completely separate and specific topic for text blocks).

This is an oversimplification, but varchar(max) is designed to store data differently, specifically for large text blocks. It appears to behave the same as a varchar(n) field, and that’s deceptive when we are throwing 100-200 characters into that field in each row.

The big drawbacks biting us right now about varchar(max) have to do with indexing, and this is regardless of how much data is actually in a varchar(max) field. A varchar(max) column can’t be the key of a nonclustered index, even if it never stores more than 8000 characters, and before SQL Server 2012 an index containing one can’t have ONLINE index maintenance performed. As a result, it is generally a giant pain for indexing, a pain you only want to put up with if you absolutely have to.
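To see the index key restriction for yourself, here is a small sketch against a throwaway table. The table, column, and index names are hypothetical, made up for illustration. Run each statement on its own, since the second one is expected to fail.

```sql
-- Hypothetical demo table: one varchar(max) column, one varchar(n) column.
CREATE TABLE dbo.IndexDemo
(
    Id       int IDENTITY(1,1) PRIMARY KEY,
    NotesMax varchar(max) NULL,
    NotesN   varchar(200) NULL
);

-- Expected to fail: a varchar(max) column is not a valid index key column,
-- no matter how little data it actually holds.
CREATE NONCLUSTERED INDEX IX_IndexDemo_NotesMax
    ON dbo.IndexDemo (NotesMax);

-- Works: varchar(200) is a valid index key.
CREATE NONCLUSTERED INDEX IX_IndexDemo_NotesN
    ON dbo.IndexDemo (NotesN);

-- Also works: varchar(max) can ride along as an INCLUDE column,
-- it just can't be part of the key.
CREATE NONCLUSTERED INDEX IX_IndexDemo_NotesN_Incl
    ON dbo.IndexDemo (NotesN) INCLUDE (NotesMax);

-- Clean up the demo table.
DROP TABLE dbo.IndexDemo;
```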

Furthermore, we’re doing ourselves a disservice for performance, straight up. Unless you’re storing books, (max) hurts performance.

Check out this blog post: http://rusanu.com/2010/03/22/performance-comparison-of-varcharmax-vs-varcharn/

 

In short, varchar(max) is burdensome overkill for our datasets.

So here’s the solution… Change varchar(max) to varchar(n), where n is a generous but appropriate number for that column’s data. If the data import wizard creates varchar(max) columns for us when importing from Excel, change them to varchar(8000), which is the highest number you can assign to a varchar field. Or better yet, once the data is in SQL, use this simple syntax to find out the max length of a column and then pad it.

For example: select MAX(LEN([yourcolumn])) from yourtable
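Putting the measure-then-pad idea together, a rough sketch might look like the following. The yourtable and yourcolumn names are the same placeholders as above, and the 200 is only an assumed padded size; pick whatever fits the number the first query returns.

```sql
-- Step 1: find the longest value currently stored.
-- LEN counts characters and ignores trailing spaces;
-- DATALENGTH counts bytes, trailing spaces included.
SELECT
    MAX(LEN([yourcolumn]))        AS MaxChars,
    MAX(DATALENGTH([yourcolumn])) AS MaxBytes
FROM yourtable;

-- Step 2: shrink the column to a padded size (200 is an assumption here).
-- Restate NULL or NOT NULL explicitly so the column's nullability
-- doesn't change by accident.
ALTER TABLE yourtable
    ALTER COLUMN [yourcolumn] varchar(200) NULL;
```

If any existing value is longer than the new size, the ALTER fails with a truncation error instead of quietly cutting data, so it is safe to try.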

Problem is, our SSIS packages are all very picky about data types and will break if we just change the data types. So, after making these table changes, you’ll need to open your SSIS package, open the data flow destination or other object, hit OK to apply the new metadata, then save and deploy it again. No actual changes necessary.

This all came up because we have data quality issues with the fields Foo and Bar. Both of those columns are varchar(max). I’m dumping the varchar(max) data into temp tables with varchar(200) to get the queries to return in a reasonable amount of time.
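For reference, the temp table workaround looks roughly like this. It is a sketch only: dbo.SourceTable, #FooBar, and the index name are hypothetical stand-ins, and the real queries differ.

```sql
-- Copy the varchar(max) data into a temp table with sized columns.
-- Note: CAST silently truncates anything longer than 200 characters,
-- so check MAX(LEN(...)) on the source columns first.
SELECT
    CAST(Foo AS varchar(200)) AS Foo,
    CAST(Bar AS varchar(200)) AS Bar
INTO #FooBar
FROM dbo.SourceTable;  -- hypothetical source table name

-- Sized columns can be index keys, which the varchar(max) originals cannot.
CREATE NONCLUSTERED INDEX IX_FooBar_Foo ON #FooBar (Foo);
```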

Let me know if you have any questions!

 

William 

I like to use the word treatise to prepare my audience for verbosity.
