Still Confused about the difference between varchar(max) and varchar(8000)

Question

Still Confused about the difference between varchar(max) and varchar(8000)

joshua 15769

Default port

Points: 1492
More actions
August 16, 2016 at 12:46 am

#400580

With what I know, varchar(max) could store up to 2GB of data while varchar(8000) would store 8MB of data, My question is, what is the difference if I put a 8MB data on a varchar(max) variable and a 8MB data on a varchar(8000) variable? Arent they the same? or is the query more fast if i put it that way? Kindly enlighten me guys, I want to learn more about SQL. Thank you!

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 1

The main difference is over the use of in-row storage. VARCHAR will always have it, VARCHAR(MAX) will bias towards it, but then move to LOB storage when it needs to exceed the 8k limit. By and large, this only directly affects one thing, creating an index on the column. You can't put a standard b-tree index on a MAX column. In general, if your data is always going to be less than the 8k limit, use the VARCHAR(N) field over VARCHAR(MAX). This is because you will see performance degradation caused by different choices made by the optimizer so that it is prepared to deal with LOB data retrieval (it won't know if you've exceeded the 8k page size and spilled into LOB or not, so it assumes you have).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

joshua 15769 Default port Points: 1492 More actions · Answer 2

If i use Primary keys for my query? am i properly indexing my query? sorry for these replies, I really don't know much about data degeneration and such...

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 3

joshua 15769 (8/16/2016)
With what I know, varchar(max) could store up to 2GB of data while varchar(8000) would store 8MB of data, My question is, what is the difference if I put a 8MB data on a varchar(max) variable and a 8MB data on a varchar(8000) variable? Arent they the same? or is the query more fast if i put it that way? Kindly enlighten me guys, I want to learn more about SQL. Thank you!

It's actually NOT 8MB. It's only 8K. And whether it's "In Row" or not, I've found that VARCHAR(8000) is almost always faster than VARCHAR(MAX). Just my opinion, though. I might have to fix that with a test in the near future. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

joshua 15769 Default port Points: 1492 More actions · Answer 4

Interesting! Thank you for the kind response! I'll use varchar(8000) from now on.

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 5

joshua 15769 (8/17/2016)
Interesting! Thank you for the kind response! I'll use varchar(8000) from now on.

Unless you actually need more the 8000 characters.

Jeff is right. VARCHAR(N) is faster than VARCHAR(MAX), but speed is not the only consideration (although, note my signature, it's an important one). If you have to store more data, you sacrifice speed. It's worth noting though, we're not talking microseconds versus hours. It completely depends on your query and how the VARCHAR column is being referenced within it, scans vs. seeks, small row sets vs. large ones. Always remember with SQL Server, there is no single "right" answer. It depends, and what it depends on, is the one answer you can count on applying. Except for multi-statement, table-valued, user-defined functions. Don't use those ever.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 6

joshua 15769 (8/16/2016)
If i use Primary keys for my query? am i properly indexing my query? sorry for these replies, I really don't know much about data degeneration and such...

Maybe. Is the primary key on the varchar(8000) column? Then probably not. That's a pretty wide key (in fact, exceeds the limit for keys).

Without a lot more knowledge, I can't give you short, assertive answers to a question like this. Here's what I will say.

Normalize your database (or give it a proper star schema if it's a reporting system). Pick appropriate primary key values. Design the database, in part, around clustered indexes (which may, or may not, be primary keys). Assume that every table, with exceptions (but exceptions must be exceptional), must have both a primary key and a clustered index, but that they may not be the same column(s). Make the clustered index the most common path to the data, meaning the most frequently filtered value (usually, but not always, the primary key). After that, add nonclustered indexes as needed (just be sure you really need them).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 7

joshua 15769 (8/17/2016)
Interesting! Thank you for the kind response! I'll use varchar(8000) from now on.

Just to emphasize what Grant stated... RIGHT SIZING of any and all columns is a best practice. Don't get into the habit of genericizing datatypes. For example, a lot of people accept the defaults of some awful things such as NUMERIC(18,0) for integers and NVARCHAR(256) or VARCHAR(256) for every column. Some folks even use VARCHAR(8000) or VARCHAR(MAX) to supposedly make their tables more "bullet proof" if the size of the data ever happens to increase. I have to tell you that things like an abbreviation for a U.S. State will never get larger than 2 characters, for example. There are also some huge indexing ramifications, as well. For example, until 2014 came out, you could not rebuild an index in an online fashion if it contained one of the MAX or blob datatypes (bloody damned XML columns! :crazy: )

And, no... right sizing is not considered to be the sin of "pre-optimization" (they've really screwed up what Knuth stated). Rather, it's considered not only as a "best practice", but is also viewed as a line of demarcation between good programmers and weak programmers.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

joshua 15769 Default port Points: 1492 More actions · Answer 8

My primary keys are int sirs, But my database columns are all varchar(MAX)

I would like to thank you both for the great answers, i'm thinking, could I mark as solution all of your replies? Thank you and best Regards!

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

joshua 15769 (8/17/2016)
My primary keys are int sirs, But my database columns are all varchar(MAX)
I would like to thank you both for the great answers, i'm thinking, could I mark as solution all of your replies? Thank you and best Regards!

Can you tell us a little about your database? Is it something you are in the process of constructing or is it something you have inherited? What is the purpose? What types of data are you storing? There may be a business case for having all columns VARCHAR(MAX) but it's very unlikely.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

RonKyle SSC-Dedicated Points: 31537 More actions · Answer 10

Interesting! Thank you for the kind response! I'll use varchar(8000) from now on.

Adding to the voices urging you to ensure your number choice is correct for the data. If it's a state field or city field or customer name field, you don't need all those characters. And if your database ever needs to talk with another database, particularly one of another type, it will make the transfer more difficult. I have an ETL that pulls from a PostgreSQL database where some recent fields have this field setting, unnecessarily so. It makes the ETL more difficult because SQL samples the top 200 or so rows to try to determine what it is. This is a choice that developers doing database design make, but not a choice an experienced DBA would make.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 11

RonKyle (8/18/2016)
This is a choice that developers doing database design make...

THAT would be the most important problem to fix. Such people will recite Knuth's famous "Pre-optimization is the root of all evil" observation as the reason why they do such things not realizing that Knuth never intended for people to use it as an excuse for being lazy, violating every best practice there is, and doing it so wrong (my politically correct saying for "stupid" :-P). Using the correct datatype is NOT a form of pre-optimization especially when it comes to design.

This is also why I don't let "developers" do database design unless they've been trained to do it right. Even then, I do 100% peer reviews and use those reviews as another opportunity to train/mentor folks.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RonKyle SSC-Dedicated Points: 31537 More actions · Answer 12

This is also why I don't let "developers" do database design unless they've been trained to do it right

Unfortunately not everyone has that luxury. It doesn't help that they think they know this. To a lot of people (even some DBAs) the design rules seem unintuitive and overly rigid. To your lazy comment, there are a lot of DBAs who are also lazy on these points, which doesn't help the cause.

Eric M Russell SSC Guru Points: 125522 More actions · Answer 13

My understanding is that, at least by default, when given a VARCHAR(MAX) column, SQL Server will only store the data offrow, if the total data size for a row doesn't fit inside 8k bytes. I also though that regardless of whether the data is stored in-row or off-row, SQL Server would maintain some type of off-row pointer, something like 16 bytes, for VARCHAR(MAX) columns.

In the simplified test below, I was expecting TableA (varchar(8000)) to require less reserved space as TableB (varchar(max)), because it wouldn't have to contain the VARCHAR(MAX) pointers on each row.

However, the sp_spaceused stats for both tables appear identical.

create table TableA

(

col1 int not null primary key,

col2 datetime not null,

col3 varchar(3) not null,

col4 int not null,

col5 varchar(8000) not null

);

insert into TableA ( col1, col2, col3, col4, col5 )

select id, crdate, type, parent_obj, name from master.sys.sysobjects;

exec sp_spaceused 'TableA';

namerowsreserveddataindex_sizeunused

TableA2121 144 KB120 KB24 KB0 KB

create table TableB

(

col1 int not null primary key,

col2 datetime not null,

col3 varchar(3) not null,

col4 int not null,

col5 varchar(max) not null

);

insert into TableB( col1, col2, col3, col4, col5 )

select id, crdate, type, parent_obj, name from master.sys.sysobjects;

exec sp_spaceused 'TableB';

namerowsreserveddataindex_sizeunused

TableB2121 144 KB120 KB24 KB0 KB

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 14

RonKyle (8/18/2016)
This is also why I don't let "developers" do database design unless they've been trained to do it right
Unfortunately not everyone has that luxury. It doesn't help that they think they know this. To a lot of people (even some DBAs) the design rules seem unintuitive and overly rigid. To your lazy comment, there are a lot of DBAs who are also lazy on these points, which doesn't help the cause.

Understood and sorry to have come across so harsh on this but it's a serious problem that's easily preventable by both developers and DBAs alike. It doesn't help that a lot of front-end design tools default to NUMERIC(18,0) for anything that looks like an Integer and NVARCHAR(256) for everything that looks like a string. It also doesn't help that a lot of "managers" don't understand that if they want something real bad, that's the way they'll usually get it. Then everyone gets together and yells at the poor slob (DBA or not) that's responsible for the database servers because "the servers are slow" and "SQL sucks".

The only way out of it is for someone to realize there's a vicious circle happening and then take it upon themselves to try to break that circle. Unfortunately, such people are frequently burned at the stake because "we've always done it this way" and "we've never done it like that before" or "that would slow us down". In such a shop, most people don't even try to make the necessary change because they "don't want to make waves" or "be the one to stick out" and the problem continues.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)