Varchar or Char?

Question

Varchar or Char?

john.arnott

SSChampion

Points: 11882
More actions
October 15, 2009 at 10:40 pm

#205646

Comments posted to this topic are about the item Varchar or Char?

Viewing 15 posts - 1 through 15 (of 52 total)

You must be logged in to reply to this topic. Login to reply

vk-kirov SSCertifiable Points: 7686 More actions · Answer 1

As the column will hold only English language letters and digits, we don't need the additional bytes required by the national character data types

Not quite correct. You don't need to use Unicode data types for non-English letters. For example, if the identifiers contain English letters, Russian letters, and digits, you can use Cyrillic collation:

CREATE TABLE #tmp (id CHAR(10) COLLATE Cyrillic_General_CS_AS)

INSERT #tmp VALUES ('??????1') -- Russian word for 'hello'

INSERT #tmp VALUES ('Hello2')

SELECT * FROM #tmp

The result of this query is in the attached picture.

If you need to operate with more than one national alphabet, using of nchar/nvarchar is the only option. Otherwise (only English alphabet, English alphabet + one national alphabet), any of char/varchar/nchar/nvarchar data types can be used.

diamondgm SSCrazy Points: 2824 More actions · Answer 2

Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).

Secondly, within SQL Server, the efficiency may be best to use char in that instance, but take in to account the code operations of trimming and I think it becomes debatable.

I agree with you on a purist level, but practically, I would go with varchar still simply for the programmers' ease.

Nice question 🙂

ChiragNS One Orange Chip Points: 26137 More actions · Answer 3

Nice question, but i agree with diamondgm. The average calculation need not be the same as you pointed out.

"Keep Trying"

PostXript Right there with Babe Points: 719 More actions · Answer 4

The moment I checked the answer I knew there had to be a catch... While I don't agree on the assumption about distribution of the average I still like the question. More of these please!

SQLWinther SSCertifiable Points: 6294 More actions · Answer 5

diamondgm (10/16/2009)
Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).
Secondly, within SQL Server, the efficiency may be best to use char in that instance, but take in to account the code operations of trimming and I think it becomes debatable.
I agree with you on a purist level, but practically, I would go with varchar still simply for the programmers' ease.
Nice question 🙂

I agree with the author of the question when it comes to storage size (and IO performance), but there are more things to think of than storage, like you the trailing spaces that has to be stripped.

I got the question correct because I thought it had a catch, but personally, I would only use char in rare cases unless all the data for the column are equal.

/Håkan Winther
MCITP:Database Developer 2008
MCTS: SQL Server 2008, Implementation and Maintenance
MCSE: Data Platform

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 6

This was a great question. Well done, John! I hope you have more of these in store for us.

diamondgm (10/16/2009)
Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).

That's not an assumption, the even spread is explicitly mentioned in the question. Without it, I agree that the question would have been harder (if not impossible) to answer.

Secondly, within SQL Server, the efficiency may be best to use char in that instance, but take in to account the code operations of trimming and I think it becomes debatable.

I doubt it. Almost all business applications I have seen have their performance tied to an I/O bottleneck, and have the CPU twiddling it's virtual thumbs while waiting for more data to be read or written. Those few cycles the CPU spends to trim trailing spaces won't affect performance at all, it just means one less thumb to twiddle.

Oh, and if you really want to compare CPU times, then don't forget that SQL Server has to do extra work for each varying length column (like getting the length, then the data; or comparing length and adding trailing spaces before each comparison). So you'll probably find that the fixed length has lower CPU usage in spite of the extra trimming operations. Not that it will affect overall performance, though.

I agree with you on a purist level, but practically, I would go with varchar still simply for the programmers' ease.

How, exactly, is varchar easier for a programmer?

hakan.winther (10/16/2009)
personally, I would only use char in rare cases unless all the data for the column are equal.

That's reallly a shame, as the only good reasons to use varchar instead of char are a difference of more than 2 between average and maximum length; or a need to preserve trailing spaces.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Damian Widera-396333 Ten Centuries Points: 1186 More actions · Answer 7

There is one more interesting thing. Fixed length columns are placed in the same place in the row, whilst all columns like varchar, nvarchar etc are at the end of the row and their position may vary.

From the performance point of view it is better to have information stored in fixed length columns as they are on the same place in the row. Especially if you have 8,5 byte of total 10 used (~ lets agree with this assumption).

When you try to find data from nvarchar, nchar etc you have to read extra two bytes (mentioned in the explanation) to know where the data is (looking for the offset in teh row) in the row.

Kindest Regards,

Damian Widera
SQL Server MVP,
MCT, MCSE Data Platform, MCSD.NET

davoscollective SSCertifiable Points: 6355 More actions · Answer 8

Good question, made me think about datatypes.

nVarchar can actually use more than 2 bytes for some characters e.g. chinese characters that use 3 bytes.

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 9

david.howell (10/16/2009)
nVarchar can actually use more than 2 bytes for some characters e.g. chinese characters that use 3 bytes.

Do you have a source for that? Both Books Online and every website I've ever read say that nvarchar and nchar use 2 bytes per character, never less, never more.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

davoscollective SSCertifiable Points: 6355 More actions · Answer 10

Sorry, I am probably wrong in regards to MS SQL Server and nVarchar. I don't have an MS source.

I was thinking of unicode and the term 'multibyte' rather than double-byte.

Unicode (specifically the UTF-8 subset) can be encoded in 8-bit 'octets' and there can be between 1 and 4 of them (according to the 2003 revision).

See page 3 of the internet RFC 3629

http://www.ietf.org/rfc/rfc3629.txt

Off topic sorry.

anders-731262 Ten Centuries Points: 1256 More actions · Answer 11

anders-731262

Ten Centuries

Points: 1256

October 16, 2009 at 2:52 am

#1066753

Good one!

cengland0 SSCertifiable Points: 6102 More actions · Answer 12

This one got me again. Another trick question.

The question asked, "Which of these data types will be more efficient?"

Had it asked, "Which of these data types take less hard drive space?" I might have chosen the correct answer. I selected Varchar because it is more efficient when programming because char pads with extra spaces at the end and you need to rtrim the column before you produce your final output.

diamondgm SSCrazy Points: 2824 More actions · Answer 13

diamondgm (10/16/2009)
Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).

That's not an assumption, the even spread is explicitly mentioned in the question. Without it, I agree that the question would have been harder (if not impossible) to answer.

I think our definition of the word 'explicit' differs dramatically. I did not find the question to indicate that the spread was even as a fact.

Secondly, within SQL Server, the efficiency may be best to use char in that instance, but take in to account the code operations of trimming and I think it becomes debatable.

I doubt it. Almost all business applications I have seen have their performance tied to an I/O bottleneck, and have the CPU twiddling it's virtual thumbs while waiting for more data to be read or written. Those few cycles the CPU spends to trim trailing spaces won't affect performance at all, it just means one less thumb to twiddle.

In the cases where this is true (which is most of the time) white space would aversely affect I/O, no? (Maybe I am very wrong here)

I agree with you on a purist level, but practically, I would go with varchar still simply for the programmers' ease.

How, exactly, is varchar easier for a programmer?

Have you never had a programmer complain about having to trim on certain fields and not on others and wishing for a "standard" approach to text data?

I think you have raised excellent objections to my way of thinking about this problem. And on paper (assuming the average is 8.5), I can not disagree. Though in reality, I would still use varchar(10), if only for the convenience to the programmers at a realatively low performace cost.

Damian Widera-396333 Ten Centuries Points: 1186 More actions · Answer 14

cengland0 (10/16/2009)
This one got me again. Another trick question.
The question asked, "Which of these data types will be more efficient?"
Had it asked, "Which of these data types take less hard drive space?" I might have chosen the correct answer. I selected Varchar because it is more efficient when programming because char pads with extra spaces at the end and you need to rtrim the column before you produce your final output.

But in terms of I/O we could assume fixed length data types are better than varying ones.

Kindest Regards,

Damian Widera
SQL Server MVP,
MCT, MCSE Data Platform, MCSD.NET