Varchar or Char?

  • Comments posted to this topic are about the item Varchar or Char?

  • As the column will hold only English language letters and digits, we don't need the additional bytes required by the national character data types

    Not quite correct. You don't need to use Unicode data types for non-English letters. For example, if the identifiers contain English letters, Russian letters, and digits, you can use Cyrillic collation:

    CREATE TABLE #tmp (id CHAR(10) COLLATE Cyrillic_General_CS_AS)

INSERT #tmp VALUES ('Привет1') -- Russian word for 'hello'

    INSERT #tmp VALUES ('Hello2')

    SELECT * FROM #tmp

    The result of this query is in the attached picture.

If you need to work with more than one national alphabet, using nchar/nvarchar is the only option (as sketched below). Otherwise (English alphabet only, or English alphabet plus one national alphabet), any of the char/varchar/nchar/nvarchar data types can be used.
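For the multi-alphabet case, a rough sketch along the same lines (the table name and sample strings here are only illustrative):

CREATE TABLE #tmp2 (id NVARCHAR(10))

INSERT #tmp2 VALUES (N'Привет1') -- Russian

INSERT #tmp2 VALUES (N'Γεια2') -- Greek

INSERT #tmp2 VALUES (N'Hello3') -- English

SELECT * FROM #tmp2

Note the N prefix on the literals; without it, characters outside the column's code page would be lost before they ever reach the nvarchar column.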

  • Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).

Secondly, within SQL Server, char may be the more efficient choice in that instance, but take into account the code operations of trimming and I think it becomes debatable.

I agree with you on a purist level, but practically, I would still go with varchar, simply for the programmers' ease.

Nice question 🙂
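To put rough numbers on the storage side of that point, a small sketch (the variable names are only illustrative):

DECLARE @c CHAR(10) = 'ABC123'
DECLARE @v VARCHAR(10) = 'ABC123'

SELECT DATALENGTH(@c) AS char_bytes, -- always 10: padded with trailing spaces
       DATALENGTH(@v) AS varchar_bytes -- 6 here; add 2 bytes of per-value overhead when stored in a row

With the 8.5-character average discussed in the question, varchar works out to about 8.5 + 2 = 10.5 bytes per value against a flat 10 for char(10).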

Nice question, but I agree with diamondgm. As you pointed out, the average calculation need not work out that way.

    "Keep Trying"

The moment I checked the answer I knew there had to be a catch... While I don't agree with the assumption about the distribution behind the average, I still like the question. More of these please!

  • diamondgm (10/16/2009)


    Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).

Secondly, within SQL Server, char may be the more efficient choice in that instance, but take into account the code operations of trimming and I think it becomes debatable.

I agree with you on a purist level, but practically, I would still go with varchar, simply for the programmers' ease.

Nice question 🙂

I agree with the author of the question when it comes to storage size (and I/O performance), but there are more things to think about than storage, like the trailing spaces that have to be stripped, as you say (see the RTRIM sketch below this post).

I got the question correct because I thought it had a catch, but personally, I would only use char in rare cases, unless all the data in the column are of equal length.

/Håkan Winther
    MCITP:Database Developer 2008
    MCTS: SQL Server 2008, Implementation and Maintenance
    MCSE: Data Platform
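As a rough illustration of the trailing-space annoyance Håkan and diamondgm mention (the values and the concatenation are only an example):

DECLARE @id CHAR(10) = 'ABC123'

SELECT 'x' + @id + 'x' AS untrimmed, -- 'xABC123    x': the padding comes along
       'x' + RTRIM(@id) + 'x' AS trimmed -- 'xABC123x'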

  • This was a great question. Well done, John! I hope you have more of these in store for us.

    diamondgm (10/16/2009)


    Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).

    That's not an assumption, the even spread is explicitly mentioned in the question. Without it, I agree that the question would have been harder (if not impossible) to answer.

Secondly, within SQL Server, char may be the more efficient choice in that instance, but take into account the code operations of trimming and I think it becomes debatable.

I doubt it. Almost all business applications I have seen have their performance tied to an I/O bottleneck, and have the CPU twiddling its virtual thumbs while waiting for more data to be read or written. Those few cycles the CPU spends to trim trailing spaces won't affect performance at all, it just means one less thumb to twiddle.

Oh, and if you really want to compare CPU times, then don't forget that SQL Server has to do extra work for each variable-length column (like getting the length, then the data; or comparing lengths and adding trailing spaces before each comparison). So you'll probably find that the fixed-length column has lower CPU usage in spite of the extra trimming operations. Not that it will affect overall performance, though.

I agree with you on a purist level, but practically, I would still go with varchar, simply for the programmers' ease.

    How, exactly, is varchar easier for a programmer?

    hakan.winther (10/16/2009)


personally, I would only use char in rare cases, unless all the data in the column are of equal length.

That's really a shame, as the only good reasons to use varchar instead of char are a difference of more than 2 between average and maximum length, or a need to preserve trailing spaces (the break-even arithmetic is sketched below).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/
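The break-even point Hugo describes can be checked in a couple of lines (the sample averages below are made up):

-- varchar(10) costs the data length plus 2 bytes of overhead per value; char(10) is a flat 10
SELECT avg_len,
       avg_len + 2 AS varchar_bytes,
       10 AS char_bytes
FROM (VALUES (7.0), (8.0), (8.5), (9.5)) AS v(avg_len)

Only an average length below 8 makes varchar(10) the smaller option; at the 8.5 in the question, char(10) wins.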

There is one more interesting thing. Fixed-length columns are stored at the same position in the row, whilst variable-length columns like varchar, nvarchar, etc. are at the end of the row and their position may vary.

From the performance point of view it is better to have the information stored in fixed-length columns, as they are in the same place in the row, especially if you have 8.5 bytes of the total 10 used (let's agree with this assumption).

When you try to read data from a varchar or nvarchar column, you have to read an extra two bytes (mentioned in the explanation) to find the column's offset in the row and locate the data (a record-size comparison is sketched below).

    Kindest Regards,

    Damian Widera
    SQL Server MVP,
    MCT, MCSE Data Platform, MCSD.NET
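One way to see the per-row difference Damian describes is to compare average record sizes (the table names below are made up, and you would want far more rows for a meaningful test):

CREATE TABLE dbo.FixedLen (id CHAR(10) NOT NULL)
CREATE TABLE dbo.VarLen (id VARCHAR(10) NOT NULL)

INSERT dbo.FixedLen VALUES ('ABC1234'), ('ABCDEF1234')
INSERT dbo.VarLen VALUES ('ABC1234'), ('ABCDEF1234')

SELECT OBJECT_NAME(object_id) AS table_name, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.FixedLen'), NULL, NULL, 'DETAILED')
UNION ALL
SELECT OBJECT_NAME(object_id), avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.VarLen'), NULL, NULL, 'DETAILED')

The varchar records carry the variable-length column count and offset bytes Damian mentions, on top of the data itself.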

  • Good question, made me think about datatypes.

nVarchar can actually use more than 2 bytes for some characters, e.g. Chinese characters that use 3 bytes.

  • david.howell (10/16/2009)


nVarchar can actually use more than 2 bytes for some characters, e.g. Chinese characters that use 3 bytes.

    Do you have a source for that? Both Books Online and every website I've ever read say that nvarchar and nchar use 2 bytes per character, never less, never more.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/
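For what it's worth, the 2-bytes-per-character behaviour is easy to check with DATALENGTH (the sample strings and column aliases are arbitrary):

SELECT DATALENGTH(N'Hello') AS english_bytes, -- 10: five characters, 2 bytes each
       DATALENGTH(N'你好') AS chinese_bytes -- 4: two characters, 2 bytes each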

Sorry, I am probably wrong with regard to MS SQL Server and nVarchar. I don't have an MS source.

    I was thinking of unicode and the term 'multibyte' rather than double-byte.

Unicode (specifically the UTF-8 encoding) is encoded in 8-bit 'octets', and there can be between 1 and 4 of them per character (according to the 2003 revision).

    See page 3 of the internet RFC 3629

    http://www.ietf.org/rfc/rfc3629.txt

    Off topic sorry.

  • Good one!

  • This one got me again. Another trick question.

    The question asked, "Which of these data types will be more efficient?"

Had it asked, "Which of these data types take less hard drive space?" I might have chosen the correct answer. I selected varchar because it is more efficient when programming: char pads with extra spaces at the end and you need to RTRIM the column before you produce your final output.

  • diamondgm (10/16/2009)


    Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).

    That's not an assumption, the even spread is explicitly mentioned in the question. Without it, I agree that the question would have been harder (if not impossible) to answer.

I think our definitions of the word 'explicit' differ dramatically. I did not find that the question stated the even spread as a fact.

Secondly, within SQL Server, char may be the more efficient choice in that instance, but take into account the code operations of trimming and I think it becomes debatable.

I doubt it. Almost all business applications I have seen have their performance tied to an I/O bottleneck, and have the CPU twiddling its virtual thumbs while waiting for more data to be read or written. Those few cycles the CPU spends to trim trailing spaces won't affect performance at all, it just means one less thumb to twiddle.

In the cases where this is true (which is most of the time), the white space would adversely affect I/O, no? (Maybe I am very wrong here.)

I agree with you on a purist level, but practically, I would still go with varchar, simply for the programmers' ease.

    How, exactly, is varchar easier for a programmer?

Have you never had a programmer complain about having to trim certain fields and not others, and wish for a "standard" approach to text data?

I think you have raised excellent objections to my way of thinking about this problem. And on paper (assuming the average is 8.5), I cannot disagree. Though in reality, I would still use varchar(10), if only for the convenience to the programmers at a relatively low performance cost.

  • cengland0 (10/16/2009)


    This one got me again. Another trick question.

    The question asked, "Which of these data types will be more efficient?"

Had it asked, "Which of these data types take less hard drive space?" I might have chosen the correct answer. I selected varchar because it is more efficient when programming: char pads with extra spaces at the end and you need to RTRIM the column before you produce your final output.

But in terms of I/O we could assume fixed-length data types are better than variable-length ones.

    Kindest Regards,

    Damian Widera
    SQL Server MVP,
    MCT, MCSE Data Platform, MCSD.NET
