char(n) v varchar(n) - data type length question

  • Hi experts,

    I posted this question in the DB design forum but this forum seems more active and appropriate.

    Could I please ask to clarify something:

    The following code

    declare @Variable varchar(10)

    set @variable = 'ABCDE'

    Select @variable as VariableValue

    , DATALENGTH(@variable) as DATALENGTH

    , LEN(@variable) as LEN

    Gives the output:

    ABCDE - 5 - 5

    My confusion is: I thought varchar(n) stores the value for n+2 bytes. Am I wrong for expecting an output of "ABCDE 7 5" ? Does the DATALENGTH() ignore the additional 2-byte overhead of varchar?

    Would appreciate some guidance

  • data length returns the length of the string, not the memory required to store the string itself.

    so the data_length really is five, but yes, sql uses at least seven bytes to store the string.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hey Lowell,

    Thanks for your reply. Im confused because the same exercise of an nvarchar or nchar string gives DataLength of 10 - i thought a similar overhead on varchar would show as well

  • yeah nvarchar uses twice the memory to store the data, that can be confusing.

    that's because nvarchar can store character sets that varchar can't.

    select N'??? / ??? huìyìzì','??? / ??? huìyìzì'

    /*

    (No column name)(No column name)

    ??? / ??? huìyìzì??? / ??? huìyìzì

    */

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks for clearing it up Lowell. That makes (some) sense

    Could you tell me what is conceptually the difference between DATA LENGTH (in bytes) and amount of bytes allocated to memory (storagesize??) ?

    the difference between LEN and DATALENGTH is rather straightforward, but difference between datalength and actual size on memory/disk is not so much

  • For varchar and nvarchar, it's sufficient to take the datalength of the string and add two bytes for the offset block. For most every day purposes, you don't really need to know all the extra bits and bytes that are used to render a row as you see it, and I have yet to come across a time I needed to know how many bytes my variables were taking up in memory.

    I'd try to get away from using len or datalength as a means of understanding the size a given row/variable takes up and just go off of what MSDN says. For instance, as you say, varchar is 1 byte per character plus a fixed two bytes for the offset block. If you need to know size, that's your calculation right there.

    len() is useful for when you need to know how many distinct characters exist in a given string. I use this for times when I need to do substring offsets and things like that.

    datalength(), while I'm sure has it's place, I have never had to use in a professional setting. It's probably much closer to returning the actual size the data takes up on disk or in memory, but then you have to add in caveats like offset blocks for variable length data types, which again, I usually just calculate off of what MSDN says.

    If you're interested in the nitity gritty of how SQL stores all the different data types, there are some good articles out there to show all the stuff that goes into storing the data. Personally I find it all very interesting, but again, for most every day purposes, this stuff is completely unnecessary to know.

    http://technet.microsoft.com/en-us/library/aa933068%28v=sql.80%29.aspx

    http://dba.stackexchange.com/questions/66471/script-to-estimate-row-sizes-for-any-table

    Executive Junior Cowboy Developer, Esq.[/url]

  • As Lowell stated, LEN() will only return the number of characters in a string (with a caveat that I'll explain in a minute) whereas DATALENGTH will return the number of bytes in a string. Generally speaking, Unicode strings will have 2 bytes per character in SQL Server.

    The caveat that I spoke of is that LEN() will NOT find/report on trailing spaces (and certain other characters) but DATALENGTH will whether of not it's a Unicode string or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply