varchar(50) vs varchar(8000)

  • There seems to be no performance issues between a varchar(50) and varchar(8000) when the SAME amount of information is being inserted.  Basically, I'm doing some research for the company I work at, and I need to prove, without a shadow of a doubt, that going with a smaller variable will be more efficient.  Efficient, in the terms of speed, CPU usage, and disk space.  I'm using the available tools within SQL Server Enterprise Manager, and even the SQL Server Management Studio to analyze this.  So far, there is no performance issues whatsoever.  If someone has hard evidence that there is a difference, then that would be even better. 

  • I don't believe you'll see a performance issue at all when comparing varchar's of various lengths.

    My understanding, and I'm sure someone will correct me if I'm wrong, is that sql server only stores a pointer in the record location for varchars. The pointer just redirects the reader to a different location in the MDF file where the actual data is stored. Therefore when reading / writing to those types of columns the performance will be the same.

    Now, if you're really trying to eek that extra level of performance use the char or nchar datatype instead. These don't use pointers and instead store the data in line. So, they are a better way when sql performance is required.

    The only downside is that varchar's automatically trim trailing spaces whereas char's don't so this has to be accounted for in your code.

  • Yeah, that's where I'm heading towards.  Hey, thanks a lot! 

  • But what you're introducing by abandoning strong typing is sloppy coding standards. and sorry Chris but your concept of character storage in sql server is seriously flawed!

    You won't get better performance by using smaller varchars but you will get consistancy in your data dictionary.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • I feel like I must break in here and elaborate on what Colin probably meant and correct Chris missconception of data storage in SQL Server:

    If the VARCHAR is not VARCHAR(MAX) it will ALWAYS be stored (and in some cases also when you use VARCHAR(MAX)) IN the datarow. What happens is that 2 bytes per record/VARCHAR column are dedicated to store the ACTUAL length of the VARCHAR for that record/column.

    I.e you MIGHT get performance gains by using CHAR, but for a totally different reason; the CHAR length is ONLY specified in the metadata for the table (hence you do not have these 2 extra bytes per record and per column that store char/nchar/v.char/n.vchar data).

    This is hugely offset if your strings are of DIFFERENT length (well, if the length differ on avarage by more the 2 bytes) and the CHAR gets PADDED by space characters.

    So the only thing more or less correct is 'The only downside is that varchar's automatically trim trailing spaces whereas char's don't so this has to be accounted for in your code. '.

    But whenever you SET ANSI_PADDING ON your trailing spaces are NOT trimmed from VARCHAR.

    And in all cases a CHAR column is ALWAYS padded to it's defined length with blanks.

    Hanslindgren

    PS:

    Read up on 'sp_tableoption' on MSDN especially the parameter 'large value types out of row':

    large value types out of row

    1 = varchar(max), nvarchar(max), varbinary(max) and xml columns in the table are stored out of row, with a 16-byte pointer to the root.

    0 = varchar(max), nvarchar(max), varbinary(max) and xml values are stored directly in the data row, up to a limit of 8000 bytes and as long as the value can fit in the record. If the value does not fit in the record, a pointer is stored in-row and the rest is stored out of row in the LOB storage space.

  • If you are using the same data for test you will probably notice very little to no performance gain through the use of VARCHAR(50) instead of VARCHAR(8000). Due to the fact that the data will 'use' exactly the same amount of storage space. This is due to HOW VARCHAR works. The 'length' you specifiy (or size if you want to call it that) is nothing but a constraint for variable length column and not defining the actual size of the column.

    The performance difference you might see are in the cases the SQL Query Engine decides to plan for 'worst case scenarios'. When this happens it does not use statistics for a column but by brute force it generates a query plan in which the datasize of the column (for ALL records) are assumed to be the same as the defined MAX LENGTH. And hence it allocates memory and other resources to cope with this. And, as you might already have realized, this can be fatal for your performance if you only have strings of maximum datasize=50 but defined length=8000...

    This has not touched the theoretically most important reason; in an RDBMS your data model implies the correct data domains and straight out violating it by defining a data column to have a different domain then the data model is just wrong.

    But then again, if your data model dictates that your column should store strings up to 8000 characters in length, I see no reason why you shouldn't do that.

     

    Hanslindgren

     

  • Goldmine.    Thanks.

  • One thing not mentioned is you can't index the VARCHAR(8000).

    The maximum number of bytes in an index is 900.

    -CN

  • Guys,

    Thank you for correcting me.

    A company I used to work for hired a sql guy to teach us about Sql 2k when it first came out. One of the things they "taught" us was apparently quite wrong; but it's not the first thing they said that I've found wasn't quite true.

    Chris.

  • Correct!  I think that this will be in my findings.  Thanks for the suggestion!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply