Varchar usage

  • Hi All,

    My understanding of the varchar data type is that it stores only the size of the actual data saved in it.

    For example, if I have a field with size (50) but only 2 characters saved in it, it will reserve only 2.

    So my question is: if the above information is true, why don't we just make all varchar fields size 8000, the max size available?

    Thanks and Regards

    Nader

  • Space used + 2 bytes.

    As for making everything varchar(8000), why? Is a person's first name going to be thousands of characters long? A city name? A company name? An address line?

    No, none of those are ever going to be that kind of size. The longest city name in the world is 176 characters (and that city is usually written with a much shorter name - Bangkok). Address line sizes are mandated by the postal service; they won't be thousands of characters long.

    As with all design choices, use the appropriate data type (and length) for the data being stored as a form of constraint. Making everything 8000 is lazy design (and makes indexing much harder).
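
    A quick way to see the storage behaviour for yourself (a minimal sketch; the variables are just examples):

    -- A varchar column stores only the characters actually present,
    -- plus a 2-byte length prefix in the row, regardless of the
    -- declared maximum.
    DECLARE @short varchar(50)   = 'ab';
    DECLARE @long  varchar(8000) = 'ab';

    -- Both return 2: DATALENGTH counts the bytes actually stored.
    SELECT DATALENGTH(@short) AS ShortBytes,
           DATALENGTH(@long)  AS LongBytes;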

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/29/2016)


    Space used + 2 bytes.

    As for making everything varchar(8000), why? Is a person's first name going to be thousands of characters long? A city name? A company name? An address line?

    No, none of those are ever going to be that kind of size. The longest city name in the world is 176 characters (and that city is usually written with a much shorter name - Bangkok). Address line sizes are mandated by the postal service; they won't be thousands of characters long.

    As with all design choices, use the appropriate data type (and length) for the data being stored as a form of constraint. Making everything 8000 is lazy design (and makes indexing much harder).

    Thank you for your reply.

    I am asking in order to understand the performance impact of defining sizes much bigger than what's actually needed.

    Also, you mentioned it's harder for indexing; can you please explain why? Isn't it going to be the same concept?

    Thanks again.

    Nader

  • An index key is limited to 900 bytes. If I'm indexing, then I know that it's safe to put a varchar(10) into an index key, but if that's varchar(8000), then I don't know if it's safe to index, or whether the data might be over 900 bytes one day and cause an error.
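
    To see it happen (a minimal sketch; the table is hypothetical):

    -- The key-size limit in play: the index creates with a warning
    -- that the key could exceed 900 bytes.
    CREATE TABLE dbo.KeyLimitDemo (BigCol varchar(8000));
    CREATE INDEX IX_BigCol ON dbo.KeyLimitDemo (BigCol);

    -- Fails: the index key for this row would be 901 bytes.
    INSERT INTO dbo.KeyLimitDemo (BigCol)
    VALUES (REPLICATE('x', 901));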

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/29/2016)


    An index key is limited to 900 bytes. If I'm indexing, then I know that it's safe to put a varchar(10) into an index key, but if that's varchar(8000), then I don't know if it's safe to index, or whether the data might be over 900 bytes one day and cause an error.

    Unfortunately I already have indexes on big varchar columns. Does it make a difference whether they're index key columns or included columns?

    Also, by error do you mean the index will not be updated with the actual data, or what? And what will be the outcome in that case?

    Thanks again

    Nader

    The limit applies to key columns only. If an insert/update would make an index key over 900 bytes, the insert/update will throw an error and fail.
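
    For example (a minimal sketch; the table is hypothetical), an included column sits only at the leaf level, so the 900-byte key limit doesn't apply to it:

    CREATE TABLE dbo.IncludeDemo (ID int NOT NULL, BigCol varchar(8000));

    -- No key-size warning: BigCol is an included column, not a key column.
    CREATE INDEX IX_Include ON dbo.IncludeDemo (ID) INCLUDE (BigCol);

    -- Succeeds even though the value is well over 900 bytes.
    INSERT INTO dbo.IncludeDemo (ID, BigCol) VALUES (1, REPLICATE('x', 5000));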

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    I could have sworn I remember reading that declaring larger column sizes than required could cause some inefficiencies, e.g. incorrect estimates, over-allocating memory...

    I wish I could find good documentation for this but I'm striking out. I'm sure Gail or somebody else could comment on this.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • yb751 (3/29/2016)


    I could have sworn I remember reading that declaring larger column sizes than required could cause some inefficiencies, e.g. incorrect estimates, over-allocating memory...

    I wouldn't be surprised if that's the case, but I can't recall any explicit docs on it. Memory is the more likely issue, rather than incorrect row estimates. The column size has to be taken into account when the query processor is requesting memory grants.

    Varchar size is the same kind of decision as numeric scale and precision, or any other data type choice. It's based on the data expected for the column, and choosing a suitable size is part of proper design work.
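
    One way to observe it (a minimal sketch; the two tables are hypothetical and identical apart from the declared length): run both sorts and compare the Memory Grant property in the actual execution plans. The grant is sized from the declared length (roughly half of it for variable-length columns), not from the data actually stored.

    SELECT Name FROM dbo.NarrowNames ORDER BY Name;  -- Name varchar(50)
    SELECT Name FROM dbo.WideNames   ORDER BY Name;  -- Name varchar(8000)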

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/29/2016)


    yb751 (3/29/2016)


    I could have sworn I remember reading that declaring larger column sizes than required could cause some inefficiencies, e.g. incorrect estimates, over-allocating memory...

    I wouldn't be surprised if that's the case, but I can't recall any explicit docs on it. Memory is the more likely issue, rather than incorrect row estimates. The column size has to be taken into account when the query processor is requesting memory grants.

    Varchar size is the same kind of decision as numeric scale and precision, or any other data type choice. It's based on the data expected for the column, and choosing a suitable size is part of proper design work.

    Sorry, I didn't mean to imply incorrect estimates of rows, but rather of column size, such as using the declared data type to determine the average row size. Thanks for elaborating, though... in the end I just think it's lazy programming at the very least.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

    Setting the upper limit on a VarChar(X) column is a form of logical constraint, where X is the maximum length of data that a user would correctly supply. It's sort of like saying Sex Char(1) should only contain 'M', 'F', and 'U'.

    However, perhaps an even more compelling reason to place limits on the size of VarChar columns is that SSIS and other ETL tools use fixed-width columns when buffering data. So if you are extracting rows from Customer, and LastName is defined as VarChar(8000), then every row in the buffer will have 8000 bytes reserved for LastName. You can work around this by re-casting the LastName column within the source SQL statement, or by using a Derived Column transform between the source and destination, but there is no point in coding around something like that downstream when you can just properly size your data types in the table.
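
    For example, the re-cast in the source query would look something like this (a minimal sketch using the Customer/LastName names from above):

    -- Shrinks the buffer column so SSIS reserves 50 bytes per row
    -- for LastName instead of 8000.
    SELECT CAST(LastName AS varchar(50)) AS LastName
    FROM dbo.Customer;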

    Performance Best Practice: more rows per buffer

    http://microsoft-ssis.blogspot.com/2013/04/performance-best-practice-more-rows-per.html

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • nadersam (3/29/2016)


    why don't we just make all varchar fields size 8000, the max size available?

    Because if you do, the question does not become whether someone will end up putting an 8000-character novel in the postal code column, but when (and how often) it will happen.

    I am currently involved in several projects where data is converted from one system to another, and you won't believe the cr*p I encounter. City names in the postal code. A comma-delimited list of phone numbers in the Phone1 column (and nothing in Phone2). Driving instructions in the SecondAddressLine column. "Don't call before 10AM" in the phone column. An address in Tel Aviv but with country code IS (which is Iceland; Israel is IL). And those are just a few of today's highlights.

    All these were made possible because a developer thought that varchar(50) would be better for PhoneNumber than varchar(15), and that no input validation is needed. And now I have to wade through the mess and try to find a way to load those customers into a new system that does have proper data validation.

    Trust me. Do your successors a favor. Choose your data types with care.

    (Always imagine that your successor will be a maniacal axe killer ... who knows where you live)
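
    On the practical side, a tight length plus a CHECK constraint goes a long way (a minimal sketch; the table and patterns are hypothetical):

    CREATE TABLE dbo.CustomerContact
    (
        Phone       varchar(15) NOT NULL
            -- Rejects anything that isn't a digit, +, space, or hyphen.
            CONSTRAINT CK_Phone CHECK (Phone NOT LIKE '%[^0-9+ -]%'),
        CountryCode char(2)     NOT NULL
            -- Two letters only; "Don't call before 10AM" no longer fits
            -- in Phone, in length or in content.
            CONSTRAINT CK_CountryCode CHECK (CountryCode LIKE '[A-Z][A-Z]')
    );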


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (3/29/2016)


    I am currently involved in several projects where data is converted from one system to another, and you won't believe the cr*p I encounter. City names in the postal code. A comma-delimited list of phone numbers in the Phone1 column (and nothing in Phone2). Driving instructions in the SecondAddressLine column. "Don't call before 10AM" in the phone column. An address in Tel Aviv but with country code IS (which is Iceland; Israel is IL). And those are just a few of today's highlights.

    All these were made possible because a developer thought that varchar(50) would be better for PhoneNumber than varchar(15), and that no input validation is needed. And now I have to wade through the mess and try to find a way to load those customers into a new system that does have proper data validation.

    My favorite is still the logistics company who defined all columns as character.

    Shipped in 100AD and delivered on 40-08-5006? Don't mind if you do. Delivered before it was shipped? No problem.

    Along with shipped to NULL and shipped from NULL and shipped from locations that didn't have a depot, and 15+ variations of "Sandton City" and....

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you very much guys.

    For the sake of testing, I created a test table, populated it with data, and checked the bytes received from the server under Client Statistics. I then kept changing the data types; the results are attached to this post.

    It's exactly as you mentioned, guys, but I noticed that varchar(max) somehow takes additional space.

    For nvarchar the numbers are much higher; nvarchar(50) is even bigger than varchar(max).
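
    I believe the nvarchar difference is the two bytes it stores per character; this quick check shows it (a minimal sketch):

    SELECT DATALENGTH(CAST('hello' AS varchar(50)))  AS VarcharBytes,  -- 5
           DATALENGTH(CAST('hello' AS nvarchar(50))) AS NvarcharBytes; -- 10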

    Thanks

    Nader

    Varchar(max) is a very different data type. It has additional overhead because its data can be stored off-row. It's definitely not something you'd use on all columns.
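
    If you want to see where that overhead lives (a minimal sketch; dbo.MaxDemo is a hypothetical table with a varchar(max) column), check the allocation units:

    -- LOB_DATA allocation units appear when max-type values are stored off-row.
    SELECT alloc_unit_type_desc, page_count
    FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID('dbo.MaxDemo'), NULL, NULL, 'DETAILED');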

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/30/2016)


    Varchar(max) is a very different data type. It has additional overhead because its data can be stored off-row. It's definitely not something you'd use on all columns.

    Yes, you are right of course, but nvarchar is even worse, even if you define it with a small size, as in the attached results.
