Does use of large datatypes where much smaller ones will do affect performance?

  • When a table has one hundred columns, and 40 of them have datatypes far exceeding the max(len()) of the data actually in them, will this alone impact query performance?

    Breakdown
    varchar(100)  - 19 columns, max actual length 20
    varchar(300)  -  4 columns, max actual length 51
    varchar(500)  - 12 columns, max actual length 94
    varchar(1000) -  2 columns, max actual length 86
    varchar(2000) -  2 columns, max actual length 115
    varchar(4000) -  2 columns, max actual lengths 3999 and 392
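
    For anyone who wants to reproduce that kind of breakdown, here is a minimal T-SQL sketch (the table and column names are placeholders, not the real schema) that compares declared widths against the widest values actually stored:

    -- Declared widths of the varchar/nvarchar columns in one table
    SELECT c.name       AS column_name,
           t.name       AS data_type,
           c.max_length AS declared_bytes          -- -1 means varchar(max)
    FROM sys.columns AS c
    JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID(N'dbo.SomeWideTable')   -- placeholder table name
      AND t.name IN ('varchar', 'nvarchar');

    -- Widest value actually stored in one of those columns
    SELECT MAX(LEN(SomeColumn))        AS max_chars_in_use,
           MAX(DATALENGTH(SomeColumn)) AS max_bytes_in_use
    FROM dbo.SomeWideTable;                                -- placeholder column name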

    --Quote me

  • Probably not, although it may affect data integrity.  If the maximum length of any value in the column is going to be 50, then you don't want varchar(300) for that column, since there's a danger you will accidentally admit invalid values.

    One way you may get a small performance improvement is if you have any fixed-length columns - credit card numbers for example.  In those cases you should use char instead of varchar, both for data integrity and to avoid the two-byte overhead that varchar incurs over char.  (Of course, if it's credit card numbers we're talking about, there are all sorts of security concerns on top of that, but that's beyond the scope of this topic.)
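
    To make the fixed-length case concrete, a small sketch (table and column names invented for illustration) of the difference John describes:

    -- Fixed-width data: char stores exactly the declared number of bytes per value
    CREATE TABLE dbo.CardFixed (CardNumber char(16)    NOT NULL);   -- 16 bytes per value

    -- varchar adds a 2-byte length prefix on top of the data itself
    CREATE TABLE dbo.CardVar   (CardNumber varchar(16) NOT NULL);   -- 16 + 2 bytes per value

    -- A CHECK constraint can also enforce the expected length for data integrity
    ALTER TABLE dbo.CardFixed
        ADD CONSTRAINT CK_CardFixed_Len CHECK (LEN(CardNumber) = 16);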

    John

  • Memory allocated for the execution of a query is based on the assumed row width. The assumed row width is not based on reality - it's based on the declared size of the columns.

    I posted details in another thread (https://www.sqlservercentral.com/Forums/1997686/NVarchar4000-and-Performance#bm1997863). That shows the memory requests and grants associated with selecting the same ~40-byte value out of columns of different data types. The larger the column size, the more wasted memory granted to the query.
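
    As a rough, self-contained repro along the same lines (all object names here are invented for the example), a sort over a wide declared column requests a much larger grant than the same sort over a narrow one:

    -- Same ~40-byte value stored in a narrow and a wide declared column
    CREATE TABLE dbo.WidthTest (narrow_col varchar(100), wide_col varchar(4000));

    INSERT INTO dbo.WidthTest (narrow_col, wide_col)
    SELECT REPLICATE('x', 40), REPLICATE('x', 40)
    FROM sys.all_objects;                    -- a few thousand identical rows

    -- A sort forces a memory grant sized from the declared width, not the data
    SELECT narrow_col FROM dbo.WidthTest ORDER BY narrow_col OPTION (RECOMPILE);
    SELECT wide_col   FROM dbo.WidthTest ORDER BY wide_col   OPTION (RECOMPILE);

    -- Compare requested vs. granted memory: run this from a second session while
    -- the SELECTs execute, or check the Memory Grant property in the actual plan
    SELECT session_id, requested_memory_kb, granted_memory_kb
    FROM sys.dm_exec_query_memory_grants;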

    -Eddie

    Eddie Wuerch
    MCM: SQL

  • John says no; Eddie gives a fully explained yes (in the referenced link).

    Because of the well-reasoned explanation, I accept Eddie's answer.

    Large datatypes require more space to index as well, don't they?

    --Quote me

  • polkadot - Wednesday, November 7, 2018 12:38 PM

    Large datatypes require more space to index as well, don't they?

    With character data, no.
    Physically, there's no difference between storing or indexing a 40-byte string as varchar(128) or varchar(512).
    An ASCII value stored as nvarchar(100) will take twice the space of one stored as varchar(100), unless row compression is in use.

    Eddie Wuerch
    MCM: SQL

  • Oh, so you and John are both saying that if the varchar datatype size is much bigger than the actual length of the data, this will not affect query performance or indexing.

    If I take LEN() of a column value and it returns 38, is this 38 bytes?

    --Quote me

  • It won't affect indexing, no, but Eddie is right - it does affect memory grants, which in turn affects performance.  I overlooked that in my first reply.

    LEN() returns the number of characters in a string.  DATALENGTH() returns the number of bytes it uses.
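
    For example, with a single-byte string and its Unicode equivalent:

    DECLARE @v varchar(100)  = 'abcdefghij';   -- 10 single-byte characters
    DECLARE @n nvarchar(100) = N'abcdefghij';  -- same 10 characters, 2 bytes each

    SELECT LEN(@v)        AS len_varchar,         -- 10 characters
           DATALENGTH(@v) AS datalength_varchar,  -- 10 bytes
           LEN(@n)        AS len_nvarchar,        -- 10 characters
           DATALENGTH(@n) AS datalength_nvarchar; -- 20 bytes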

    John

  • polkadot - Wednesday, November 7, 2018 11:05 PM

    Oh, so you and John are both saying that if the varchar datatype size is much bigger than the actual length of the data, this will not affect query performance or indexing.

    Declared varchar column length will not affect physical (on-disk) or data-cache storage. It will, however, affect the performance of actually using that index. The values must be loaded into a query's executing memory space in order to be read and processed, and the memory requested to load that data is based on the declared size, not the actual size. The data must be read into memory before the engine can analyze it to figure out how large it is.

    varchar(100)  - 19 columns, max actual length 20
    varchar(300)  -  4 columns, max actual length 51
    varchar(500)  - 12 columns, max actual length 94
    varchar(1000) -  2 columns, max actual length 86
    varchar(2000) -  2 columns, max actual length 115
    varchar(4000) -  2 columns, max actual lengths 3999 and 392

    With the exception of the varchar(4000) column storing up to 3999 bytes, any query accessing those columns will request much more memory than it actually needs in order to execute the query. This isn't a cache - it's single-use and each running query requests a separate grant. Memory wasted on queries can't be used for caches or keeping sorts out of tempdb.

    The cool thing is that shrinking the column definition (such as turning the varchar(2000) columns into varchar(200) columns) doesn't require any writes to the table, just a read to ensure the change is safe. Making similar moves across the other columns will have a positive effect on performance, which could be significant on a highly-concurrent system that touches this data frequently on many threads.
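
    A sketch of that change (table and column names are placeholders): check the widest value first, then shrink the declared width.

    -- Confirm nothing stored is longer than the new limit
    SELECT MAX(DATALENGTH(WideColumn)) AS max_bytes_in_use
    FROM dbo.SomeWideTable;

    -- Shrink the declared width; the ALTER reads the existing data and fails
    -- if any value would no longer fit. Restate nullability so it doesn't change.
    ALTER TABLE dbo.SomeWideTable
        ALTER COLUMN WideColumn varchar(200) NOT NULL;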

    Eddie Wuerch
    MCM: SQL

  • Eddie Wuerch - Friday, November 9, 2018 3:01 AM

    You should write an article on this, Eddie.  Too many people think that NUMERIC(18,0) and NVARCHAR(256) (the defaults of a lot of "table designer" software) are OK to use for just about everything.

    And while Knuth was absolutely correct, too many people confuse doing things the right way with "premature optimization".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
