Varchar or Char?

  • hawaiianrebel (10/16/2009)


    See, I got it wrong for two reasons:

    I was a knucklehead and missed the "English only", so yes, varchar or char. The choice between the two, though, is subjective to me. Yes, VARCHARs do require an additional 2 bytes as length identifiers (remember VSAM? lol). My question is about "Their length will vary from seven to ten characters" ... "with an even distribution of lengths" ... "Since the average size of the data is 8.5". We can't store data in half bytes (unless you are working with assembler; remember 3270 ASM?). In fact, we can't store data in perfect little bytes either; there are 32- or 64-bit WORD boundaries. For me, 8, 12, or 16 would have been more efficient, but that's just the hardware nut in me.

    No, we can't store data in half bytes. But when we have multiple units of data, and not all of them use the same number of bytes, we can get a fractional number of bytes for the average unit. The simplest example would be two rows, one with 10 bytes and the other with 11 bytes. That means 21 bytes for 2 rows, or an average of 10.5 bytes per row.

    The 32- or 64-bit word boundaries are completely irrelevant in SQL Server. If a row takes 1 byte less, 1 byte less is used. No 16-bit, 32-bit, or 64-bit alignment is done.
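
    To see such a fractional average directly, here is a minimal sketch (the temp table is hypothetical; the multi-row VALUES syntax assumes SQL Server 2008 or later):

    CREATE TABLE #ids (id varchar(10) NOT NULL);
    INSERT INTO #ids (id)
    VALUES ('1234567'), ('12345678'), ('123456789'), ('1234567890');

    -- (7 + 8 + 9 + 10) / 4 = 8.5 bytes on average; the * 1.0 keeps
    -- AVG from truncating to an integer.
    SELECT AVG(DATALENGTH(id) * 1.0) AS avg_bytes FROM #ids;  -- 8.5

    DROP TABLE #ids;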


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • john.arnott (10/16/2009)


    Yes, hard drives now cost in the range of a dollar a GB, but the line of business I support has to pay the enterprise infrastructure division quite a bit more than that, on an ongoing basis, as a pro-rated share of support services. I only wish I could say "go down to the nearest Best Buy and pick up a couple of 2TB USB drives". Nope. In this (and, I suppose, most large corporations), managed storage is priced to the user based on the full estimated cost of ownership.

    Hard drive storage is quite expensive when you're looking at enterprise systems. We use SAN storage with Fibre Channel and redundant drives. The data is also backed up to tape and sent off-site. These systems are extremely expensive. When we ask for space, they generally give us 50 GB max. If we want more, it requires an act of Congress. Every byte makes a difference.

    The log files generated by MSSQL are more annoying than the space used by the database itself. They need to be backed up, and then a shrink performed, to recover that space. This needs to be done several times throughout the day, since we have massive data loads that grow the log files significantly.
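
    For reference, the backup-then-shrink cycle described above looks roughly like this (the database name, logical log file name, and backup path are hypothetical):

    -- Back up the transaction log so the space inside it can be reused:
    BACKUP LOG BigLoadDB TO DISK = N'X:\Backups\BigLoadDB_log.trn';

    -- Then shrink the physical log file back to a target size (here 1024 MB):
    DBCC SHRINKFILE (BigLoadDB_log, 1024);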

  • How efficient will char be if you have to throw an RTRIM() into the WHERE clause?

  • Fatal Exception Error (10/19/2009)


    How efficient will char be if you have to throw an RTRIM() into the WHERE clause?

    Not as efficient, but why would you want to use RTRIM in a WHERE clause? Even if you are restricting by length, SQL Server won't be counting the built-in padding.
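
    To illustrate (a minimal sketch with a hypothetical temp table): string comparisons ignore the trailing pad spaces of a char column, so the RTRIM is usually unnecessary there, and wrapping the column in RTRIM() in a predicate generally prevents an index seek, which is why it is less efficient.

    CREATE TABLE #codes (code char(10) NOT NULL);
    INSERT INTO #codes (code) VALUES ('ABC1234');   -- stored as 'ABC1234   '

    -- Equality ignores trailing pad spaces, so no RTRIM is needed:
    SELECT code FROM #codes WHERE code = 'ABC1234';   -- matches

    -- LEN() also ignores trailing spaces; DATALENGTH() does not:
    SELECT LEN(code) AS len_chars,          -- 7
           DATALENGTH(code) AS data_bytes   -- 10
    FROM #codes;

    DROP TABLE #codes;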

    Tom Garth
    Vertical Solutions

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Yeah, thought so... I got it wrong because, like a lot of you, I chose varchar.

    It's variable-length data, so it will need trimming before it can be used if you store it as char (see the sketch below). The question "which is more efficient..." extends beyond the size of the type; without a bit more background on usage, I find the question ambiguous.

    If you told me a billion rows, worked with one at a time, then no question: storage is an issue, single rows won't take much trimming, and char would be best. But if you said 10,000 rows used widely in joins for many operations, then storage is less of an issue and repeatedly trimming spaces is more of one.
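
    The trimming cost shows up in concatenation and display rather than in equality tests. A minimal sketch (the value is made up; inline DECLARE initialization assumes SQL Server 2008 or later):

    DECLARE @c char(10) = 'ABC1234';

    -- The built-in padding surfaces when you build strings from char values:
    SELECT '[' + @c + ']'        AS padded,   -- [ABC1234   ]
           '[' + RTRIM(@c) + ']' AS trimmed;  -- [ABC1234]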

  • We need to define a column that will contain externally provided identifiers. Their

    (1) length will vary from seven to ten characters

    (2) all English letters

    (3) both upper and lower case mixed with numeric digits

    (4) with an even distribution of lengths <--missed this point when answering, thus got it wrong

    Which of these data types will be more efficient?

    Correct answer nvarchar(25)...

    Because...

    (1) I don't care how many people you asked in management; this will change next week, as soon as this is in PROD... you know that's true

    (2) for now, but when you least suspect/expect it, you will have to worry about localization...

    (3) eh, moot point... see point 2

    (4) unless this is a lookup table, this can only be the case if the number of records is evenly divisible by four (there are four possible lengths, seven through ten), since you add one record at a time... unless all your batch inserts are also divisible by four, and only then do you insert.

    Good question though, since the parameters were set as hypotheticals, but in the 'real' world that first set of constraints is ALWAYS wrong... LOL
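
    On the localization point, the cost of going to nvarchar is easy to measure (a minimal sketch; the value is made up, and the byte counts assume a regular non-UTF-8 collation):

    DECLARE @v varchar(10)  = 'Ab3deF789',
            @n nvarchar(10) = N'Ab3deF789';

    -- nvarchar stores two bytes per character, so the same nine-character
    -- value costs twice the storage:
    SELECT DATALENGTH(@v) AS varchar_bytes,   -- 9
           DATALENGTH(@n) AS nvarchar_bytes;  -- 18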

  • Really good question.

    Keep it up, John 🙂

    [font="Verdana"] There is no Wrong time to do a Right thing 🙂 [/font]

  • Hi,

    It is a great question.

    Congrats.
