Do You Still Use CHAR?

  • Comments posted to this topic are about the item Do You Still Use CHAR?

  • Sure.  I use CHAR.  It's all a part of right-sizing data.  Such right sizing does some incredible things, not the least of which is preventing the massive page splits and the resulting fragmentation in columns that are first inserted and then "ExpAnsively" updated, like that wonderful "Modified_By" column that a lot of people include in their tables.  Of course, there is a trade-off and you don't want to use a CHAR(50) (for example) on something that's only populated something like 10% of the time.  AND, of course, that's also where data-normalization or Fill Factors with proper index maintenance comes into play.

    Heh... some folks raise hell about the use of CHAR() but think nothing of wasting huge amounts of space with NULL VARCHAR()s.

    Uh... what's that you say?  NULL VARCHARs use no space???  Yeah... a whole lot of people think they know that.  Do an experiment.  Create a table with an Identity or other populated column as the first column in a Clustered Table and have 10 NULL VARCHAR(10) columns to the right of that.  Populate that first column for several rows.  Just that column.  Nothing else yet.  Let all those VARCHAR columns be NULL.  Then,  Use sys.dm_db_index_physical_stats to see the min, avg, and max row sizes .

    Once you have those 3 values written down, add just one byte to the VARCHAR column furthest to the right in the Clustered Table.  Everyone knows that column will now occupy just 3 bytes... 1 for the character and 2 for the length (which is actually incorrect but still has 2 bytes), right?  And the total row length will only grow by 3, RIGHT?

    Let's see if that's true... do another sys.dm_db_index_physical_stats and measure what you measured before.  Did the min, max, and average only increase by the 3 bytes caused by adding a single byte to the right most column?

    Surprise! 😀

    Now, ask yourself about the "poor man's" auditing that you build into a lot of your tables and where the "Created_By" column normally shows up in the table.

    Surprise, surprise, SURPRISE! 😀

    More to come on that subject.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Spot on, Jeff !

    The main point for people to prefer varchar over char is that their GUI development tool doesn't have to bother with trailing spaces.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • BigQuery has a data type called STRING with no size specification.

    In Snowflake all text columns are VARCHAR under the hood, including CHAR.

    For pure column stores the method of encoding probably renders the distinction between CHAR/VARCHAR less important

  • I agree with Jeff, Johann and the author of the article.  Use char where appropriate, for performance, yes, and also for data integrity.  If an invoice number is 10 characters, use char(10).  If (and it is an if) you take over a company with a 12-character invoice, then you have a one-off change to make.  In fact, I'd go even further and add a CHECK constraint, where appropriate, to ensure that inserted data conforms with the standard (for example making sure that phone numbers contain only numbers and any other permitted characters).

    John

  • We still use CHAR in places where the length is fixed, eg. Sedol, ISIN, 2 and 3 character ISO codes for countries, currencies, etc. We also used CHAR(5) for our client codes because they were always 5 characters. Yes, we merged with another company that used much longer client codes, but changing that datatype was the least of the problems.

  • I'll use CHAR(1) on occasion for internal status codes where I also have control over the domain of allowable values (ie: RunStatus = 'P').

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • David.Poole wrote:

    BigQuery has a data type called STRING with no size specification.

    In Snowflake all text columns are VARCHAR under the hood, including CHAR.

    For pure column stores the method of encoding probably renders the distinction between CHAR/VARCHAR less important

    This is also why I have a problem with a lot of 3rd party tools.  There's little consideration for what goes on in the actual database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve, your post today about CHAR reminded me of a recent dream, not sure it was really memory, of my early IT days, maybe about 19971, at First National Bank of Southwest Michigan ( we called it Fin-B-Swim ) when I was programming in Cobol, Autocoder, and Assembler on an IBM System-3 Model 10 (32k of memory, IBM Selectric TW console, 3 removable disks and two mag tape drives with 9-inch reels).

    Data entry was largely done with 80-column cards produced by the keypunch department with about 5-6 machine operators.  If memory serves, the cards had 12 rows of holes, 0 thru 9, A, B, and C.   The cards were prepared and brought to a mechanical reader-sorter machine about the size of five 4-drawer file cabinets.

    One day production was brought to a halt by a massive card-jam in the reader, requiring IBM service guys to be called from about 10 miles away.  They tore several panels off the machine, dug out the cards, only to find that when a correction had been made to a punch card, the original card had then been stapled to the replacement and put back into the deck.

    I can't say this actually happened, but it made for a fun dream.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • Thought crosses my mind that maybe the use of CHAR data is a good way to have to revisit the Y2K days.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • This conversation brings up an ancient pain of mine and that's the mantra of "Pre-optimization is the root of all evil" parable that Knuth first stated in one of his many papers.  In a previous job, the Developers where the ones that "designed" the databases (certainly not my decision).  ALL of the numbers where stored in NUMERIC(18,X) columns and ALL String data <= 256 bytes was stored in NVARCHAR(256) and everything larger than that was stored in NVARCHAR(MAX).

    When I asked why they would do such a thing, the lead developer looked at me and wobbled his head as he almost sang the parable.  I explained to him that there's a difference between pre-optimization and practicing really bad practices and all he was doing was the latter.  There wasn't even a consideration for storing something like a single letter status or Y/N or 1/0 or T/F in CHAR(1) or other datatype or storing a DATE or DATETIME in the correct datatype.

    The really good part was, I didn't go to jail that day because I was successful in suppressing the urge to end the gene pool that the lead developer was a member of. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • skeleton567 wrote:

    Thought crosses my mind that maybe the use of CHAR data is a good way to have to revisit the Y2K days.

    As of 2014, FaceBook recognizes 58 variations of gender. But we still have plenty of time to retire before it expands beyond the capacity of CHAR(1) or TINYINT.

    https://abcnews.go.com/blogs/headlines/2014/02/heres-a-list-of-58-gender-options-for-facebook-users

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Just got a good laugh at the end of this conversation.  Saw the following line.  Is this a limitation of a CHAR data element?

    Viewing 11 posts - 1 through 10 (of 10 total)

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • NOOOOOO!  The kingdom I inherited over 10 years ago had seen several different programmers and between duplication and poor design, they created a monster that I'm still dealing with.  CHAR has bitten me numerous times and I avoid it like a rattlesnake.  When I started here the DB was on a spinning drive like they all started out, but we've long since gone to VMs.  So, when we need space we just add more gigabytes!  It's already paid for!

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • Caruncles wrote:

    CHAR has bitten me numerous times...

    How so?  Especially when it comes to single byte stuff that even you know will never change?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 42 total)

You must be logged in to reply to this topic. Login to reply