Do You Still Use CHAR?

  • Robin Wilson wrote:

    A popular college MIS system uses CHAR for many fields, and this often leads to issues. Not in the software itself, which is built to account for it, but data is frequently extracted and dumped into Excel or another system, and then formulas don't work because the values are padded with spaces whenever the report writers forget to RTRIM the data.

    It is used on departmental codes and reference numbers, but each provider uses different lengths, so CHAR makes no sense there and just seems to generate additional work, both in the software application and when exporting data to other systems. I can't really see the advantage unless a field will always be a fixed length, such as a Y or N flag, but then BIT would be better in that instance. For most everything else there are exceptions, unless perhaps it is a national insurance number or something similar.

    If you're designing your databases so they can support such things as spreadsheets, you're going to have some issues with your databases that you might not even be aware of (bloated NULLs, page splits galore, massive fragmentation, excessive memory usage, etc., etc.). Go back and look at the post where I gave the example of 10 nullable VARCHAR(10) columns, build the example, and see just one of the problems (bloated NULLs) that can be attributed to using VARCHAR() for convenience's sake.
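
    If you want to rebuild that example from scratch, a minimal sketch would look something like the following (the table and column names are placeholders and the row count is arbitrary); measure the result with sp_spaceused rather than taking anyone's word for the numbers:

        -- Placeholder table full of "convenience" nullable VARCHAR(10) columns.
        CREATE TABLE dbo.ConvenienceTest
        (
            RowID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
            Col01 VARCHAR(10) NULL, Col02 VARCHAR(10) NULL, Col03 VARCHAR(10) NULL,
            Col04 VARCHAR(10) NULL, Col05 VARCHAR(10) NULL, Col06 VARCHAR(10) NULL,
            Col07 VARCHAR(10) NULL, Col08 VARCHAR(10) NULL, Col09 VARCHAR(10) NULL,
            Col10 VARCHAR(10) NULL
        );

        -- Populate only the last column, so the nine "empty" variable-length
        -- columns in front of it still ride along in every row.
        INSERT INTO dbo.ConvenienceTest (Col10)
        SELECT TOP (100000) 'X'
        FROM sys.all_columns AS ac1
        CROSS JOIN sys.all_columns AS ac2;

        -- Check the space actually reserved and used.
        EXEC sys.sp_spaceused @objname = N'dbo.ConvenienceTest';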

    If you want to make it so spreadsheets can use the data without trailing spaces, just make a view for the spreadsheet access.  Or, whatever process does the imports to Excel should handle that and other issues, IMHO.
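
    Something along these lines, for example (the view, table, and column names here are invented for the sketch; the point is just that the RTRIM lives in one place):

        -- Hypothetical export view: strip the CHAR padding once, here,
        -- instead of in every spreadsheet formula or downstream system.
        CREATE VIEW dbo.StudentExport
        AS
        SELECT  StudentID,
                RTRIM(DepartmentCode)  AS DepartmentCode,   -- CHAR(N) column, padding removed
                RTRIM(ReferenceNumber) AS ReferenceNumber   -- CHAR(N) column, padding removed
        FROM    dbo.Student;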

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Rick, please tell me that you're not advocating a "panacea length" for VARCHAR(N) where "N" is something like 256.  Please tell me you're not advocating that NUMERIC(18,N) should be a "panacea" for numeric values.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Rick, please tell me that you're not advocating a "panacea length" for VARCHAR(N) where "N" is something like 256.  Please tell me you're not advocating that NUMERIC(18,N) should be a "panacea" for numeric values.

    Well, not quite that extreme. But let me see: when I was in grade school, our phone number was '26R11' (no area code, no exchange; it just meant line 26 with one long ring and one short ring on the hand-crank wall phone, which everyone on line 26 heard and could pick up). You told the switchboard operator you wanted to connect to Mansfield, Illinois, number 26R11. And our address was 'Rural Route 3, Mansfield, Illinois' (no street number, no zip code, not even the old 5-digit version). Essentially all city street numbers were three digits, maybe four. Now around us essentially all street numbers are at least 5 digits... but not ALL of them; there are still some in the central parts of town that are three digits. Your SS number and mine, I believe, are officially NNN-NN-NNNN, but who is to say for how long? And should you store it with or without the dashes? And with the population growth (especially with the open borders), how long is it until somebody decides to make one of those 'standard' sized segments two digits longer to cover the growth? You can fit my bank account balance into a 5,3 format, but thank God not my IRA balance.

    What I'm advocating is responsible, common-sense analysis and design, considering that the extra data storage space, processing overhead, etc. are very, very cheap compared to a system redesign every few years. Or... you be the one to go tell the executive board that you have yet another system redesign that has to be done by 1-1-2023, will cost $400k, and will delay three other projects.

    In other words, it's far better to VALIDATE that a string of undetermined storage size is in fact 11 bytes long and contains exactly (for now) 9 numeric digits in a 3-2-4 pattern delimited by exactly two (for now) hyphens; not commas or decimal points or slashes, but hyphens. Note that I can quickly alter the spec by retyping the previous description, yet I can still throw the value into that horrible, ugly VARCHAR data element.
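
    For example, something like the following does that validation while the value still lives in that horrible, ugly VARCHAR (the table, column, and constraint names are mine, purely for illustration):

        -- LIKE matches the whole value, so this pattern also pins the length
        -- at exactly 11 characters: 3 digits, hyphen, 2 digits, hyphen, 4 digits.
        ALTER TABLE dbo.Person
        ADD CONSTRAINT CK_Person_SSN_Format
            CHECK (SSN LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]');

    If the spec ever changes, you change the pattern in the constraint instead of resizing the column.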

    Or we can save several terabytes by capping the amount field for checks at 99,999.99. But then Nancy Pelosi (actually her husband) sells $200,000,000.00 in stock and brings in the check for deposit. Or all of a sudden we have to store bitcoin amounts that exceed the capacity of a smallint.

    Even a simple Yes/No value may someday need to include an option for MAYBE.

    Just because you CAN do something doesn't mean you SHOULD do it.  I'm advocating that we, as IT professionals, use our common sense to avoid putting our employers in very expensive boxes and ourselves up against very uncomfortable demands.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • Jeff Moden wrote:

    Rick, please tell me that you're not advocating a "panacea length" for VARCHAR(N) where "N" is something like 256.  Please tell me you're not advocating that NUMERIC(18,N) should be a "panacea" for numeric values.

    I'm going to speculate a bit here on the CHAR datatype and its use and downfalls. CHAR is probably the oldest and 'dirtiest' datatype that exists.

    I'll use dates as my example. When I started in IT, we mostly used dates in YYMMDD format. They were efficient in that they consumed only 6 of the available 80 columns on a punch card for data entry. Sorting machines took only 6 passes to sequence the cards by date, and all was well as long as the keypunch operator made no errors and the sorting operator didn't accidentally shuffle the stacks when taking them out of the pockets. It was fortunate that in those days there was little if any need for complete date AND time data. But then I was assigned to create a package for a radio station to schedule commercials for broadcast. Suddenly time became critical, and that took at least 4 more columns.

    And some of you probably didn't experience what was possibly the greatest and most expensive design fiasco in the history of IT: the infamous Y2K disaster. No one had considered the implications of the century change for date storage and use.

    Now, while CHAR was likely the first data format in the history of IT, do some research on the plethora of data types and formats available today and ask yourself why that is the case. SQL Server alone has at least six dedicated date/time data types (date, time, smalldatetime, datetime, datetime2, and datetimeoffset). Consider all of the other non-character datatypes and you'll find it mind-boggling.

    Now ask yourself why that is. We found more and more over the years and decades that the CHAR datatype was actually pretty worthless. It's good for visual presentation of data, but not much else. Just think what even a printed page would look like if each sentence had to consume exactly the same number of CHARACTERS.

    And then consider the old standby coding language, COBOL. One of the big advantages of COBOL was that it allowed poor data design, handled lots of the required conversions behind the scenes, and hid most of the inefficiencies.

    Finally, if you bring a CHAR layout to a design review, you'd better have a really, really good argument. I might expect such a layout from a graduate of the current University of Phoenix program that awards an IT degree for a mere $12,000.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • @Rick,

    I remember those days, but that's actually a really bad example for the systems of today.  Anyone who actually stores dates as CHAR() in their normal database tables needs to be drummed out of the business.

    The subject of this article, IMHO, is asking if anyone uses CHAR() anymore.  And this isn't about punch cards or way back when you and I were programming on unit record equipment with banana plug wires to a wire programming board. 😀

    So, to ask the question: what have you used CHAR(N) for in SQL Server in the last couple of years?  Or where have you considered an oversized VARCHAR(N)?
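
    And, going back to the dates-as-CHAR() point above, here's a throwaway illustration of why it earns the drumming-out (the temp table names are arbitrary):

        -- A CHAR(8) "date" happily accepts garbage...
        CREATE TABLE #CharDate (EventDate CHAR(8) NOT NULL);
        INSERT INTO #CharDate (EventDate) VALUES ('20230230');   -- no such date, no complaint

        -- ...while a real DATE column rejects it at the door.
        CREATE TABLE #RealDate (EventDate DATE NOT NULL);
        INSERT INTO #RealDate (EventDate) VALUES ('20230230');   -- fails with a conversion error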

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    @Rick,

    I remember those days, but that's actually a really bad example for the systems of today.  Anyone who actually stores dates as CHAR() in their normal database tables needs to be drummed out of the business.

    The subject of this article, IMHO, is asking if anyone uses CHAR() anymore.  And this isn't about punch cards or way back when you and I were programming on unit record equipment with banana plug wires to a wire programming board. 😀

    So, to ask the question: what have you used CHAR(N) for in SQL Server in the last couple of years?  Or where have you considered an oversized VARCHAR(N)?

    Jeff, my intent WAS to give a really BAD example; a historical perspective on the potential effects of a bad design decision.  I agree, and I hope that very few actually store dates that way anymore.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • Jeffrey Williams wrote:

    below86 wrote:

    I haven't had any issues with CHAR columns lately, but we have had to adjust the size of our claim number (was char(8)) and policy number (was char(9)) columns in the last few years to handle outside companies' values.

    But this week I had an issue come up. We have a claimant number field; this is a number assigned within a claim to the different claimants on the claim. Most of the time you may have 1 or 2 on a claim, though you could see up to 10 or more. So the claimant number field was set up as a smallint. Plenty of room, right?

    That was all fine and dandy until we started bringing in outside claim info from other companies. Now, we had given them the specs for what the field should be, and for several years we didn't have any issues. This week I found out they had been putting a 'record id' key into this field, and it finally maxed out what a smallint could hold. So I had to make some quick changes to the database tables to make them int, and I had to go through and 'fix' the SSIS metadata in various packages to handle the larger value.

    None of these are actually issues with the char data type - in my opinion. Would you set up an IDENTITY column and define it as bigint - because at some point in the future you might exceed the limits of an integer column? But - you don't know if that will happen, only that it could happen - maybe in 10 or more years because of an acquisition?

    Why would you want to use varchar(10) in place of a char(10) - when the value in that column will always have 10 characters? For example, NPI numbers are 10 characters, will never be less than 10 characters, and it is extremely unlikely that this value will ever be increased to 11 or more characters. So - to be 'safe' you set it to varchar(15) just in case... and then program around the invalid entries that have more than 10 characters, because some user will prefix the data entered - another user will suffix the data - and yet another will add dashes or asterisks or...

    Not to mention the issues with updating varchar data - that can and will cause fragmentation.

    I don't think you got my point here. The 'claimant number' field could easily have been set to a char(3) or char(4) based on the claims systems the company used. I doubt you would ever run into a claim with more than 999 or 9,999 people on one claim. But the people who designed the tables in the warehouse thought about it and used a smallint data type instead of a char. You would have thought the smallint was overkill based on what data should be in this field, but it was the smarter way to define the data. And it would have been fine if we hadn't started bringing in data from outside vendors. The tables were designed 10 to 15 years ago, before the company had even thought about working with outside companies. When this outside company started sending the 'record key' in that column instead of the value we asked for, it still met the smallint criteria - until this month. So any edits on the column would not have caught the increase until this month.

    My point was that even when you think you have designed it to handle what you think the absolute max could be, someone will find a way of breaking it at some point.

    Several years ago I worked for a company that had brought in some outside vendors to help create an enterprise-wide data warehouse (EDW). These 'experts' set up almost every column that we stored data in as varchar(500). Things like the above example of 'claimant number' would have been stored in a varchar(500) column, even though the data was only 1 or 2 characters and only numeric. A bit of overkill, IMHO.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • I don't think we should eliminate CHAR, or that it shouldn't be used. I hope no one thought that was the point. Rather, I was curious how others view things.

    As some pointed out in discussions in other places (my blog, LinkedIn), there are definitely known sizes for data: country codes, finance codes, etc. In those cases, use CHAR, especially if these are SARGable (search argument) fields.

    However, in many cases I find that businesses (and I agree) don't know, or haven't defined, their text data well. In those cases, I think varchar is a better choice than having to alter CHAR columns and potentially impact the pages underneath. Not always, but often that would be my approach, especially in data warehouse systems.
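
    For the known-size cases, a sketch of what I mean might look like this (the names are illustrative only):

        -- Fixed-width, well-defined codes are where CHAR still earns its keep:
        -- no 2-byte variable-length overhead per value, and no surprises in width.
        CREATE TABLE dbo.CountryCurrency
        (
            CountryCode  CHAR(2)      NOT NULL,   -- ISO 3166-1 alpha-2
            CurrencyCode CHAR(3)      NOT NULL,   -- ISO 4217
            CountryName  VARCHAR(100) NOT NULL
        );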

  • Somebody please clarify this for me.  From a quick search, it sounds as if CHAR datatypes are allocated space for the full declared length of the element, whether filled or not.  On the other hand, it sounds as if VARCHAR elements, if empty, consume only the length indicator, however that works.

    So even if I define a VARCHAR(MAX) element, it sounds as if an empty or partially filled element only consumes space for the data length and whatever actual data it contains, while a CHAR would always consume its full specified width, even if empty.
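
    If I have that right, a quick test like this should show it (the literal values are arbitrary):

        -- DATALENGTH returns the bytes actually stored, padding included.
        SELECT  DATALENGTH(CAST('abc' AS CHAR(10)))    AS CharBytes,      -- 10: padded with trailing spaces
                DATALENGTH(CAST('abc' AS VARCHAR(10))) AS VarcharBytes,   -- 3: just the data
                DATALENGTH(CAST(''    AS CHAR(10)))    AS EmptyCharBytes; -- 10: still the full declared width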

    Is this like the old fixed versus variable length records?

    If I'm understanding things right, I draw some conclusions:

    CHAR only makes sense if the bulk of the rows actually contain data, not empty elements.

    CHAR data elements are thus going to be much more vulnerable to overflow errors unless data is carefully validated before inserting/updating.

    CHAR data elements will, by their nature, artificially inflate the consumed space in a row, table, or database any time they are not VERY CAREFULLY sized, potentially leading to more required modification of structures.

    Of course, if VARCHAR data is actively modified, it has the potential to increase fragmentation, which would be fixed by regular DB maintenance and optimization, while CHAR data would not in itself affect fragmentation as much; but that comes at the expense of greater storage consumption and more data movement when the row is accessed, even if the columns aren't actually filled.
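
    For the fragmentation side of it, I assume something like this is the check to run (the database and table names are placeholders):

        -- Logical fragmentation and page counts for one table's indexes,
        -- e.g. after a round of varchar updates that grew values in place.
        SELECT  OBJECT_NAME(ips.object_id)          AS TableName,
                i.name                              AS IndexName,
                ips.avg_fragmentation_in_percent,
                ips.page_count
        FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, 'LIMITED') AS ips
        JOIN    sys.indexes AS i
                ON  i.object_id = ips.object_id
                AND i.index_id  = ips.index_id;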

    Pardon the pun here, but fill me in!

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • below86 wrote:

    Jeffrey Williams wrote:

    below86 wrote:

    I haven't had any issues with CHAR columns lately, but we have had to adjust the size of our claim number (was char(8)) and policy number (was char(9)) columns in the last few years to handle outside companies' values.

    But this week I had an issue come up. We have a claimant number field; this is a number assigned within a claim to the different claimants on the claim. Most of the time you may have 1 or 2 on a claim, though you could see up to 10 or more. So the claimant number field was set up as a smallint. Plenty of room, right?

    That was all fine and dandy until we started bringing in outside claim info from other companies. Now, we had given them the specs for what the field should be, and for several years we didn't have any issues. This week I found out they had been putting a 'record id' key into this field, and it finally maxed out what a smallint could hold. So I had to make some quick changes to the database tables to make them int, and I had to go through and 'fix' the SSIS metadata in various packages to handle the larger value.

    None of these are actually issues with the char data type - in my opinion. Would you set up an IDENTITY column and define it as bigint - because at some point in the future you might exceed the limits of an integer column? But - you don't know if that will happen, only that it could happen - maybe in 10 or more years because of an acquisition?

    Why would you want to use varchar(10) in place of a char(10) - when the value in that column will always have 10 characters? For example, NPI numbers are 10 characters, will never be less than 10 characters, and it is extremely unlikely that this value will ever be increased to 11 or more characters. So - to be 'safe' you set it to varchar(15) just in case... and then program around the invalid entries that have more than 10 characters, because some user will prefix the data entered - another user will suffix the data - and yet another will add dashes or asterisks or...

    Not to mention the issues with updating varchar data - that can and will cause fragmentation.

    I don't think you got my point here. The 'claimant number' field could easily have been set to a char(3) or char(4) based on the claims systems the company used. I doubt you would ever run into a claim with more than 999 or 9,999 people on one claim. But the people who designed the tables in the warehouse thought about it and used a smallint data type instead of a char. You would have thought the smallint was overkill based on what data should be in this field, but it was the smarter way to define the data. And it would have been fine if we hadn't started bringing in data from outside vendors. The tables were designed 10 to 15 years ago, before the company had even thought about working with outside companies. When this outside company started sending the 'record key' in that column instead of the value we asked for, it still met the smallint criteria - until this month. So any edits on the column would not have caught the increase until this month.

    My point was that even when you think you have designed it to handle what you think the absolute max could be, someone will find a way of breaking it at some point.

    Several years ago I worked for a company that had brought in some outside vendors to help create an enterprise-wide data warehouse (EDW). These 'experts' set up almost every column that we stored data in as varchar(500). Things like the above example of 'claimant number' would have been stored in a varchar(500) column, even though the data was only 1 or 2 characters and only numeric. A bit of overkill, IMHO.

    Since the data is supposed to be numeric, I wouldn't have even considered using any string datatype to begin with.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Johan Bijnens wrote:

    Spot on, Jeff !

    The main point for people to prefer varchar over char is that their GUI development tool doesn't have to bother with trailing spaces.

    Lordy.  I understand that concatenation of CHAR(N) can lead to issues, but I wasn't aware of any other issues that a "GUI Development Tool" might have with trailing spaces.  Since I haven't even touched a GUI Dev Tool in about two decades, can you tell me what type of issues there are with trailing spaces there?  For example, are trailing spaces included in string comparisons, with no default you can set to ignore them?  That would be a bugger.

    Yes, exactly that; "Hello" is not the same as "Hello   " in C# and probably most other languages used for GUI development. You have to remember to trim strings before comparing them.
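
    A small T-SQL illustration of why the mismatch catches people out: the database itself pads the shorter string during comparison, so the trailing spaces are invisible on the SQL side and only bite once the value reaches the client:

        SELECT  CASE WHEN 'Hello' = 'Hello   ' THEN 'equal' ELSE 'not equal' END AS TsqlComparison, -- 'equal'
                DATALENGTH('Hello')    AS Bytes1,   -- 5
                DATALENGTH('Hello   ') AS Bytes2;   -- 8
        -- The client application receives the padded value as-is and compares it
        -- exactly, so on that side 'Hello' <> 'Hello   '.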

  • Jeff Moden wrote:

    Johan Bijnens wrote:

    Spot on, Jeff !

    The main point for people to prefer varchar over char is that their GUI development tool doesn't have to bother with trailing spaces.

    Lordy.  I understand that concatenation of CHAR(N) can lead to issues, but I wasn't aware of any other issues that a "GUI Development Tool" might have with trailing spaces.  Since I haven't even touched a GUI Dev Tool in about two decades, can you tell me what type of issues there are with trailing spaces there?  For example, are trailing spaces included in string comparisons, with no default you can set to ignore them?  That would be a bugger.

    The problem is that the default cursor position is after the last space, i.e., at the end of the padded value. With a varchar field, that is the position right after the last character. Editing an existing value from a CHAR field would first require the user to remove the trailing spaces before they can append extra characters to the value.

    Don't shoot me, I'm just the piano player 

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

    Thanks, Johan.  That sounds like one of the many reasons why I stopped working in the front-end world.  Having been a front-end developer long ago, I appreciate what they actually have to go through and the ever-changing tool sets they have to suffer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

