To Store Formatted Data or Not

  • I'm debating whether or not to store formatted or unformatted phone numbers and/or postal codes in the database.

    Advantages

    User controls formatting, including complex international formats.

    No need to dynamically set input mask on UI based on country code selection.

    No need to apply formatting for reports - it's already done.

    Disadvantages

    Potential for excess digits or invalid alpha / numeric where opposite may be required.

    Lazy customer service/administrative user my not key in value as customer specifies.

    Thoughts? Arguments?

  • Just my opinion, but things like phone numbers, ssn, zip code, etc... are not numbers (although they contain numerals). Since there would most likely never be any need to apply any type of math to those types of columns - why store them as numbers?

    Besides, it is easy enough to create a constraint to make sure the data inserted/updated is in the correct format (e.g. PhoneNumber Like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]').

    Jeff

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • gwque (6/5/2008)


    I'm debating whether or not to store formatted or unformatted phone numbers and/or postal codes in the database.

    Advantages

    User controls formatting, including complex international formats.

    No need to dynamically set input mask on UI based on country code selection.

    No need to apply formatting for reports - it's already done.

    Disadvantages

    Potential for excess digits or invalid alpha / numeric where opposite may be required.

    Lazy customer service/administrative user my not key in value as customer specifies.

    Thoughts? Arguments?

    On my personal front, I would like to store phone numbers and zipcodes in formatted format. They can be easily used for printing and more readable and can be easily related to the respective data fields for which it is saved. Also I don't think there may be a huge difference in the space due to excess digits and probably one can live with it.

    Coming to human errors if the someone is lazy it can do the same mistake in any format.

    Prasad Bhogadi
    www.inforaise.com

  • Zipcodes in particular cannot be stored as numbers - they can start with zero.

    As far as storing the formatting, that can depend heavily on what you intend to do with the data. Phone numbers are acceptable as (202)555-5555 and 202-555-5555 and 202.555.5555. If the format the user entered matters - because they perhaps want it displayed in that way, I would store the format. However, if you intend to send mass faxes, I would typically strip out the formatting and just store the number portions because it is much easier to prevent bad entry.

    Either way, it is usually a good idea to control (at least to some extent) what gets entered either using a constraint in the database, or through the UI so you do not get entirely unexpected data.

  • gwque:

    would you store money values as '$1,234.02' and '£202.00'?

    Advantages

    User controls formatting, including complex international formats.

    No need to dynamically set input mask on UI based on country code selection.

    No need to apply formatting for reports - it's already done.

    i'm assuming that your answer to the money question will be "no", and the same reasoning applies to your original question also.

  • I prefer to store Zip codes, SSNs, phone numbers, et al, as text, and without the formatting (no dashes, etc.).

    Slightly narrower columns, for one thing. (Over millions of Zip codes, that hyphen between the left 5 and right 4 adds up Megs of storage space all by itself. Two parens, a space and a hyphen, over 1-million phone numbers, is 4 extra Meg of storage and RAM use. Not a big deal, but why add even that amount of load to the IO bottlenecks if you don't need to?)

    Opportunity for different formatting for different purposes. I find it easier to turn 1234567890 into (123) 456-7890, or 123-456-7890, than turning (123) 456-7890 into 123-456-7890, or 123-456-7890 into (123) 456-7890.

    Consistency. If I strip out special characters, then all phone numbers are stored the same way. Otherwise, I have to make sure that all users type the same input format, or have to clean up variable input formats to a standard storage format (in which case, why not standardize to no junk data).

    The point Antonio brought up about money is a good example of another reason.

    Edit: Just to be clear, this is pretty minor in the grand scheme of database building and integrity. It's a detail, and it does matter, but not as much as things like separating phone numbers into their own table.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I would avoid storing formatting. Things often change, people don't enter them the same way, and it's easier to strip formatting to ensure consistency than add it in later. Phone numbers are a great example. People enter

    - 3033333333

    - 303.333.333

    - (303) 333-3333

    - 303-333-3333

    - +1 303 333 3333

    etc.

    Zips as well.

    Better to deal with this in each client that has to present the information and can do so appropriately.

    The exception might be if you find that lots of resources are spent reformatting things or that you must ensure they conform to some format. Then I might make an exception.

    I've also usually stored phone numbers, zip, etc. as varchar for easier manipulation. I don't like things cased into numbers to save a little space, especially when things could get "math'd" together.

    zip + zip works one with char, another with numerics.

  • one last point.

    computed columns can make it easy for lazy app/gui/report developers to get formatted data.

    [font="Courier New"]table definition:

    ...,

    empPhoneNumber dbo.typePhoneNumberUS null,

    empPhoneNumberFormatted =

    left(empPhoneNumber,3)+'-'+ substring(empPhoneNumber,4,3)+'-'+ right(empPhoneNumber,4),

    ...[/font]

  • Another thing I thought of on this subject is that just storing the numbers makes certain types of pattern-matching much easier.

    What if, for example, you want to look up all the customers in a certain area code? If all the phone numbers are stored as 1234567890, then it's easy, "left(phone, 3)". Even if they are consistently stored in the same format, you have to do more, "right(left(phone, 4), 3)" for "(123) 456-7890", and so on.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply