DataType confusion

  • Greetz!

    I was speaking with a client a few days ago and they said something that contrasted with what I had learned about choosing datatypes.

    The comment was "I learned that if you are not doing math with a number then it should be stored as character data." I didn't say anything because I had never heard that and wanted to be certain that the way I had learned was the preferred way. My understanding is that SQL Server is optimized to use numerical values and to store numbers, even though they aren't used in mathematical equations, adds overhead due to storage and conversion requirements. The numbers in question appear to be key values.

    I'm hoping that someone here who works with SQL Server everyday would be able to resolve this for me. If you know of an authoritative source that talks about why it should be done one way or the other I would love to read that as well.

    Thank you!

    Even as a mother protects with her life
    Her child, her only child,
    So with a boundless heart
    Should one cherish all living beings;

  • ...

    "I learned that if you are not doing math with a number then it should be stored as character data."

    ...

    The man was in a wrong school :hehe:

    There are examples where numbers have nothing to do with math, but they still better to be stored as numbers.

    What about numeric sequences? You are no going to apply math to it, as they are mainly for order.

    Integers, in this case will take less space and perform much better in ORDER BY then characters, not saying that the order for numbers saved as character datatype will be very different to order based on the numeric value...

    I would say if the number you store is a number then use numeric datatype.

    If a number is in reality sequence of digits, which very often is subject of some formatting (eg. phone numbers), - then you better to use character datatype. It has nothing to do with math.

    There is a blog about choosing wrong ones: http://sqlblog.com/blogs/aaron_bertrand/archive/2009/10/12/bad-habits-to-kick-using-the-wrong-data-type.aspx

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • It depends on what they mean by "numbers with no math".

    Storage-wise, character data is more expensive. TinyInt can store numbers 0-255 in a single byte of data, but 1 byte of storage for strings, char(1), only allows storing 0-9. Similar for larger numeric types. That also means more RAM, more I/O, etc., when the data is live.

    If they're storing bar-code values, or anything similar to that where the position in the number determines the meaning of the data, then char datatypes will be better because you won't lose leading zeroes, can use substring and patindex, and things like that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the feedback. Good Stuff! I should have been more detailed about the numerics being stored. Firstly, the data in question originates in other systems and is brought into this table via a SSIS job. It is then being used for reporting. I have been asked to build a cube from it.

    The numerics appear to be key values. In fact some of the column names have 'Code' in the name. Their values? 578, 99 , 2010, 43417620, 000, 63, 9, 4. Nothing too large. There are also several columns that represent a count for a given fiscal year so values like 20, 4 and zero are common.

    Another obvious numeric would be an auto-incrementing identity field, which the majority of these appeared to be. I don't have much background on the data yet.

    I once heard that SQL is optimized and works most efficiently with numerical data. I'm not sure of the rationale for this unless the way it is stored in the indexes, or that there isn't as hefty a conversion required, or what. If that is the case then it would make sense to store the numbers in the right sized integer fields..especially if the original tables had them as integers to begin with.

    Is there any truth to SQL Server performs optimally with numeric data? What about numeric vs char indexes? does one provide a benefit over another?

    Even as a mother protects with her life
    Her child, her only child,
    So with a boundless heart
    Should one cherish all living beings;

  • Main point... Are leading zeros important?

    If this is something like a postal code where the code is 0001, not 1, or employee numbers, etc, then that must be stored as character data. If the leading zeroes aren't relevant (it's a quantity, a value, a count, a code where it's 2, not 02, etc), then store it as numeric data.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I use SQL server to generate barcode.

    Barcode Generate CRI SDK for Reporting Service is designed to generate, create various linear & 2D barcode into Microsoft SQL Server Reporting Service (SSRS) reports with a Custom Report Item (CRI).

  • Just to get in my 2 cents. If it's a number, store it as a number. If it just LOOKS like a number (zip codes, phone numbers, govt ID numbers, etc) then store it as a char value of some sort.

    I think what your client may have been trying to say, for example, is that you wouldn't want to (and can't) add one zip code to another. That would indeed make no sense. So a zip code isn't really a 'number' even though it uses 'numeric' values and looks like a number.

    If you have ever seen zip codes entered as integers you've probably experienced the dropped leading zeroes problem or seen a zip like '34567-1234' come out like '33333'. :crazy:

     

  • Another possible reason to store human readable/searchable codes, keys, identifiers, etc. as character data might be full text indexing. FTI will not index integer columns.

    Steve Pirazzi
    ONEWARE, Inc.
    http://www.ONEWARE.com

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply