Saving integers as varchar so that I can choose '' (empty) as content

  • I am considering saving integers as varchar. The reasons for that is:

    The case of employee numbers. Because only employee related records have an employee number, those not having any either are

    • null (which I don't like for different reasons
    • or 0 (the number zero)

    If I chose to save the employee number as a varchar, I could store '' (empty), which I would prefer.

    I am aware, that this solution has its downfalls. Would it be insane, or could one argue that it would make sense?

    I would like to eliminate null values as much as possible, but date -types I am not seeing a solution.

    Christoph Kauflin
    working in Controlling, but trying to program the tools we need to do that

  • DON'T do it is the correct answer. All columns should have the correct data type for their content and purpose.

     

    Null is what you should be using - fact you don't like them should not matter.

    As an alternative use a number (-1 for example) for entries that do not have an employee number

  • It is best to store integers as integers, dates as dates etc. There is nothing wrong with having nulls.

    I the case of Employee Numbers these are really codes, not integers. All HR systems I have seen store these as strings. The only practical point I would note is if you are using numbers keep all the numbers the same length with no leading zeros. eg If you will need six digits start with something like 100101 and not 1. This will save you a lot of pain when HR data is copied to other systems.

  • I thought queries and joins are easier if there are no null values in the joined fields?

    Christoph Kauflin
    working in Controlling, but trying to program the tools we need to do that

  • christoph.kauflin wrote:

    I thought queries and joins are easier if there are no null values in the joined fields?

    Please tell us where you heard this.  That is certainly a generality, and may or may not be true depending upon the situation.

    I'm assuming that you may be using the employee number in a relationship.  If it does not exist (null), the join would not occur.  NULL is undefined, and would be eliminated from the results.

    Can you provide a bit more details?

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • I must check, but as I remember from my own experience if there are strings, empty strings and null values, I have to query for empty or null to find values where there is a string in it. The database we are using has grown quite a bit in time and we are learning by doing, as we both are neither programmers nor database specialists. I will post, as soon I find the source for my opinion.

    Christoph Kauflin
    working in Controlling, but trying to program the tools we need to do that

  • Ken McKelvey wrote:

    It is best to store integers as integers, dates as dates etc. There is nothing wrong with having nulls.

    While I totally agree with the first statement there, I seriously disagree with the second as a blanket statement.  And, no... I'm NOT one that says "NULLs are Evil and should never be used".  You've just gotta be careful what and what you use them for.

    For example, a lot of people use NULLs for EndDates.  Then, when they want to find out if something is or was active at a particular point in time, then end up with overly complicate criteria such as the following.

      WHERE SomeDateColumn >= SomeStartDate
    AND (SomeDateColumn < SomeEndDate OR SomeDateColumn IS NULL)

    If they had used an end date of '9999' (which converts to 9999-01-01) as a default, then their criteria would be much shorter and the Optimizer would like you better...

      WHERE SomeDateColumn >= SomeStartDate
    AND SomeDateColumn < SomeEndDate

    I also disagree with the notion of using NULL in an integer column when the value of 0 works just fine in all but the rarest of cases.  For example, in a Payment column.  If a payment wasn't made, then you know that 0 dollars were paid.  NULL should be reserved for cases only when you absolutely don't know if a payment was made or not and that is rarely the case.  But, people mix the two and they end up doing searches for when payments weren't made using criteria like the following...

      WHERE Payment = 0 or Payment IS NULL

    ... or worse yet ...

    WHERE ISNULL(Payment,0) = 0

    You also get people that make other totally non-SARGable criteria in a totally unnecessary fashion because they don't actually know how NULL works.  For example, if they want to find all payments that aren't NULL or 0, they end up writing a non-SARGable criteria that looks like this...

      WHERE ISNULL(Payment,0) > 0

    ... when, in fact, the following criteria is all that is needed because it also filters out NULLs because you cannot equate to NULLs (unless you screwed up your server settings)...

      WHERE Payment > 0

    There are dozens of other places where NULL is used instead of using a proper default.

    I also disagree that an Employee Number should necessarily be treated as a code.  In fact, I also disagree that Zip Codes and Phone Numbers should be stored as strings.  The ONLY time you need leading zeros for Zip Codes is during display or printing and storing them as small integers would help guarantee that you're not getting anything but digits into the column.  That would also be one of the few places that I'd use a NULL for unknown although I'd probably use a 0 for unknowns because SQL Server really likes NOT NULL columns compared to NULLable columns.

    I'll also tell you that, if you're using "poor man's auditing" and have a Modified_By column, a NULL or Blank in that column is one of the leading causes of "ExpAnsive Updates" that makes mince-meat out of your nicely defragged indexes.  They should at least be defaulted to have the same value as the Created_By column.

    Again, I'm not a NULL-aphobe.  I just see way too many people that think there's nothing the matter with NULLs when there actually can be.

    As with all else in SQL Server, "It Depends".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry for my blanket statement about NULLs - EndDate and Payment are good examples of when not to use them.

    I am afraid I disagree with you about codes. Sooner or later someone will decide to add a check letter to an employee/account number or, more likely, start having addresses in Canada etc. Validation can be done via CHECK constraints and is easily changed.

  • >> The case of employee numbers. Because only employee related records have an employee number, those not having any either are NULL (which I don't like for different reasons or 0 (the number zero)

    If I chose to save the employee number as a varchar, I could store '' (empty), which I would prefer. <<

    Employee numbers are not numbers! They are identifiers. Do you understand that at the square root of your Social Security number is totally meaningless? The idea of confusing character strings and numeric value, such as zero, makes no sense. I think I know where you're getting this odea is, because I've been in this over 50 years. Back when I started programming, there is a language called COBOL. All data was represented in character strings. We basically took paper forms, and then put them into this newly found thing called computers.

    >> Just Would it be insane, or could one argue that it would make sense? <<

    You would be essentially someone trying to program with's. I don't about insanity, but it's really stupid.

    >> I would like to eliminate NULL values as much as possible, but date -types I am not seeing a solution. <<

    I agree with you about not using NULLs when you don't have to. And I'm one of the people that voted for them when it was on the ANSI standards committee :-). However, they do have a purpose In particular, when you create your DDL define what a NULL in a particular data element means. While you're designing your encoding schemes (I hope you actually do take the time to design encoding schemes and spent a lot of time and effort on them), you might want to have missing data element values. For example, in the ISO standards for sex code we have 0 = "unknown", 1 = "male", 2 = "female", and 9 = "lawful person" (corporation, organization, etc). , Obviously there's a difference between a code 0 and a code 9. It is significant.

    As far as dates go, the ANSI temporal standards do not have symbols for a positive eternity, a negative eternity , and other things that have been discussed in ANSI X3 H2 years ago. Temporal data is tricky. The usual convention. Over the decades has been to use the NULL as "not yet closed" for intervals, which imply a (start_timestamp, and end_timestamp) pair seen as a single data element.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Ralph Kimball has a method of not having any nulls in dimension tables. Instead, on the dimension table there are rows with negative keys: -1 for unknown, -2 for not applicable and -3 for not available.  Using nulls might negatively affect the performance of your queries so it's best to avoid them if possible. Then any (fact) table that has a foreign key to a dimension table will not have any nulls but instead point to one of the default values in the dimension table if no value is found or it is not applicable.

  • Ken McKelvey wrote:

    Sorry for my blanket statement about NULLs - EndDate and Payment are good examples of when not to use them.

    I am afraid I disagree with you about codes. Sooner or later someone will decide to add a check letter to an employee/account number or, more likely, start having addresses in Canada etc. Validation can be done via CHECK constraints and is easily changed.

    As far as I know, Raytheon (as an example because I worked for them for 15 years) has always used plain ol' numbers for employee numbers and continues to do so.  I'm thinking that the "later" you're talking about will be much later, if at all.  Of course, I do agree that someone could change their mind on that but, until they do, I'd rather keep them as integers especially if such columns are used as Clustered Index PKs.  As you'll remember, the Clustered Index is used in every non-clustered index in the table not to mention all the foreign key columns.  It's also not likely that such things as ZipCodes will ever be modified to have alphabetic content.  The same holds true for telephone numbers and a whole bunch of other things that people consider to be "codes" and use that to justify storing them as if they had alphabetic content.

    By the same token, if I were to go into a shop that used such things (heh... and most do), I wouldn't make a career out if to change them. 😀  I don't know why people think that numeric codes shouldn't be stored as numeric values, though.  The leading zero thing is the only thing I can think of and, to me, that should be handled at display time.

    Anyway, good conversation.  Thank you for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan AC Roberts wrote:

    Ralph Kimball has a method of not having any nulls in dimension tables. Instead, on the dimension table there are rows with negative keys: -1 for unknown, -2 for not applicable and -3 for not available.  Using nulls might negatively affect the performance of your queries so it's best to avoid them if possible. Then any (fact) table that has a foreign key to a dimension table will not have any nulls but instead point to one of the default values in the dimension table if no value is found or it is not applicable.

    I guess I'll not understand such a thing.  Unknown and Not Available seem to be the same thing (IMHO) to me in most cases and Not Applicable usually means you've not correctly normalized the table.  Of course, that's the nature of DWs, which I mostly dislike anyway.  Denormalization and the resulting duplication of data without the benefit of pre-aggregation (and almost no one I've seen does pre-aggregation) for DWs just doesn't make good sense to me.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 wrote:

    >> Employee numbers are not numbers!

    Heh... I'll go ahead and disagree with you again, Joe.  There's no reason why "identifiers" cannot be numbers.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • christoph.kauflin wrote:

    If I chose to save the employee number as a varchar, I could store '' (empty), which I would prefer.

    frederico_fonseca wrote:

    DON'T do it is the correct answer. All columns should have the correct data type for their content and purpose.

    Null is what you should be using - fact you don't like them should not matter.

    As an alternative use a number (-1 for example) for entries that do not have an employee number

    I'm curious as to why any row in an Employee table would ever have a NULL (or a -1 alternative or an "empty" alternative) for the Employee number for any row in the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm curious as to why any row in an Employee table would ever have a NULL (or a -1 alternative or an "empty" alternative) for the Employee number for any row in the table.

    I am talking about our main financial table which contains a lot of employee related cost, as salary, staff insurance, but also non workforce related stuff, such as depreciation (not sure if this is the right term, I am (Swiss) German native tongue, so I try to eyplain what I mean: the cost of something important losing value, such as ageing of Grand Pianos).

    Christoph Kauflin
    working in Controlling, but trying to program the tools we need to do that

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply