Database Design Follies: NULL vs. NOT NULL

  • Comments posted to this topic are about the item Database Design Follies: NULL vs. NOT NULL

  • Nice article, but under performance you say in regards to VARCHAR:

    "NULL and empty string both take up the same space (zero bytes)"

    It was my understanding that a varchar, regardless of null or empty string value, will always take up the space of 2 bytes in the row for the offset to the end of the string.

    But this may have changed?

  • One little nasty to be aware of is the use of NOT IN with nulls. This simplifed snippet shows the problem. If f2 in t2 contains any nulls, the whole inner statement produces null and the outer select will never produce records.

    Select f1 from t1 where f1 NOT IN (Select f2 from t2)

    This can be seen with these examples:

    select 'yes' where 1 not in (1,2,3)-- no records as expected

    select 'yes' where 4 not in (1,2,3)-- Yes, as expected

    select 'yes' where 4 not in (1,2,null)-- no records !

    One solution is to coalesce to remove the null:

    Select f1 from t1 where f1 NOT IN (Select coalesce(f2,'') f2 from t2)

    Mark

  • The short answer is NULL, the longer answer is to never allow that situation to arise in the first place because it’s as confusing as dividing by zero. In fact, it’s worse as division by zero will typically throw an error, whereas multiplying by NULL will trickle through your SQL into the application and only cause trouble when NULL is no longer a viable result

    I see this as desired very behavior that we do not have enough of in T-SQL. And I like you and readers to be convinced of that as well.

    Hard exceptions like thrown by dive by zero can happen ** even ** when the 'result' of that operation will be filtered out by another condition later in the querying process. While in retrospect there is no reason to throw an exception, one is thrown anyway and usually unexpectedly.

    To deal with this, currently all SQL must be (re-)written to handle any such situation gracefully. In practice most code is not rewritten and is thus buggy. An error can popup into existence at any time due data, statistics or index change. While rewritten code will work, it will be a nasty piece to read and maintain as it is cluttered with logically unneeded exception handling such code duplication and case expressions.

    The concept of direct exceptions in a language such a SQL where operations can be reordered and thus executed speculatively is just plain wrong. Exceptions must be thrown when something is certain to be wrong and relevant, like what happens in procedural languages due to that code being in a fixed order. In SQL that means saving up errors somehow until that certainty exists. Null can be used for this in certain cases and is more of a help then a hinder with this particular problem.

  • I'm not sure that your example that Address_2 should allow NULLs is particularly convincing.

    The approach I was taught (and still follow) is that if NULL "means something" in relation to a column, then define the column as nullable. I think that's largely the argument presented in the article too.

    However, in the majority of applications, if Address_2 is an empty string (because it's not nullable), or if Address_2 has a null (because it was nullable), it's going to mean the same thing - There is no address line 2. So why allow two possible states in the database that mean the same thing (it just makes life harder for the developer writing the application that uses this database).

    I'd be interested to hear what others think...

  • Tim0 (10/30/2014)


    Nice article, but under performance you say in regards to VARCHAR:

    "NULL and empty string both take up the same space (zero bytes)"

    It was my understanding that a varchar, regardless of null or empty string value, will always take up the space of 2 bytes in the row for the offset to the end of the string.

    But this may have changed?

    Thanks for your attention to detail here! If we are going to go into further detail (which is typically our jobs :-), then you're definitely right---nothing in this world is free, not even a NULL value.

    For a fixed-length column, you are going to pay the price to store data there, even if it's NULL, all the time. The space is allocated up-front, regardless of what we store in there.

    For a variable-length column, such as VARCHAR, 2 bytes are used as a row-offset and are allocated no matter what we store in there. If we want, we could also include details on how the NULL bitmap works, though the space used there is trivial compared to everything else we have discussed.

    This does NOT take into account sparse columns, in which space is allocated up front to compress details about the storage of NULL values. In this case, the efficiency of storage for NULL values increases the more NULL values we have.

    The intent of this article was to discuss design considerations rather than get too deep into storage internals, which may distract from why we would do something in the first place. I left all that out for fear of this getting way, way too long πŸ™‚

  • radman_bobbin (10/30/2014)


    I'm not sure that your example that Address_2 should allow NULLs is particularly convincing.

    The approach I was taught (and still follow) is that if NULL "means something" in relation to a column, then define the column as nullable. I think that's largely the argument presented in the article too.

    However, in the majority of applications, if Address_2 is an empty string (because it's not nullable), or if Address_2 has a null (because it was nullable), it's going to mean the same thing - There is no address line 2. So why allow two possible states in the database that mean the same thing (it just makes life harder for the developer writing the application that uses this database).

    I'd be interested to hear what others think...

    This is definitely the right way of thinking, ie: NULL should mean something or not be used---so the question we are left to ask is: Will a NULL address mean something different than an empty string?

    If we want to differentiate between a user entering a blank address and a user entering nothing at all---whether or not any user input was received for a field, then NULL has definitive meaning. It means that the user skipped the field and didn't enter anything, as opposed to them explicitly entering a blank of some sort in there (or blanking out an existing value, or indicating via a check box that a value existed, but entered nothing, etc...)

    If everything ends up as an empty string in the end and there is no value to knowing the user's intent, then it has no meaning and we're OK with a NOT NULL, default empty string column.

    Both of these scenarios seem fair enough, and the correct choice would depend upon the designer's intention for the data and how it will be used. I think arguments could be made for either of these quite easily.

  • I agree, why allow two values to mean the same thing? I handle this by allowing optional fields such as AddressLine2 to be NULL and then implement a constraint to prevent an empty string (AddressLine2 <> ''). Thoughts on this approach?

  • Stacey Decker (10/30/2014)


    I agree, why allow two values to mean the same thing? I handle this by allowing optional fields such as AddressLine2 to be NULL and then implement a constraint to prevent an empty string (AddressLine2 <> ''). Thoughts on this approach?

    There are a number of approaches that can be taken here. As long as all relevant values have a distinctive meaning and purpose, then we're good to go.

    The meaning in your example is derived from either forcing the user to enter a non-blank string (or nothing at all), or including code in the application that ensures that all blanks become NULL values. The constraint ensures that there is no way for data to be created, and provides a hint to the optimizer. It removes meaning from the empty string completely, as it is no longer allowed.

    In many ways, your example uses NULL as a replacement for empty string logically. This is the opposite of the common tactic of a NOT NULL default empty string column, yet applying basically the same meaning to the "not entered" option.

  • Stacey Decker (10/30/2014)


    I agree, why allow two values to mean the same thing? I handle this by allowing optional fields such as AddressLine2 to be NULL and then implement a constraint to prevent an empty string (AddressLine2 <> ''). Thoughts on this approach?

    My view is that it is just as valid as using NOT NULL and storing empty strings (with or without default).

    What is more practical depends on what you know be done with the column in SQL or application code.

    Replacing nulls with empty strings for clean reporting to systems that do not support null would be avoided by storing empty strings. Storing NULL allows you to act on a it easier with the IsNull function whereas nulllIF( isNull() ) to replace an empty string with something else begins to look rather clumsy.

    As usual, it depends πŸ™‚

  • I have two other considerations for NULL vs NOT NULL.

    1) If you're application coding language requires checking for null values when you read in data from the database, this is a lot of extra coding and is a good reason to have NOT NULL columns.

    2) Use of default values for columns. To avoid needing NULL values, it is easy to set a default value for every field in a table, this allows you to not need to check for NULL values in your application for most queries (except for some aggregate or outer join queries). So a numeric field can be defaulted to 0, a date to a specific date in the past or future, a string to a blank value or

    'Unassigned', or 'Not Specified', etc.

    So using NOT NULL columns with default values keeps your database clean and the data within the tables predictable for your application. I know that default values can slow down insert operations, but for most applications this is not an issue.

  • radman_bobbin (10/30/2014)


    I'm not sure that your example that Address_2 should allow NULLs is particularly convincing.

    The approach I was taught (and still follow) is that if NULL "means something" in relation to a column, then define the column as nullable. I think that's largely the argument presented in the article too.

    However, in the majority of applications, if Address_2 is an empty string (because it's not nullable), or if Address_2 has a null (because it was nullable), it's going to mean the same thing - There is no address line 2. So why allow two possible states in the database that mean the same thing (it just makes life harder for the developer writing the application that uses this database).

    I'd be interested to hear what others think...

    In this case for the same reason we don't run all our addresses through an address sanitizing routine, or title case all the names in our database, or whatever other formatting option you want to worry about.

  • Robert Mark (10/30/2014)


    I have two other considerations for NULL vs NOT NULL.

    1) If you're application coding language requires checking for null values when you read in data from the database, this is a lot of extra coding and is a good reason to have NOT NULL columns.

    2) Use of default values for columns. To avoid needing NULL values, it is easy to set a default value for every field in a table, this allows you to not need to check for NULL values in your application for most queries (except for some aggregate or outer join queries). So a numeric field can be defaulted to 0, a date to a specific date in the past or future, a string to a blank value or

    'Unassigned', or 'Not Specified', etc.

    So using NOT NULL columns with default values keeps your database clean and the data within the tables predictable for your application. I know that default values can slow down insert operations, but for most applications this is not an issue.

    Well but default values don't always make sense for some columns, for example what about a balance field on an account, a 0 balance would mean something very different than a NULL and there might not be an appropriate value for a default.

  • I don't know how anyone using an application and seeing a NULL value on the screen would equate the NULL value to anything other than a zero. I would default a new account setup with a zero value for the balance. I don't see how a NULL value should ever be presented to a user for a balance. To me that is a bug that must be corrected. And if you are coding the application and you get a null value for the account balance, you are going to show a zero value on the screen for the user, not a NULL. So, by allowing a NULL value for a balance to begin with in the database, you are just adding more coding to the application.

    Again, if it is an aggregation or an outer join that is causing the NULL value you should either handle the Null value in the query or add code the application for those specific conditions. With the databases I design and use in applications, I never want to see a NULL value get read in by application code.

  • Robert Mark (10/30/2014)


    I don't know how anyone using an application and seeing a NULL value on the screen would equate the NULL value to anything other than a zero. I would default a new account setup with a zero value for the balance. I don't see how a NULL value should ever be presented to a user for a balance. To me that is a bug that must be corrected. And if you are coding the application and you get a null value for the account balance, you are going to show a zero value on the screen for the user, not a NULL. So, by allowing a NULL value for a balance to begin with in the database, you are just adding more coding to the application.

    Again, if it is an aggregation or an outer join that is causing the NULL value you should either handle the Null value in the query or add code the application for those specific conditions. With the databases I design and use in applications, I never want to see a NULL value get read in by application code.

    /shrug It really depends, think about an application that requires certain values to be entered before other actions can be taken on a record but the user might want to save that record and come back to it later. Do you have the application check for whatever default value you picked, which in the case of a balance might not have an invalid value, or NULL as the check whether a required field has been filled out?

Viewing 15 posts - 1 through 15 (of 137 total)

You must be logged in to reply to this topic. Login to reply