Who Likes NULL?

  • Comments posted to this topic are about the item Who Likes NULL?

  • I like NULLs for a lot of things and you can do some great tricks with them in calculations and selection of alternate data.  However, and as with all else in SQL Server, I say "It Depends".

    Consider the subject of start and end dates.  While I agree with you that tribal knowledge base magic numbers aren't usually the way to go, you do know one thing about end dates that haven't happened yet and that is they WILL happen someday in the future and so to avoid all the issues with WHERE @Today < EndDate OR EndDate IS NULL stuff, I just assign '9999" (becomes 9999-01-01 under the covers) to end dates that would otherwise be NULL.

    Consider the subject of "Fragmentation".  If you follow Kimberly Tripp's wonderful observations in her "The Clustered Index Debate" MCM video about using an ever increasing, narrow, unique, immutable column as the key for your Clustered Indexes and implement them, you might wonder why your Clustered Indexes see very large fragmentation levels and very low percent of page fullness ("Page Density", the other type of fragmentation that's just as and sometimes more deadly than the normal fragmentation people look at).  The answer, of course, is INSERTs followed by "expansive" UPDATEs where the rows become larger thanks to the update. This is frequently caused by the "poor man's" audit technique of having a Modified_BY VARCHAR(50) column, for example, that starts its life as a NULL.  If you KNOW that nearly every inserted row will be updated and the Modified_BY column will also be updated from NULL to something, then determine the most common largest value and prepad Modified_BY by setting a default for that column of the same number of spaces you just determined.  The space will not go to waste because it will later be filled with data and so prevents massive bad page splits, the extra CPU, IO and Log File activity that goes with it all, and prevents you from having to use other than a 100% FILL FACTOR on your large clustered indexes in a vain attempt to prevent fragmentation which also saves a huge amount of disk space, memory, and time doing transaction log backups that don't need to be as large.

    I do agree with Greg that a column like "LastSaleDate" should be saved for reports and not stored in data.  It's just as bad as having both an EndDate column and a bit flag saying that the (for example) account is no longer active.  I also agree with what he cited on the graphs... they should have be 4 nines instead of just 3. 😉  Just kidding on the 4 nines thing.  It did make for amateur looking graphs or graphs written by a person who just wanted to get it off their plate.  In truth, it probably happened just as Greg would suppose... someone didn't have the tribal knowledge and just did what they knew.

    As yet another sidebar, a good QA team or individual would have caught that.  If you haven't thanked someone in QA for saving your butt, they're not pushing you hard enough. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • NULLs are a necessary evil, they are a reflection of the real world. I try to avoid them whenever I can but one cannot avoid them. They can be present whenever one uses outer joins, for example.

    Our problem is that the requirements engineers or the developers design the database and very often they don't really care. The database is somewhere where you dump data (as one of them told me). And so we end up with tables whose only non-null field is the primary key.

    We have a table that, amongst other things, assigns a specific value to a specific person and it can vary over time. It has also has many, many other columns.
    Those who designed the database could have made another table with the current value-person assignation, but no, it was easier to make a big table and create a new row when there was a new assignation. NULLs come in handy for those who want to build a database quickly and don't want to have to think about database design.

  • I find it very useful to use NULLs in temporary tables during ETL.
    Say for example I have a key, value table but the values might come from different sources but I know my full set of keys. I might want to load all the keys I know I should have values for into a temp table and then have multiple statements getting the values from various sources. I can then check for remaining Nulls before I do the final move from the temp table to a table that might not allow nulls and handle the missing values appropriately. it's easier to analyse all the missing values at the end and potentially report back to the user the missing data in one go.
    What really grinds my gears is when someone decides that they're not going to allow NULLs but decide that -1 (or some other weird value) will effectively be a NULL value in the table. Perhaps there is some weird reason for doing this now and again, but it annoys me!

  • When designing a table, it is essential to realise that there are real-world circumstances where NULL may be an important and meaningful representation of the data being modelled.

  • edwardwill - Friday, July 6, 2018 2:19 AM

    When designing a table, it is essential to realise that there are real-world circumstances where NULL may be an important and meaningful representation of the data being modelled.

    Exactly, they're not evil or anything to me - the important point for me is - is this the best model of the circumstances we can make? Yes, we may need to allow for them or do something funky when aggregating but if that is the correct model, it's the correct model. In general getting the model correct makes design issues sort themselves out most readily.

  • Ah NULLS...just as controversial as spaces vs. tabs!

    I especially use it for unknown datetime columns.
    Let's say the table has a column for CreateDate and ModifiedDate, then ModifiedDate will be NULL until such time the record is actually modified by somebody.

  • Stefan LG - Friday, July 6, 2018 3:34 AM

    Ah NULLS...just as controversial as spaces vs. tabs!

    I especially use it for unknown datetime columns.
    Let's say the table has a column for CreateDate and ModifiedDate, then ModifiedDate will be NULL until such time the record is actually modified by somebody.

    Yeah, but are they as controversial as pointless aliasing:

    SELECT * FROM Users U WHERE U.UserId = 12324

  • edwardwill - Friday, July 6, 2018 2:19 AM

    When designing a table, it is essential to realise that there are real-world circumstances where NULL may be an important and meaningful representation of the data being modelled.

    Thoroughly agree!

    What opinions do people have on, for example, postal address lines? Typically you might have 4 address lines plus city/town, county/state, postal code, country. Most addresses don't use all four lines. If not using NULL, what would you populate the unused address lines with?

  • edwardwill - Friday, July 6, 2018 3:38 AM

    Stefan LG - Friday, July 6, 2018 3:34 AM

    Ah NULLS...just as controversial as spaces vs. tabs!

    I especially use it for unknown datetime columns.
    Let's say the table has a column for CreateDate and ModifiedDate, then ModifiedDate will be NULL until such time the record is actually modified by somebody.

    Yeah, but are they as controversial as pointless aliasing:

    SELECT * FROM Users U WHERE U.UserId = 12324

    LOL! But "pointless" aliasing is very very useful when using intellisense in (eg) SSMS!

  • Romac - Friday, July 6, 2018 3:44 AM

    LOL! But "pointless" aliasing is very very useful when using intellisense in (eg) SSMS!

    True, if the code is for your personal consumption.  But if you're sharing it with others  me then please don't.  Drives me nuts.  And don't get me started on where to put the column delimiting commas! :angry:

  • edwardwill - Friday, July 6, 2018 3:47 AM

    Romac - Friday, July 6, 2018 3:44 AM

    LOL! But "pointless" aliasing is very very useful when using intellisense in (eg) SSMS!

    True, if the code is for your personal consumption.  But if you're sharing it with others  me then please don't.  Drives me nuts.  And don't get me started on where to put the column delimiting commas! :angry:

    Haha, yes! :hehe: Jonathan Swift wrote a story about people that went to war over this (well, about which way up to put an egg in an egg-cup) :crazy:

  • Romac - Friday, July 6, 2018 4:02 AM

    Haha, yes! :hehe: Jonathan Swift wrote a story about people that went to war over this (well, about which way up to put an egg in an egg-cup) :crazy:

    Completely off-topic, but I was reading about a trend on dating sites where people examine photos for hints as to the person they might be contacting.  One woman received a number of enquiries regarding the way she'd loaded her toilet paper (apparently it's supposed to be over ... or is it under?  I guess anyone in possession of a life wouldn't really give a stuff!)

  • I prefer to use a little logic instead of personal preference in my decision on what to do about the presence and use of nulls...

    There is a difference in how we should represent data in a database and how we represent it in a report.  For a database, I agree with the idea that a null represents a value that has not been considered yet or where one doesn't apply.  Default values that are meaningful and are of the same data type as the rest of the data in the field make sense, but things like "n/a" (not applicable), etc. only belong in our reports - they don't belong in a field containing date or number data types.  How many times have you come across a situation where someone used a varchar() data type for date or number fields just so they could put "n/a" or some-other such nonsense.

    We also need to be aware that there are some functions that blow-up if they get a null as a parameter, so we need to be careful when coding functions and other processes where we inject parameters from database fields.  A lot of "opinions" are actually old coding standards left over from the more stringent days of database design.  There use to be a time when you couldn't use spaces, nulls, and other such things in certain places in your database design.

    One of my pet peeves is when I see empty strings or spaces used to fill a field value with something that can't be seen, just to prevent a null.  You have to be careful with your filtering and WHERE clauses.  How do you code your filters if you have a database that has a mix of nulls, empty strings, and spaces?  I don't mind generously using the ISNULL() and TRIM() functions, or some other method to ensure that I have all bases covered.

    If  you have the privilege of being the only one who has control over what goes into your database, you can have your own opinions about what to do, but if you don't, then plan on having to deal with a messy situation.

  • From a programming perspective (i.e. the typical consumers of the database data), NULLs are a massive source of defects. NULL generally doesn't play well with code so it has to be trapped all over the place, leading to fragile code and insidious bugs.  There's a reason that Tony Hoare referred to NULL as a billion dollar mistake.

Viewing 15 posts - 1 through 15 (of 143 total)

You must be logged in to reply to this topic. Login to reply