RE: Who Likes NULL? – SQLServerCentral

SSC Guru

Points: 1003853

July 5, 2018 at 10:38 pm

#1996481

I like NULLs for a lot of things and you can do some great tricks with them in calculations and selection of alternate data. However, and as with all else in SQL Server, I say "It Depends".

Consider the subject of start and end dates. While I agree with you that tribal knowledge base magic numbers aren't usually the way to go, you do know one thing about end dates that haven't happened yet and that is they WILL happen someday in the future and so to avoid all the issues with WHERE @Today < EndDate OR EndDate IS NULL stuff, I just assign '9999" (becomes 9999-01-01 under the covers) to end dates that would otherwise be NULL.

Consider the subject of "Fragmentation". If you follow Kimberly Tripp's wonderful observations in her "The Clustered Index Debate" MCM video about using an ever increasing, narrow, unique, immutable column as the key for your Clustered Indexes and implement them, you might wonder why your Clustered Indexes see very large fragmentation levels and very low percent of page fullness ("Page Density", the other type of fragmentation that's just as and sometimes more deadly than the normal fragmentation people look at). The answer, of course, is INSERTs followed by "expansive" UPDATEs where the rows become larger thanks to the update. This is frequently caused by the "poor man's" audit technique of having a Modified_BY VARCHAR(50) column, for example, that starts its life as a NULL. If you KNOW that nearly every inserted row will be updated and the Modified_BY column will also be updated from NULL to something, then determine the most common largest value and prepad Modified_BY by setting a default for that column of the same number of spaces you just determined. The space will not go to waste because it will later be filled with data and so prevents massive bad page splits, the extra CPU, IO and Log File activity that goes with it all, and prevents you from having to use other than a 100% FILL FACTOR on your large clustered indexes in a vain attempt to prevent fragmentation which also saves a huge amount of disk space, memory, and time doing transaction log backups that don't need to be as large.

I do agree with Greg that a column like "LastSaleDate" should be saved for reports and not stored in data. It's just as bad as having both an EndDate column and a bit flag saying that the (for example) account is no longer active. I also agree with what he cited on the graphs... they should have be 4 nines instead of just 3. 😉 Just kidding on the 4 nines thing. It did make for amateur looking graphs or graphs written by a person who just wanted to get it off their plate. In truth, it probably happened just as Greg would suppose... someone didn't have the tribal knowledge and just did what they knew.

As yet another sidebar, a good QA team or individual would have caught that. If you haven't thanked someone in QA for saving your butt, they're not pushing you hard enough. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)