I haven't had a chance to thoroughly test this myself yet, but Adam Machanic (MS MVP) has posted a blog on the topic. He says that, according to his tests, ISNULL() outperforms COALESCE() by about 10-12 percent. I ran his test locally a few times, and found ISNULL() to be about 8% faster when the first column is NULL, but found COALESCE() performed about 4% faster when both columns were non-NULL on my local SQL Server. Of course all this might not be applicable to your situation; the results can change as fast as you can say "dual processor".
Here's a link to his article: http://sqljunkies.com/WebLog/amachanic/archive/2004/11/30.aspx
Adam doesn't test the COALESCE(col1, col2, col3, ...) form, which would require a lot of nesting to simulate with ISNULL() [i.e., ISNULL(ISNULL(col1, col2), col3)]. I haven't investigated this myself, but the fact that COALESCE() has to check for more than 2 column names/values might be part of the reason ISNULL(col1, col2) can outperform it in some circumstances.
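Just to illustrate the equivalence (column and table names here are hypothetical, for illustration only):

```sql
-- These two queries are logically equivalent; COALESCE takes the
-- first non-NULL argument left to right, which nested ISNULL simulates.
SELECT COALESCE(col1, col2, col3) FROM SomeTable;
SELECT ISNULL(ISNULL(col1, col2), col3) FROM SomeTable;
```

With more than three columns the nesting gets ugly fast, which is one practical argument for COALESCE regardless of any small speed difference.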
For very large tables I can confirm that, in the two-column case, ISNULL is a bit faster, but I would still prefer COALESCE over that little bit of speed.
If I might borrow an idea from C. J. Date... I'd add a 0th rule that would take precedence over the other 4 listed in the article. Rule 0 states: "Do not use NULLs."
Missing values are not allowed in true sets, any more than duplicates are. As a data architect I am well aware of situations where we just don't have a piece of data, and in some cases I will include nullable columns, but this is the exception rather than the rule. And the only reason I do it is because the available DBMSs (and the SQL standard) don't provide an acceptable way of dealing with missing information.
The example of employee middle names is one where I would probably allow NULLs, because middle name is an incidental piece of data that isn't likely to be used except for fairly insignificant informational purposes. However, if we change our focus to salary, and there are some employees without a salary, the most likely scenario is that salary information would belong in a separate table from the other employee information, and only those employees who have a salary would have a record there...
This illustrates a major problem with SQL NULLS and the way they are used. There are essentially two reasons for missing information. The data is unknown, or the data is not applicable. SQL NULLs lump both reasons together. In reality, if an attribute is not applicable to a given record then that record (and all others to which the attribute is not applicable) is actually of a different type than the records to which the attribute does apply and should logically be placed in a different table.
In addition to the problems pointed out in the article, NULLs are a logical nightmare in aggregate functions.
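A quick sketch of what I mean (a hypothetical employees table, for illustration only): aggregates silently skip NULLs, so COUNT(column) and AVG(column) don't see the same rows as COUNT(*).

```sql
-- Hypothetical temp table: three employees, one with an unknown salary
CREATE TABLE #Employees (Name VARCHAR(50), Salary DECIMAL(10,2) NULL);
INSERT INTO #Employees VALUES ('Alice', 50000);
INSERT INTO #Employees VALUES ('Bob', 30000);
INSERT INTO #Employees VALUES ('Carol', NULL);

SELECT COUNT(*)      AS AllRows,     -- 3
       COUNT(Salary) AS SalaryRows,  -- 2: the NULL row is skipped
       AVG(Salary)   AS AvgSalary    -- 40000.00: the average of 2 values, not 3
FROM #Employees;
```

Whether AVG over two salaries or three is the "right" answer depends entirely on what the NULL was supposed to mean, which is exactly the problem.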
COALESCE is far better than ISNULL, and I make my DBAs use COALESCE.
For example:

```sql
DECLARE @A TINYINT;
SELECT ISNULL(@A, 256);
```

This fails with "Arithmetic overflow error for data type tinyint, value = 256", because ISNULL() casts the replacement value to the data type of the first argument. But not with:

```sql
DECLARE @A TINYINT;
SELECT COALESCE(@A, 256);
```

COALESCE() instead returns the data type with the highest precedence among its arguments (here INT, so 256 fits). But if you want your query to fail, then by all means use ISNULL().
Thanks for the suggestion! You can't go wrong with Date, Celko and the rest of the gurus.
I am with you on eliminating NULLs as much as possible; but as you also point out, sometimes they are the best/only tool SQL gives us to deal with missing information. I think the key is to use NULLs with caution and keep in mind that they do affect your logic and overall results.
As for your suggestion about storing salary information in a separate table, it makes sense; but when you do a LEFT JOIN on the salary table, you'll still have to deal with NULLs popping up in the result set. One method I've used to deal with "Not Applicable" and other specific statuses is to use a CHAR(1) column as a status code flag to indicate the specific reason the data is missing. It requires a little more programming logic on the front end, but it tends to be well worth it if the specific status information is important to your situation.
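To sketch what that looks like in practice (table, column, and code values here are hypothetical, just to show the shape of the approach):

```sql
-- Hypothetical schema: salary data kept in its own table, with a CHAR(1)
-- status code recording WHY a value is missing when it is.
SELECT e.EmployeeID,
       e.LastName,
       s.Salary,                                     -- NULL when no salary row exists
       COALESCE(s.StatusCode, 'U') AS SalaryStatus   -- e.g. 'N' = N/A, 'U' = unknown
FROM Employees AS e
LEFT JOIN Salaries AS s
    ON s.EmployeeID = e.EmployeeID;
```

The LEFT JOIN still manufactures NULLs in the result set, but the status code lets the front end tell "not applicable" apart from "not yet known".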
I'm submitting a second piece to go along with this article that describes how NULLs affect results in ANSI aggregate functions - that seemed to be a big question on another discussion board here at SQL Server Central.
Good points as presented. One might however also note that SQL as it exists is not an especially logically "clean" implementation of >2VL (more-than-two-valued logic); arguably, it may be better either to avoid NULLs or to go to a more comprehensive implementation of >2VL, as FirstSQL, for example, attempts.
Specifically, SQL rather "mixes" A Marks and I Marks:
I Marks - Inapplicable: for the tuple in question, the column does not adhere to the table predicate. (Zero-length string "user tokens" are suggested for I Marks in the article.)
A Marks - Applicable, but the value is at present unknown.
Such mixing can mean lost data. A good example of this muddling from the article is the example of "John Jones":
"In the case of John Jones, however, we have set the Middle_Name column to NULL. This indicates that we do not know whether John has a middle name or not. We may discover, in the future, that John has a middle name; likewise, we may discover he has no middle name. In either case, we will update his row accordingly. So, NULLs represent our "known unknowns"; the only thing we know for a fact is that we do *not* know whether John has a middle name or not."
Consider if the data entry technician knew that "John Jones" did have a middle name, but that he refused to divulge it - such information is lost with a NULL, but not with an A Mark.
Also, the suggestion of using zero-length strings (instead of proper I Marks, which are not available in SQL) has some problems of its own: what to do about numeric columns, and how to distinguish a zero-length string intended as a "user token" (an I Mark) from an actual zero-length string value. HTH
Thank you for the feedback!
ANSI SQL doesn't provide differentiation for 'states' of missing data; only a mark indicating that the data is indeed missing. In order to differentiate between different states of missing data, I would use another column (possibly a CHAR(1)) indicating the state/reason for the missing data.
The name table is a simplistic example, and the reason why a middle name is missing is probably not compelling enough to go through all the effort to track it (excepting possibly some very specialized businesses). A more likely scenario might be when tracking answers from a questionnaire from which you will be building statistical analyses. Some questions might not be applicable to your respondents; others might be disinclined to answer certain questions. For these types of situations, I find that an additional column indicating the reason for missing data works fairly well.
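A minimal sketch of such a schema (table, column, and code values are hypothetical, just to illustrate the pattern):

```sql
-- Hypothetical questionnaire schema: the reason an answer is missing
-- is tracked in its own CHAR(1) column, not encoded in the answer itself.
CREATE TABLE Responses (
    RespondentID INT          NOT NULL,
    QuestionID   INT          NOT NULL,
    Answer       VARCHAR(100) NULL,   -- NULL when no answer was given
    MissingCode  CHAR(1)      NULL,   -- 'N' = not applicable, 'R' = refused
    CHECK (Answer IS NOT NULL OR MissingCode IS NOT NULL)
);
```

The CHECK constraint guarantees that a missing answer always carries a reason, so the NULL in Answer never has to do double duty.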
To my mind there are far too many possible reasons that data could be missing to try to implement a separate 'NULL'-type 'mark' for each situation in the same column that contains your actual data. At that point, I think you're headed back towards COBOL, where you place '999999999' in a row for 'N/A' and '88888888' for 'Refusal to answer', etc.
"To my mind there are far too many possible reasons that data could be missing to try to implement a separate 'NULL' type 'mark' for each situation, in the same column that contains your actual data . At that point, I think you're headed back towards COBOL, where you place '999999999' in a row for 'N/A' and '88888888' for 'Refusal to answer', etc."
Perhaps, though I Marks and A Marks were Codd's invention/suggestion, one point of which was presumably that such distinct DBMS-provided NULL marks or tokens would mean users did not have to resort to a proliferation of user-contrived marks, such as the zero-length string "user mark" suggested in the article, or the '999999999' for 'N/A' and '88888888' for 'Refusal to answer' of COBOL.
By this logic, supporting such marks in the SQL standard may arguably be "better". Both the article's zero-length-string suggestion and the COBOL example would seem to argue for additional DBMS-supported NULL marks, rather than user-implemented marks plus a single kind of NULL (to avoid heading back towards COBOL, where many user-encoded marks were the norm)?