Four Rules for NULLs

  • But that's counted without the functional analists. For these guys, you don't enter a value if a field is not applicable to a certain record

  • If I wasn't sure what George Bush's middle name was.

    Wasn't it George "Wanker" Bush?

  • I don't think it is quite accurate to say that COALESCE() has weaker type checking than ISNULL().  The semantics are different.  The type of ISNULL() is the type of the first parameter.  The type of COALESCE() is the most common type of all of the parameters. 

    The type of COALESCE(INT, BIGINT, FLOAT, STRING) will be a STRING.

    The type of COALESCE(INT, BIGINT, FLOAT) will be a FLOAT.

    Another important point about COALESCE is that if all of the parameters are NULL, COALESCE will return a NULL.

    While the article referenced ANSI SQL-1992, the semantics of NULLs are the same in the more recent versions of the SQL standard, ISO and ANSI SQL:1999 and SQL:2003.  (And, if you thought the standards process was dead, we are currently working on another revision that should be complete in 2007.)

    Keith W. Hare

    Vice Chair, INCITS H2 -- the US SQL Standards Committee

    Convenor, ISO/IEC JTC1 SC32 WG3 -- the International SQL Standards Committee

     

     

     

     

     


    Cordially,

    Keith W. Hare

  • Thanks for the comments. The adequacy of NULLs has been a hotly debated topic since the beginning, even among the most seasoned veteran SQL Academics and DBAs, including the likes of Chris Date, Fabian Pascal, Hugh Darwen, and even Dr. Codd himself. Try Googling a few of those names + the word "NULL" for reference. You can also reference our previous discussion above on I-values, etc. for some of the issues involved.

    The "inherent issues" I am referring to in the quote you pulled are different, however. I am speaking to the additional complexity and performance issues involved when using IS NULL and IS NOT NULL. IS NULL and IS NOT NULL sargability can adversely impact your query performance. Adding the values of two columns or concatenating two columns requires the use of additional functions or statements, such as CASE or COALESCE (unless you really want to have a NULL value returned in a given query). Additionally, NULLable columns cannot be part of a PRIMARY KEY. And that's just on the back-end.

    On the UI and application layers, the fact that you have NULLable columns in the back-end means you have to *test* these layers by inserting NULLs in those columns. If not you are almost assured to get unpredicted (not unpredictable, but "unpredicted") results when NULLs are actually inserted into those columns.

    What seems to happen most of the time is that designers slap a table together and leave the columns NULLable by default, even though they NULLs shouldn't be allowed. The better option is to avoid NULLable columns when possible, and only use NULLs in columns where it is absolutely necessary.

    I think Joe Celko summed it up best when he said "NULLs confuse people..." This is true of SQL Professionals and hot dog vendors alike. Over-reliance on NULLs unnecessarily complicates matters.

    As for COALESCE vesus ISNULL: AFAIK, COALESCE is superior to ISNULL in two respects (as mentioned previously):

    1) COALESCE allows more than two arguments to be passed in.

    2) COALESCE is ANSI standard.

    Period.

  • Coalesce can be so usefull and easy to read.

    Inheritence is a good example

    eg select 'EmailAddress' = coalesce(UserEmail, OfficeManagerEmail, DivisionManagerEmail)

     

  • >> The adequacy of NULLs has been a hotly debated topic since the beginning, even among the most seasoned veteran SQL Academics and DBAs, including the likes of Chris Date, Fabian Pascal, Hugh Darwen, and even Dr. Codd himself. Try Googling a few of those names + the word "NULL" for reference. You can also reference our previous discussion above on I-values, etc. for some of the issues involved. <<

    For direct reference, and a proposed solution to missing data within the bounds of the relational model and 2VL, see Pascal's paper "The Final NULL In the Coffin"; details here:

    http://www.dbdebunk.com/page/page/1396241.htm

     

    Let's remember here as well that, within the limitations of SQL and its 3VL, you really should only use NULL for missing values.  (I prefer not to at all but recognise within an SQL DBMS (not a proper RDBMS), there may occasionally be a need; just be aware of what you're doing and why.)  As has been said, if you're using them for inapplicable values then there is a design flaw there (it likely belongs in a separate table).

  • Hi Jesper,

    Thanks for the comments. That's an excellent point about the two columns not being equal despite both being NULL, and SET ANSI_NULLS being OFF. Also, appreciate the feedback on the SET CONCAT_NULL_YIELDS_NULL statement. I would advise avoiding use of these SET statements that circumvent ANSI compatibility where possible, however.

  • There's a lengthy discussion above about this particular issue above - without rehashing too much, there are dozens of schemes to try to overcome this particular SQL limitation. Some of these include further normalization of your data (a separate engine_power table with entries only for those vehicles that have engines), some sort of flag indicating why a particular piece of data is missing, additional forms of NULL, etc.

    As you can see from the above discussion, my take on it is that there are far too many possible reasons a piece of data might be missing to just label it unknown or N/A if you really want to know the reasons the data is missing.

  • Thanks for the comments Keith. It's good to have feedback from the folks who set the standards. I cited the SQL-92 standard specifically because the majority of ANSI-compliance in SQL 2000 was based on SQL-92. With SQL 2K5 they seem to be catching up to the SQL-99 standard.

    BTW, any chance you could slip me a copy of the SQL-99 and SQL-2003 standards?

  • Excellent points and thanks for the direct link to Pascal's paper.

    As was mentioned before though, even if you properly normalize your data by placing data that might be deemed "inapplicable" in a separate table, all it takes is one OUTER JOIN and you're fighting those NULLs once again

    Thanks.

  • You might not like the guy, but you gotta admit that he's always good for a confusing quote I was going to use a Greenspan quote there, but I had too much trouble understanding anything he's ever said...

  • I use one of his quotes in my signature in another forum:

    "If I know the answer I'll tell you the answer; and if I don't, I'll just respond, cleverly."

  • Well no one ever accussed Rumsfeld of being unoriginal, kind of reminds me of another expression I really like: "bullsh*t baffles brains".

    Baffling aside, NULL can be a real problem especially with legacy systems or with sql statments between different database systems. However, having said that I really do like to use IS NULL in SS. Here is an example of using NULL and SET ANSI_NULLS from BOL (SQL v9):

    -- Create table t1 and insert values.

    CREATE TABLE t1 (a INT NULL)

    INSERT INTO t1 values (NULL)

    INSERT INTO t1 values (0)

    INSERT INTO t1 values (1)

    GO

    SET NOCOUNT ON

    -- Print message and perform SELECT statements.

    PRINT 'Testing default setting'

    DECLARE @varname int

    SELECT @varname = NULL

    SELECT *

    FROM t1

    WHERE a = @varname

    SELECT *

    FROM t1

    WHERE a <> @varname

    SELECT *

    FROM t1

    WHERE a IS NULL

    GO

    -- SET ANSI_NULLS to ON and test.

    PRINT 'Testing ANSI_NULLS ON'

    SET ANSI_NULLS ON

    GO

    DECLARE @varname int

    SELECT @varname = NULL

    SELECT *

    FROM t1

    WHERE a = @varname

    SELECT *

    FROM t1

    WHERE a <> @varname

    SELECT *

    FROM t1

    WHERE a IS NULL

    GO

    -- SET ANSI_NULLS to OFF and test.

    PRINT 'Testing SET ANSI_NULLS OFF'

    SET ANSI_NULLS OFF

    GO

    DECLARE @varname int

    SELECT @varname = NULL

    SELECT *

    FROM t1

    WHERE a = @varname

    SELECT *

    FROM t1

    WHERE a <> @varname

    SELECT *

    FROM t1

    WHERE a IS NULL

    GO

    SET NOCOUNT OFF

    -- Drop table t1.

    -- DROP TABLE t1

    I recommend using, and I did notice that there's no usage of, [WHERE NOT a IS NULL]. This is not the same as not using the not operator: <>. Getting myself into knots there... yeah ok, sql blokes are really bad at puns. To make sense of this, please omit the first two nots in the last three.

     

    Max

  • Yeah, ironically, Greenspan was being confusing for all the right reasons, or at least that's how it appears. Just wonder how intelligent he's going to look when the 'merican creditors come knocking on the door for the [what's the deficit now???]. But at least he's made a graceful exit into the speaker's circusit: http://www.nytimes.com/2006/02/19/business/yourmoney/19suits.html

    Max

  • Any quote from the NY Times must be treated with a skeptical eye, because what they print might be true.  In ANSI-SQL 'NY Times' = 'true' yeilds 'Unknown'.

    --Paul Hunter

Viewing 15 posts - 46 through 60 (of 152 total)

You must be logged in to reply to this topic. Login to reply