Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

The Semantics of NULL in SQL Server 2008 Expand / Collapse
Author
Message
Posted Tuesday, August 24, 2010 3:04 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:31 AM
Points: 5,484, Visits: 10,319
Paul

You're quite right - Books Online (2008 R2) says: "The SQL Server Native Client ODBC driver and SQL Server Native Client OLE DB Provider for SQL Server automatically set ANSI_NULLS to ON when connecting. This setting can be configured in ODBC data sources, in ODBC connection attributes, or in OLE DB connection properties that are set in the application before connecting to an instance of SQL Server. The default for SET ANSI_NULLS is OFF." When I connect through SSMS, ANSI_NULLS is set to ON - this must be the OLE DB Provider setting this for me.

Interestingly, the same topic in Books Online says that in a future version of SQL Server, it will not be possible to set ANSI_NULLS to OFF.

John
Post #973929
Posted Tuesday, August 24, 2010 3:15 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 2:32 PM
Points: 6,128, Visits: 8,393
I rated this article "poor". A lot of the information is correct, and relevant - but there is a very important error, right at the start of the article.

The author writes:
A NULL value in SQL Server is (by design) meant to indicate an unknown or undefined value

And this is absolutely not true. (Though it is indeed described this way in Books Online).

The implementation of NULL in SQL Server (not only SQL Server 2008!!) follows the ANSI standard. And the ANSI standard defines "the null value" as:
null value: A special value that is used to indicate the absence of any data value.

The key words here are "absence of any data value". Or, as I usually describe it: NULL represents missing data.

NULL does therefore NOT represent unknown data. Of course, the reason that data is missing might be that it is unknown (patient still in coma and unidentified; name, birthdate, etc are all unknown). But it might also be that it is inapplicable (when doing prenatal surgery, the patient has no name or birthday yet). Or it might be known, but totally irrelevant (there are good reasons to register the number of children a female patients has put into the world; for male patients, it's totally irrelevant in a medical database).



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #973934
Posted Tuesday, August 24, 2010 3:17 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 2:32 PM
Points: 6,128, Visits: 8,393
da-zero (8/24/2010)
I would've liked a paragraph on the behaviour of NULL in COUNT(*) and COUNT(DISTINCT *).

There's not much to say about that, as COUNT(*) does not care if values are NULL or not (it simply counts rows), and COUNT(DISTINCT *) is a syntax error.

In COUNT(columnname), NULL is relevant. If you demand a COUNT(Age) from the patients table, you are not requesting the number of patients in the database, but the number of ages in the database. Since NULL represents missing data, a row with a NULL age does not represent an age in the database. These rows are correctly excluded from the COUNT() result.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #973936
Posted Tuesday, August 24, 2010 3:21 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 2:32 PM
Points: 6,128, Visits: 8,393
Carlo Romagnano (8/24/2010)
One way to avoid worrying about NULLs is never to use them, always declaring columns as not allowing NULLs and designating default values for "empty" or "unknown". This will save you keystrokes, especially when you want to check whether a column does not have a certain value. However, you will anyway probably have to write logic for when columns have the designated default values.

TERRIBLE!
VERY TERRIBLE!

Agreed.
How NULL behaves might cause some unexpected results, but far less from how such "magic values" behave.

I'd say that, as long as you use NULLs as intended (i.e., to represent missing data without inferring anything about the reason it might be missing), how SQL Server treats the NULL values will in 99% of all cases match what humans would do when asked to operate on incomplete data. All without any special extra handling. Using "magic values" means you have to write special handling to mimic default NULL behaviour in those 99% of all cases.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #973938
Posted Tuesday, August 24, 2010 3:24 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 12:53 AM
Points: 2,620, Visits: 2,466
da-zero (8/24/2010)
Carlo Romagnano (8/24/2010)
One way to avoid worrying about NULLs is never to use them, always declaring columns as not allowing NULLs and designating default values for "empty" or "unknown". This will save you keystrokes, especially when you want to check whether a column does not have a certain value. However, you will anyway probably have to write logic for when columns have the designated default values.

TERRIBLE!
VERY TERRIBLE!


Constructive feedback, lalalalala.
Please elaborate why this is so terrible, that way we can all learn something.

First: problem for storage
Second: how can you determine if a value of a column has been specified? (e.g. a price of an article: 0 means it's free or the price has not been established?).
Third:
One way to avoid worrying about NULLs is never to use them, always declaring columns as not allowing NULLs
that's not true. Think OUTTER JOIN, the column could be null, although is declared NOT NULLable!
Post #973941
Posted Tuesday, August 24, 2010 3:25 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 2:32 PM
Points: 6,128, Visits: 8,393
paulm-771594 (8/24/2010)
I would be interested to know what the impact of using NULL is on the storage space required for a Column. i.e. does a nullable nvarchar column save space over a non-nullable nvarchar column with an empty string in it?

If you disregard page and row compression (SQL Server 2008), SPARSE columns (also SQL 2008) and vardecimal (introduced in one of SQL Server 2005's service packs and then superceded by row compression), the short answer is:

* For fixed length data types, there is no difference at all in storage requirements for NULL vs NOT NULL

* For variable length data types, there is no difference in storage requirements for NULL vs NOT NULL when storing actual data values; when storing NULL values, only as much space is taken is would be needed for a zero length content.

For instance, a varchar(20) with a string of length 5 takes 7 bytes (5 for the data; 2 for the length), whereas both the zero length string ('') and NULL take 2 bytes (0 for the data; 2 for the length).



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #973942
Posted Tuesday, August 24, 2010 3:27 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, March 26, 2013 11:41 AM
Points: 13, Visits: 20
Hugo Kornelis (8/24/2010)
paulm-771594 (8/24/2010)
I would be interested to know what the impact of using NULL is on the storage space required for a Column. i.e. does a nullable nvarchar column save space over a non-nullable nvarchar column with an empty string in it?

If you disregard page and row compression (SQL Server 2008), SPARSE columns (also SQL 2008) and vardecimal (introduced in one of SQL Server 2005's service packs and then superceded by row compression), the short answer is:

* For fixed length data types, there is no difference at all in storage requirements for NULL vs NOT NULL

* For variable length data types, there is no difference in storage requirements for NULL vs NOT NULL when storing actual data values; when storing NULL values, only as much space is taken is would be needed for a zero length content.

For instance, a varchar(20) with a string of length 5 takes 7 bytes (5 for the data; 2 for the length), whereas both the zero length string ('') and NULL take 2 bytes (0 for the data; 2 for the length).


Thank you Hugo - your explanation is very much appreciated!
Post #973945
Posted Tuesday, August 24, 2010 3:36 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 8:45 AM
Points: 13,635, Visits: 11,507
Hugo Kornelis (8/24/2010)
da-zero (8/24/2010)
I would've liked a paragraph on the behaviour of NULL in COUNT(*) and COUNT(DISTINCT *).

There's not much to say about that, as COUNT(*) does not care if values are NULL or not (it simply counts rows), and COUNT(DISTINCT *) is a syntax error.

In COUNT(columnname), NULL is relevant. If you demand a COUNT(Age) from the patients table, you are not requesting the number of patients in the database, but the number of ages in the database. Since NULL represents missing data, a row with a NULL age does not represent an age in the database. These rows are correctly excluded from the COUNT() result.


You're right, I didn't think about the syntax
What I meant was that the author should've included the COUNT function in the article, since it behaves differently when you specify additional arguments.

E.g. I would've liked examples of COUNT(*), COUNT(expression), COUNT(DISTINCT expression) and COUNT(ALL expression).

That would've made the article more comprehensive.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #973951
Posted Tuesday, August 24, 2010 3:41 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, December 15, 2014 3:15 PM
Points: 197, Visits: 460
Yes, I find that a lot of the confusion goes away by not calling NULL a "value" but seeing as a state (or state indicator).
Post #973953
Posted Tuesday, August 24, 2010 3:49 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 2:32 PM
Points: 6,128, Visits: 8,393
da-zero (8/24/2010)
What I meant was that the author should've included the COUNT function in the article, since it behaves differently when you specify additional arguments.

E.g. I would've liked examples of COUNT(*), COUNT(expression), COUNT(DISTINCT expression) and COUNT(ALL expression).

That would've made the article more comprehensive.

I agree. Which is exactly why I included some information on this in my answer.

To recap:
COUNT(*) - how many rows?
COUNT(column) - how many non-NULL values in the specified column?
COUNT(DISTINCT column) - how many different non-NULL values in the specified column?
COUNT(ALL column) - same as COUNT(column), since ALL is the default in the [ALL | DISTINCT] syntax.
COUNT(expression) and COUNT(DISTINCT expression) are basicallly the same as COUNT([DISTINCT] column), except they operate on an expression rather than just a column reference.

Example:
If SELECT SomeColumn FROM SomeTable returns the values 1, 2, 2, 3, 5, NULL, 8, then SELECT COUNT(*), COUNT(SomeColumn), COUNT(DISTINCT SomeColumn) will return 7, 6, 5. Because there are 7 rows, 6 of which have a value for SomeColumn, but there are only 5 different values.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #973962
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse