SQLServerCentral Article

Male, Female and The Other One (NULL)

,

How do you explain that the

sales of Widgets for Widget Inc have been grossly understated for the past six

months since the ‘new whiz bang going to cost you lots of dollars’ reporting

systems was implemented.  Trying to

explain that the reason behind this reporting discrepancy was a misunderstanding

by the vendor of the three-value logic concept in ANSI SQL is not the easiest

of things to explain to a Chief Financial Officer.

Consider the following

example:

CREATE TABLE dbo.WidgetSales

(             SaleID INT IDENTITY(1, 1) NOT NULL,             SalesDate SMALLDATETIME NOT NULL,             CustomerID INT NOT NULL,             ProductID  INT NOT NULL,             Qty SMALLINT NOT NULL,             TotalAmt NUMERIC(7, 2) ) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 20:20:00', 1, 1, 1, 2.50) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 20:20:40', 1, 2, 2, 5.00) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 22:21:00', 2, 1, 1, 2.50) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt) VALUES ('2005-11-23 20:22:00', 3, 2, 1, 2.50) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 20:23:00', 4, 2, 2, 5.00) CREATE TABLE dbo.WidgetCustomers (             CustomerID INT IDENTITY(1, 1) NOT NULL,             JoinDate SMALLDATETIME NOT NULL,             FirstName NVARCHAR(20) NOT NULL,             LastName NVARCHAR(20) NOT NULL,             ShipAddress NVARCHAR(200) NOT NULL,             Gender BIT NULL-- 0 = Male, 1 = Female ) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender)       VALUES ('2005-11-23 20:00:00', 'Peter', 'Ward', 'Where I Live', 0) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender)       VALUES ('2005-11-23 21:00:00', 'David', 'Smith', 'Somewhere Over the Rainbow', 0) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender)       VALUES ('2005-11-23 20:00:00', 'Mary', 'Ward', 'On the Bad Side of Town', 1) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender) VALUES ('2005-11-23 20:00:00', 'John', 'Doe', 'The Big Building', NULL)

To calculate the sales of Widgets you would simply aggregate the TotalAmt column in the WidgetSales table as illustrated below:

SELECT SUM(TotalAmt) AS Sales
 FROM dbo.WidgetSales

Returns:

---------------------------------------
17.50
(1 row(s) affected)

To calculate the sales of Widgets bought by Males a query similar to the one below would be written:

SELECT SUM(s. TotalAmt) AS Sales
 FROM dbo.WidgetSales s JOIN WidgetCustomers c
   ON s.customerid = c.customerid
 WHERE c.gender = 0  -- Male

Returns:

Sales
---------------------------------------
10.00
(1 row(s) affected)

To calculate the sales of Widgets bought by Females a query similar to the one below would be written:

SELECT SUM(s. TotalAmt) AS Sales
 FROM dbo.WidgetSales s JOIN WidgetCustomers c
    ON s.customerid = c.customerid
 WHERE c.gender = 1  -- Female

Returns:

Sales
---------------------------------------
2.50
(1 row(s) affected)

So the total sales for Widgets is $12.50. 

That is the sales of Widgets purchased by Males ($10.00) plus the sales

of Widgets for Females ($2.50).  But we

know from the aggregation of the TotalAmt values in the WidgetsSales table that the total sales is actually $17.50. In this simplified example three-valued logic produced a third unknown gender value that was not catered for.

In most programming languages a logical expression can only return one of two possible values; TRUE or FALSE,  Or in the Widgets example, MALE or FEMALE.  However, ANSI SQL is unique in the fact that a logical expression can return TRUE, FALSE or UNKNOWN, or in the Widgets example, MALE, FEMALE and NULL. Although NULL is called a value it is simply a marker indicating the complete lack of a value.

Although this issue could be resolved by the Nullability of the gender column being set to FALSE so that NULL values cannot be entered, this is not always an appropriate solution.  For example, Widget Inc’s privacy policy makes the collection of gender optional so that a person has the ability to not specify their gender when subscribing as a customer of Widget Inc. This could be corrected by using an appropriate encoding scheme for gender.  For example the ISO gender codes are 0 = Unknown, 1 = Male , 2 = Female and 9 = Not Applicable.  Not Applicable is used for an entity which does not have a gender such as a company. The real issue though in this reporting discrepancy is not the schema design, but the way that NULL values are being handled in logical expressions.

The behavior of NULLS is

particularly evident in WHERE clauses where a NULL is involved in the expression. A WHERE clause will only return the expressions that are evaluated to be TRUE. NULL values are always treated as UNKNOWN and as they are not TRUE they are not returned. For example 10 < 20 is always evaluated as TRUE and 10 > 20 is always evaluated as FALSE.  However the expression 10 = NULL is UNKNOWN as NULL is an unknown value. It could be 20 or it could be any other value hence it can not be evaluated so a NULL value is evaluated as UNKNOWN. Therefore, 10 = NULL will always return UNKNOWN. So in the gender example UNKNOWN values do not equal MALE or FEMALE and are therefore not returned.

An additional source of confusion with three-valued logic is that NOT(UNKNOWN) is always UNKNOWN. For example, NOT(TRUE) is evaluated as FALSE and NOT(FALE) is evaluated as TRUE.  However NOT(UNKNOWN) is evaluated as UNKNOWN.  Consider the example below to return all the sales of Widgets that have not been purchased by a Male:

SELECT SUM(s.TotalAmt) AS Sales
 FROM dbo.WidgetSales s JOIN WidgetCustomers c
     ON s.customerid = c.customerid
 WHERE  NOT(c.gender) = 0 -- Male

Or

SELECT  SUM(s.TotalAmt) AS Sales
 FROM  dbo.WidgetSales s JOIN WidgetCustomers c
     ON s.customerid = c.customerid
 WHERE  c.gender <> 0  -- Male

Both Queries return the result below:

Returns:

Sales
---------------------------------------
2.50
(1 row(s) affected)

The observation made from this example is that UNKNOWN results will not be returned in a WHERE clause, as only those results that are satisfied to be TRUE will be returned. For example, return IF NOT('male') = 'female' is TRUE where as IF NOT(NULL) = 'female' is UNKNOWN. In order to check if a value equals (=) NULL then the IS NULL expression should be used, and to check that a value does not

equal (<>) NULL then the IS NOT NULL expression should be used.

So, in order to calculate

the total sales of Widgets by all genders the sales of Males, Females and

Unknown genders must be aggregated to ensure that sales from the Unknown gender

are considered.

ie.

SELECT SUM(s.TotalAmt) AS Sales
 FROM  dbo.WidgetSales s JOIN WidgetCustomers c
     ON  s.customerid = c.customerid
 WHERE c.gender IS NULL

Returns:

Sales
---------------------------------------
5.00
(1 row(s) affected) 

By understanding and

knowing  how NULL values are treated in

TSQL the Unknown will no longer be something that you need to worry about as

you will be able to anticipate and prepare for NULL values.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating