Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Male, Female and The Other One (NULL)

By Peter Ward,

How do you explain that the sales of Widgets for Widget Inc have been grossly understated for the past six months since the ‘new whiz bang going to cost you lots of dollars’ reporting systems was implemented.  Trying to explain that the reason behind this reporting discrepancy was a misunderstanding by the vendor of the three-value logic concept in ANSI SQL is not the easiest of things to explain to a Chief Financial Officer.

Consider the following example:

CREATE TABLE dbo.WidgetSales

(             SaleID INT IDENTITY(1, 1) NOT NULL,             SalesDate SMALLDATETIME NOT NULL,             CustomerID INT NOT NULL,             ProductID  INT NOT NULL,             Qty SMALLINT NOT NULL,             TotalAmt NUMERIC(7, 2) ) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 20:20:00', 1, 1, 1, 2.50) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 20:20:40', 1, 2, 2, 5.00) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 22:21:00', 2, 1, 1, 2.50) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt) VALUES ('2005-11-23 20:22:00', 3, 2, 1, 2.50) INSERT INTO dbo.WidgetSales (SalesDate, CustomerID, ProductID, Qty, TotalAmt)       VALUES ('2005-11-23 20:23:00', 4, 2, 2, 5.00) CREATE TABLE dbo.WidgetCustomers (             CustomerID INT IDENTITY(1, 1) NOT NULL,             JoinDate SMALLDATETIME NOT NULL,             FirstName NVARCHAR(20) NOT NULL,             LastName NVARCHAR(20) NOT NULL,             ShipAddress NVARCHAR(200) NOT NULL,             Gender BIT NULL-- 0 = Male, 1 = Female ) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender)       VALUES ('2005-11-23 20:00:00', 'Peter', 'Ward', 'Where I Live', 0) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender)       VALUES ('2005-11-23 21:00:00', 'David', 'Smith', 'Somewhere Over the Rainbow', 0) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender)       VALUES ('2005-11-23 20:00:00', 'Mary', 'Ward', 'On the Bad Side of Town', 1) INSERT INTO dbo.WidgetCustomers (JoinDate, FirstName, LastName, ShipAddress, Gender) VALUES ('2005-11-23 20:00:00', 'John', 'Doe', 'The Big Building', NULL)

To calculate the sales of Widgets you would simply aggregate the TotalAmt column in the WidgetSales table as illustrated below:

SELECT SUM(TotalAmt) AS Sales
 FROM dbo.WidgetSales

Returns:

---------------------------------------
17.50

(1 row(s) affected)

To calculate the sales of Widgets bought by Males a query similar to the one below would be written:

SELECT SUM(s. TotalAmt) AS Sales
 FROM dbo.WidgetSales s JOIN WidgetCustomers c
   ON s.customerid = c.customerid
 WHERE c.gender = 0  -- Male

Returns:

Sales
---------------------------------------
10.00

(1 row(s) affected)

To calculate the sales of Widgets bought by Females a query similar to the one below would be written:

SELECT SUM(s. TotalAmt) AS Sales
 FROM dbo.WidgetSales s JOIN WidgetCustomers c
    ON s.customerid = c.customerid
 WHERE c.gender = 1  -- Female

Returns:

Sales
---------------------------------------
2.50

(1 row(s) affected)

So the total sales for Widgets is $12.50.  That is the sales of Widgets purchased by Males ($10.00) plus the sales of Widgets for Females ($2.50).  But we know from the aggregation of the TotalAmt values in the WidgetsSales table that the total sales is actually $17.50. In this simplified example three-valued logic produced a third unknown gender value that was not catered for.

In most programming languages a logical expression can only return one of two possible values; TRUE or FALSE,  Or in the Widgets example, MALE or FEMALE.  However, ANSI SQL is unique in the fact that a logical expression can return TRUE, FALSE or UNKNOWN, or in the Widgets example, MALE, FEMALE and NULL. Although NULL is called a value it is simply a marker indicating the complete lack of a value.

Although this issue could be resolved by the Nullability of the gender column being set to FALSE so that NULL values cannot be entered, this is not always an appropriate solution.  For example, Widget Inc’s privacy policy makes the collection of gender optional so that a person has the ability to not specify their gender when subscribing as a customer of Widget Inc. This could be corrected by using an appropriate encoding scheme for gender.  For example the ISO gender codes are 0 = Unknown, 1 = Male , 2 = Female and 9 = Not Applicable.  Not Applicable is used for an entity which does not have a gender such as a company. The real issue though in this reporting discrepancy is not the schema design, but the way that NULL values are being handled in logical expressions.

The behavior of NULLS is particularly evident in WHERE clauses where a NULL is involved in the expression. A WHERE clause will only return the expressions that are evaluated to be TRUE. NULL values are always treated as UNKNOWN and as they are not TRUE they are not returned. For example 10 < 20 is always evaluated as TRUE and 10 > 20 is always evaluated as FALSE.  However the expression 10 = NULL is UNKNOWN as NULL is an unknown value. It could be 20 or it could be any other value hence it can not be evaluated so a NULL value is evaluated as UNKNOWN. Therefore, 10 = NULL will always return UNKNOWN. So in the gender example UNKNOWN values do not equal MALE or FEMALE and are therefore not returned.

An additional source of confusion with three-valued logic is that NOT(UNKNOWN) is always UNKNOWN. For example, NOT(TRUE) is evaluated as FALSE and NOT(FALE) is evaluated as TRUE.  However NOT(UNKNOWN) is evaluated as UNKNOWN.  Consider the example below to return all the sales of Widgets that have not been purchased by a Male:

SELECT SUM(s.TotalAmt) AS Sales
 FROM dbo.WidgetSales s JOIN WidgetCustomers c
     ON s.customerid = c.customerid
 WHERE  NOT(c.gender) = 0 -- Male

Or

SELECT  SUM(s.TotalAmt) AS Sales
 FROM  dbo.WidgetSales s JOIN WidgetCustomers c
     ON s.customerid = c.customerid
 WHERE  c.gender <> 0  -- Male

Both Queries return the result below:

Returns:

Sales
---------------------------------------
2.50

(1 row(s) affected)

The observation made from this example is that UNKNOWN results will not be returned in a WHERE clause, as only those results that are satisfied to be TRUE will be returned. For example, return IF NOT('male') = 'female' is TRUE where as IF NOT(NULL) = 'female' is UNKNOWN. In order to check if a value equals (=) NULL then the IS NULL expression should be used, and to check that a value does not equal (<>) NULL then the IS NOT NULL expression should be used.

So, in order to calculate the total sales of Widgets by all genders the sales of Males, Females and Unknown genders must be aggregated to ensure that sales from the Unknown gender are considered.

ie.

SELECT SUM(s.TotalAmt) AS Sales
 FROM  dbo.WidgetSales s JOIN WidgetCustomers c
     ON  s.customerid = c.customerid
 WHERE c.gender IS NULL

Returns:

Sales
---------------------------------------
5.00

(1 row(s) affected) 

By understanding and knowing  how NULL values are treated in TSQL the Unknown will no longer be something that you need to worry about as you will be able to anticipate and prepare for NULL values.

Total article views: 9642 | Views in the last 30 days: 2
 
Related Articles
ARTICLE

Gender Differences in the Workplace

Today we have a guest editorial from MVP Jessica Moss that talks about the gender differences in the...

FORUM

Unknown authentication type

Unknown authentication type

FORUM

Gender Function

Hello everyone, I need to write a SQL function that will determine if a given field in a table is ...

FORUM

Data type for Gender

What is the recommended datatype for Gender? could think of a few rather un pc answers but don't thi...

BLOG

Modified Insert Unknown Row to Dimension Table

A little over a year ago Patrick LeBlanc wrote a blog post about inserting an unknown row into a dim...

Tags
sql puzzles    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones