Handle NULLs

Question

Post reply

Handle NULLs

Mike Scalise

SSCrazy Eights

Points: 8145
More actions
July 6, 2011 at 10:52 am

#239194

I have a situation where I'm querying two tables, each that has up to two hobbies listed. The first table has hobby1 and hobby2, and the second table has hobby3 and hobby4. It's possible for there to be values in any of the fields. So, it could look like:
hobby1, hobby2, hobby3, hobby4
basketball, tennis, NULL, NULL
or
hobby1, hobby2, hobby3, hobby4
NULL, NULL, basketball, tennis
However, it could also look like:
hobby1, hobby2, hobby3, hobby4
basketball, NULL, tennis, NULL
Ideally, I want it to look like the first example, where the values are brought to the left. I could use COALESCE in the first and second examples and have it work, but it wouldn't work on the third example. Any ideas how I could get something like the third example to look like the first one?
Mike
Mike Scalise, PMP
https://www.michaelscalise.com
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Revenant SSC-Forever Points: 42467 More actions · Answer 1

This is ugly and it would kill performance on large rowsets, but it works:

DECLARE @hobby1 NVARCHAR(50),

@hobby2 NVARCHAR(50),

@hobby3 NVARCHAR(50),

@hobby4 NVARCHAR(50);

DECLARE hobbiesCursor CURSOR FOR

SELECT hobby1, hobby2, hobby3, hobby4

FROM Hobbies

OPEN hobbiesCursor;

FETCH NEXT FROM hobbiesCursor

INTO @hobby1, @hobby2, @hobby3, @hobby4;

WHILE @@FETCH_STATUS = 0

BEGIN

DECLARE @i INT = 0;

WHILE @i < 3

BEGIN

IF @hobby1 IS NULL

BEGIN

SET @hobby1 = @hobby2;

SET @hobby2 = @hobby3;

SET @hobby3 = @hobby4;

SET @hobby4 = NULL;

END

SET @i += 1;

END

SET @i = 0;

WHILE @i < 2

BEGIN

IF @hobby2 IS NULL

BEGIN

SET @hobby2 = @hobby3;

SET @hobby3 = @hobby4;

SET @hobby4 = NULL;

END

SET @i += 1;

END

IF @hobby3 IS NULL

BEGIN

SET @hobby3 = @hobby4;

SET @hobby4 = NULL;

END

PRINT 'Hobby 1 ' + @hobby1;

PRINT 'Hobby 2 ' + @hobby2;

PRINT 'Hobby 3 ' + COALESCE( @hobby3, 'NULL' );

PRINT 'Hobby 4 ' + COALESCE( @hobby4, 'NULL' );

FETCH NEXT FROM hobbiesCursor

INTO @hobby1, @hobby2, @hobby3, @hobby4;

END;

CLOSE hobbiesCursor;

DEALLOCATE hobbiesCursor;

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

Why not normalize the data and do away with the entire issue? Having repeating data in multiple columns is a basic violation of first normal form, so you will end up having lots of issues like the one you mentioned.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Mike Scalise SSCrazy Eights Points: 8145 More actions · Answer 3

I suppose I could union the fields between the two tables and remove the NULLs that way, then convert the rows into columns if that's what you mean.

Mike Scalise, PMP
https://www.michaelscalise.com

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 4

I think he was asking why you don't refactor your tables. Your comment is where I was headed. Here is what I came up with:

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby1')

AND type IN (N'U') )

DROP TABLE hobby1 ;

GO

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby2')

AND type IN (N'U') )

DROP TABLE hobby2 ;

GO

CREATE TABLE hobby1

(

id INT,

hobby1 VARCHAR(20),

hobby2 VARCHAR(20)

) ;

CREATE TABLE hobby2

(

id INT,

hobby1 VARCHAR(20),

hobby2 VARCHAR(20)

) ;

INSERT INTO dbo.hobby1

(id, hobby1, hobby2)

VALUES (1, 'basketball', 'tennis'),

(2, 'basketball', NULL),

(3, NULL, NULL),

(4, NULL, 'basketball') ;

go

INSERT INTO dbo.hobby2

(id, hobby1, hobby2)

VALUES (1, 'basketball', 'tennis'),

(2, 'tennis', NULL),

(3, 'basketball', 'tennis'),

(4, NULL, 'tennis') ;

go

WITH cte(id, hobby, hobby_num)

AS (

SELECT h1.id,

h1.hobby1,

1

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

UNION ALL

SELECT h1.id,

h1.hobby2,

2

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

UNION ALL

SELECT h1.id,

h2.hobby1,

3

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

UNION ALL

SELECT h1.id,

h2.hobby2,

4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Revenant SSC-Forever Points: 42467 More actions · Answer 5

Yes, you could select individual columns, UNION them, GROUP them (which would remove nulls), and pivot them back into four columns. That would be uglier than the RBAR example I gave you.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

Sorry - I was in a bit of a hurry - so I wasn't as helpful as I should have been. That said - you did get the overall idea.

If you were to change your table structure to

PersonID (or some way to identify the individual)

Hobby

HobbyRank

So you'd end up with

PersonID Hobby Rank

1 Golf 1

1 Fishing 2

2 Piano 1

3 Video Games 3

3 Knitting 4

etc.....

for example, then it would be easy to completely ignore (or not store) any of the unpopulated/unknown/NULL entries. You could also easily into queries where you find those who have both hobbies in common (just not necessarily entered in the same order.

Interestingly enough - this is the middle state OPC.three is heading for in the CTE.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 7

Revenant (7/6/2011)
Yes, you could select individual columns, UNION them, GROUP them (which would remove nulls), and pivot them back into four columns. That would be uglier than the RBAR example I gave you.

:unsure: ...even with the nested loops introduced by the CTE I just posted I am not sure I would choose the cursor method you posted over them

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Revenant SSC-Forever Points: 42467 More actions · Answer 8

opc.three (7/6/2011)
...even with the nested loops introduced by the CTE I just posted I am not sure I would choose the cursor method you posted over them

Yes, RBAR is not my favorite technique but I had less than ten minutes before I had to go to a meeting, so it was written "under duress." 🙂

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 9

Revenant (7/6/2011)
opc.three (7/6/2011)
...even with the nested loops introduced by the CTE I just posted I am not sure I would choose the cursor method you posted over them
Yes, RBAR is not my favorite technique but I had less than ten minutes before I had to go to a meeting, so it was written "under duress." 🙂

I hear ya, not trying to nitpick, just saying I would opt for some crazy RBAR CTEs over a cursor. Meetings not withstanding I am always under duress when writing RBAR processes 🙂

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Nevyn SSChampion Points: 14210 More actions · Answer 10

Nevyn

SSChampion

Points: 14210

July 6, 2011 at 2:42 pm

#1349303

Can't you just unpivot and then repivot?

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 11

Nevyn (7/6/2011)
Can't you just unpivot and then repivot?

You sure could. The only thing that may become apparent is that hobby_num is a value-add column that the UNION ALL method provides that I do not know if the UNPIVOT method can provide.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

mister.magoo SSC-Forever Points: 47174 More actions · Answer 12

Hi,

I didn't like all the RBAR so I had a go at a non-RBAR and came up with this.

It is a bit hard to read because of the complexity of the nested functions but works well and has a nice simple plan:

-- create some test data

;WITH data(c1,c2,c3,c4)

AS

(

SELECT 'basketball',NULL,NULL,NULL UNION ALL

SELECT NULL,'basketball',NULL,NULL UNION ALL

SELECT NULL,NULL,'basketball',NULL UNION ALL

SELECT NULL,NULL,NULL,'basketball' UNION ALL

SELECT 'basketball','tennis',NULL,NULL UNION ALL

SELECT 'basketball',NULL,'tennis',NULL UNION ALL

SELECT 'basketball',NULL,NULL,'tennis' UNION ALL

SELECT NULL,'basketball','tennis',NULL UNION ALL

SELECT NULL,'basketball',NULL,'tennis' UNION ALL

SELECT NULL,NULL,'basketball','tennis' UNION ALL

SELECT 'basketball','tennis','cricket',NULL UNION ALL

SELECT 'basketball','tennis',NULL,'cricket' UNION ALL

SELECT 'basketball',NULL,'tennis','cricket' UNION ALL

SELECT NULL,'basketball','tennis','cricket' UNION ALL

SELECT 'basketball','tennis','cricket','curling'

)

SELECT d1.c1,d1.c2,d1.c3,d1.c4

INTO #temp

FROM data d1

-- now transform it by moving all nulls to the end without RBAR

SELECT c1toc4 AS newc1,

newc2,

newc3,

newc4,

c1,

c2,

c3,

c4

FROM #temp

-- first CROSS APPLY is just to provide shortcuts for the various COALESCE statements

CROSS APPLY (

SELECT c1toc4 = COALESCE(c1,c2,c3,c4) ,

c2toc4 = COALESCE(c2,c3,c4) ,

c3toc4 = COALESCE(c3,c4)

) cx

-- second CROSS APPLY figures out what goes in the second column

-- we know the 1st column is always c1toc4 - i.e. the first non-null

-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4

-- if it matches either or those we need to either take column 4 or null (if col4 matches c1toc4 we have already used it for new column 1)

CROSS APPLY (

SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,c1toc4),c3toc4),c1toc4),NULLIF(c4,c1toc4))

) cx2

-- and this is for the third column

CROSS APPLY (

SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4,newc2),c1toc4),NULLIF(NULLIF(c4,newc2),c1toc4))

) cx3

-- and this is for the fourth column

CROSS APPLY (

SELECT newc4 = NULLIF(NULLIF(NULLIF(c4,newc3),newc2),c1toc4)

) cx4

I hope you like it and don't want even more columns!

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

tommyh SSCertifiable Points: 6302 More actions · Answer 13

mister.magoo (7/6/2011)
Hi,
I didn't like all the RBAR so I had a go at a non-RBAR and came up with this.
It is a bit hard to read because of the complexity of the nested functions but works well and has a nice simple plan:
-- create some test data
;WITH data(c1,c2,c3,c4)
AS
(
SELECT 'basketball',NULL,NULL,NULL UNION ALL
SELECT NULL,'basketball',NULL,NULL UNION ALL
SELECT NULL,NULL,'basketball',NULL UNION ALL
SELECT NULL,NULL,NULL,'basketball' UNION ALL
SELECT 'basketball','tennis',NULL,NULL UNION ALL
SELECT 'basketball',NULL,'tennis',NULL UNION ALL
SELECT 'basketball',NULL,NULL,'tennis' UNION ALL
SELECT NULL,'basketball','tennis',NULL UNION ALL
SELECT NULL,'basketball',NULL,'tennis' UNION ALL
SELECT NULL,NULL,'basketball','tennis' UNION ALL
SELECT 'basketball','tennis','cricket',NULL UNION ALL
SELECT 'basketball','tennis',NULL,'cricket' UNION ALL
SELECT 'basketball',NULL,'tennis','cricket' UNION ALL
SELECT NULL,'basketball','tennis','cricket' UNION ALL
SELECT 'basketball','tennis','cricket','curling'
)
SELECT d1.c1,d1.c2,d1.c3,d1.c4
INTO #temp
FROM data d1
-- now transform it by moving all nulls to the end without RBAR
SELECT c1toc4 AS newc1,
newc2,
newc3,
newc4,
c1,
c2,
c3,
c4
FROM #temp
-- first CROSS APPLY is just to provide shortcuts for the various COALESCE statements
CROSS APPLY (
SELECT c1toc4 = COALESCE(c1,c2,c3,c4) ,
c2toc4 = COALESCE(c2,c3,c4) ,
c3toc4 = COALESCE(c3,c4)
) cx
-- second CROSS APPLY figures out what goes in the second column
-- we know the 1st column is always c1toc4 - i.e. the first non-null
-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4
-- if it matches either or those we need to either take column 4 or null (if col4 matches c1toc4 we have already used it for new column 1)
CROSS APPLY (
SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,c1toc4),c3toc4),c1toc4),NULLIF(c4,c1toc4))
) cx2
-- and this is for the third column
CROSS APPLY (
SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4,newc2),c1toc4),NULLIF(NULLIF(c4,newc2),c1toc4))
) cx3
-- and this is for the fourth column
CROSS APPLY (
SELECT newc4 = NULLIF(NULLIF(NULLIF(c4,newc3),newc2),c1toc4)
) cx4
I hope you like it and don't want even more columns!

Using your excellent testdata i came up with

select

SubString(s, 1, 20),

SubString(s, 21, 20),

SubString(s, 41, 20),

SubString(s, 61, 20)

from (select

IsNull(c1 + Space(20-Len(c1)), '') + IsNull(c2 + Space(20-Len(c2)), '') +

IsNull(c3 + Space(20-Len(c3)), '') + IsNull(c4 + Space(20-Len(c4)), '') s

from #temp) t

Which would atleast make it fairly simple to add more columns to it.

/T

mister.magoo SSC-Forever Points: 47174 More actions · Answer 14

tommyh (7/7/2011)
Using your excellent testdata i came up with
select
SubString(s, 1, 20),
SubString(s, 21, 20),
SubString(s, 41, 20),
SubString(s, 61, 20)
from (select
IsNull(c1 + Space(20-Len(c1)), '') + IsNull(c2 + Space(20-Len(c2)), '') +
IsNull(c3 + Space(20-Len(c3)), '') + IsNull(c4 + Space(20-Len(c4)), '') s
from #temp) t
Which would atleast make it fairly simple to add more columns to it.
/T

Gosh! I feel so stupid now - nice lateral thinking!

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);