Handle NULLs

  • I have a situation where I'm querying two tables, each that has up to two hobbies listed. The first table has hobby1 and hobby2, and the second table has hobby3 and hobby4. It's possible for there to be values in any of the fields. So, it could look like:

    hobby1, hobby2, hobby3, hobby4

    basketball, tennis, NULL, NULL

    or

    hobby1, hobby2, hobby3, hobby4

    NULL, NULL, basketball, tennis

    However, it could also look like:

    hobby1, hobby2, hobby3, hobby4

    basketball, NULL, tennis, NULL

    Ideally, I want it to look like the first example, where the values are brought to the left. I could use COALESCE in the first and second examples and have it work, but it wouldn't work on the third example. Any ideas how I could get something like the third example to look like the first one?

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • This is ugly and it would kill performance on large rowsets, but it works:

    DECLARE @hobby1 NVARCHAR(50),

    @hobby2 NVARCHAR(50),

    @hobby3 NVARCHAR(50),

    @hobby4 NVARCHAR(50);

    DECLARE hobbiesCursor CURSOR FOR

    SELECT hobby1, hobby2, hobby3, hobby4

    FROM Hobbies

    OPEN hobbiesCursor;

    FETCH NEXT FROM hobbiesCursor

    INTO @hobby1, @hobby2, @hobby3, @hobby4;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    DECLARE @i INT = 0;

    WHILE @i < 3

    BEGIN

    IF @hobby1 IS NULL

    BEGIN

    SET @hobby1 = @hobby2;

    SET @hobby2 = @hobby3;

    SET @hobby3 = @hobby4;

    SET @hobby4 = NULL;

    END

    SET @i += 1;

    END

    SET @i = 0;

    WHILE @i < 2

    BEGIN

    IF @hobby2 IS NULL

    BEGIN

    SET @hobby2 = @hobby3;

    SET @hobby3 = @hobby4;

    SET @hobby4 = NULL;

    END

    SET @i += 1;

    END

    IF @hobby3 IS NULL

    BEGIN

    SET @hobby3 = @hobby4;

    SET @hobby4 = NULL;

    END

    PRINT 'Hobby 1 ' + @hobby1;

    PRINT 'Hobby 2 ' + @hobby2;

    PRINT 'Hobby 3 ' + COALESCE( @hobby3, 'NULL' );

    PRINT 'Hobby 4 ' + COALESCE( @hobby4, 'NULL' );

    FETCH NEXT FROM hobbiesCursor

    INTO @hobby1, @hobby2, @hobby3, @hobby4;

    END;

    CLOSE hobbiesCursor;

    DEALLOCATE hobbiesCursor;

  • Why not normalize the data and do away with the entire issue? Having repeating data in multiple columns is a basic violation of first normal form, so you will end up having lots of issues like the one you mentioned.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I suppose I could union the fields between the two tables and remove the NULLs that way, then convert the rows into columns if that's what you mean.

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • I think he was asking why you don't refactor your tables. Your comment is where I was headed. Here is what I came up with:

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'hobby1')

    AND type IN (N'U') )

    DROP TABLE hobby1 ;

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'hobby2')

    AND type IN (N'U') )

    DROP TABLE hobby2 ;

    GO

    CREATE TABLE hobby1

    (

    id INT,

    hobby1 VARCHAR(20),

    hobby2 VARCHAR(20)

    ) ;

    CREATE TABLE hobby2

    (

    id INT,

    hobby1 VARCHAR(20),

    hobby2 VARCHAR(20)

    ) ;

    INSERT INTO dbo.hobby1

    (id, hobby1, hobby2)

    VALUES (1, 'basketball', 'tennis'),

    (2, 'basketball', NULL),

    (3, NULL, NULL),

    (4, NULL, 'basketball') ;

    go

    INSERT INTO dbo.hobby2

    (id, hobby1, hobby2)

    VALUES (1, 'basketball', 'tennis'),

    (2, 'tennis', NULL),

    (3, 'basketball', 'tennis'),

    (4, NULL, 'tennis') ;

    go

    WITH cte(id, hobby, hobby_num)

    AS (

    SELECT h1.id,

    h1.hobby1,

    1

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id

    UNION ALL

    SELECT h1.id,

    h1.hobby2,

    2

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id

    UNION ALL

    SELECT h1.id,

    h2.hobby1,

    3

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id

    UNION ALL

    SELECT h1.id,

    h2.hobby2,

    4

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id

    ),

    cte2(id, hobby, row_num)

    AS (

    SELECT id,

    hobby,

    ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

    FROM cte

    WHERE hobby IS NOT NULL

    )

    SELECT id,

    MAX(CASE WHEN row_num = 1 THEN hobby

    END) AS hobby1,

    MAX(CASE WHEN row_num = 2 THEN hobby

    END) AS hobby2,

    MAX(CASE WHEN row_num = 3 THEN hobby

    END) AS hobby3,

    MAX(CASE WHEN row_num = 4 THEN hobby

    END) AS hobby4

    FROM cte2

    GROUP BY id ;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Yes, you could select individual columns, UNION them, GROUP them (which would remove nulls), and pivot them back into four columns. That would be uglier than the RBAR example I gave you.

  • Sorry - I was in a bit of a hurry - so I wasn't as helpful as I should have been. That said - you did get the overall idea.

    If you were to change your table structure to

    PersonID (or some way to identify the individual)

    Hobby

    HobbyRank

    So you'd end up with

    PersonID Hobby Rank

    1 Golf 1

    1 Fishing 2

    2 Piano 1

    3 Video Games 3

    3 Knitting 4

    etc.....

    for example, then it would be easy to completely ignore (or not store) any of the unpopulated/unknown/NULL entries. You could also easily into queries where you find those who have both hobbies in common (just not necessarily entered in the same order.

    Interestingly enough - this is the middle state OPC.three is heading for in the CTE.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Revenant (7/6/2011)


    Yes, you could select individual columns, UNION them, GROUP them (which would remove nulls), and pivot them back into four columns. That would be uglier than the RBAR example I gave you.

    :unsure: ...even with the nested loops introduced by the CTE I just posted I am not sure I would choose the cursor method you posted over them

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (7/6/2011)


    ...even with the nested loops introduced by the CTE I just posted I am not sure I would choose the cursor method you posted over them

    Yes, RBAR is not my favorite technique but I had less than ten minutes before I had to go to a meeting, so it was written "under duress." 🙂

  • Revenant (7/6/2011)


    opc.three (7/6/2011)


    ...even with the nested loops introduced by the CTE I just posted I am not sure I would choose the cursor method you posted over them

    Yes, RBAR is not my favorite technique but I had less than ten minutes before I had to go to a meeting, so it was written "under duress." 🙂

    I hear ya, not trying to nitpick, just saying I would opt for some crazy RBAR CTEs over a cursor. Meetings not withstanding I am always under duress when writing RBAR processes 🙂

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Can't you just unpivot and then repivot?

  • Nevyn (7/6/2011)


    Can't you just unpivot and then repivot?

    You sure could. The only thing that may become apparent is that hobby_num is a value-add column that the UNION ALL method provides that I do not know if the UNPIVOT method can provide.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi,

    I didn't like all the RBAR so I had a go at a non-RBAR and came up with this.

    It is a bit hard to read because of the complexity of the nested functions but works well and has a nice simple plan:

    -- create some test data

    ;WITH data(c1,c2,c3,c4)

    AS

    (

    SELECT 'basketball',NULL,NULL,NULL UNION ALL

    SELECT NULL,'basketball',NULL,NULL UNION ALL

    SELECT NULL,NULL,'basketball',NULL UNION ALL

    SELECT NULL,NULL,NULL,'basketball' UNION ALL

    SELECT 'basketball','tennis',NULL,NULL UNION ALL

    SELECT 'basketball',NULL,'tennis',NULL UNION ALL

    SELECT 'basketball',NULL,NULL,'tennis' UNION ALL

    SELECT NULL,'basketball','tennis',NULL UNION ALL

    SELECT NULL,'basketball',NULL,'tennis' UNION ALL

    SELECT NULL,NULL,'basketball','tennis' UNION ALL

    SELECT 'basketball','tennis','cricket',NULL UNION ALL

    SELECT 'basketball','tennis',NULL,'cricket' UNION ALL

    SELECT 'basketball',NULL,'tennis','cricket' UNION ALL

    SELECT NULL,'basketball','tennis','cricket' UNION ALL

    SELECT 'basketball','tennis','cricket','curling'

    )

    SELECT d1.c1,d1.c2,d1.c3,d1.c4

    INTO #temp

    FROM data d1

    -- now transform it by moving all nulls to the end without RBAR

    SELECT c1toc4 AS newc1,

    newc2,

    newc3,

    newc4,

    c1,

    c2,

    c3,

    c4

    FROM #temp

    -- first CROSS APPLY is just to provide shortcuts for the various COALESCE statements

    CROSS APPLY (

    SELECT c1toc4 = COALESCE(c1,c2,c3,c4) ,

    c2toc4 = COALESCE(c2,c3,c4) ,

    c3toc4 = COALESCE(c3,c4)

    ) cx

    -- second CROSS APPLY figures out what goes in the second column

    -- we know the 1st column is always c1toc4 - i.e. the first non-null

    -- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4

    -- if it matches either or those we need to either take column 4 or null (if col4 matches c1toc4 we have already used it for new column 1)

    CROSS APPLY (

    SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,c1toc4),c3toc4),c1toc4),NULLIF(c4,c1toc4))

    ) cx2

    -- and this is for the third column

    CROSS APPLY (

    SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4,newc2),c1toc4),NULLIF(NULLIF(c4,newc2),c1toc4))

    ) cx3

    -- and this is for the fourth column

    CROSS APPLY (

    SELECT newc4 = NULLIF(NULLIF(NULLIF(c4,newc3),newc2),c1toc4)

    ) cx4

    I hope you like it and don't want even more columns!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (7/6/2011)


    Hi,

    I didn't like all the RBAR so I had a go at a non-RBAR and came up with this.

    It is a bit hard to read because of the complexity of the nested functions but works well and has a nice simple plan:

    -- create some test data

    ;WITH data(c1,c2,c3,c4)

    AS

    (

    SELECT 'basketball',NULL,NULL,NULL UNION ALL

    SELECT NULL,'basketball',NULL,NULL UNION ALL

    SELECT NULL,NULL,'basketball',NULL UNION ALL

    SELECT NULL,NULL,NULL,'basketball' UNION ALL

    SELECT 'basketball','tennis',NULL,NULL UNION ALL

    SELECT 'basketball',NULL,'tennis',NULL UNION ALL

    SELECT 'basketball',NULL,NULL,'tennis' UNION ALL

    SELECT NULL,'basketball','tennis',NULL UNION ALL

    SELECT NULL,'basketball',NULL,'tennis' UNION ALL

    SELECT NULL,NULL,'basketball','tennis' UNION ALL

    SELECT 'basketball','tennis','cricket',NULL UNION ALL

    SELECT 'basketball','tennis',NULL,'cricket' UNION ALL

    SELECT 'basketball',NULL,'tennis','cricket' UNION ALL

    SELECT NULL,'basketball','tennis','cricket' UNION ALL

    SELECT 'basketball','tennis','cricket','curling'

    )

    SELECT d1.c1,d1.c2,d1.c3,d1.c4

    INTO #temp

    FROM data d1

    -- now transform it by moving all nulls to the end without RBAR

    SELECT c1toc4 AS newc1,

    newc2,

    newc3,

    newc4,

    c1,

    c2,

    c3,

    c4

    FROM #temp

    -- first CROSS APPLY is just to provide shortcuts for the various COALESCE statements

    CROSS APPLY (

    SELECT c1toc4 = COALESCE(c1,c2,c3,c4) ,

    c2toc4 = COALESCE(c2,c3,c4) ,

    c3toc4 = COALESCE(c3,c4)

    ) cx

    -- second CROSS APPLY figures out what goes in the second column

    -- we know the 1st column is always c1toc4 - i.e. the first non-null

    -- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4

    -- if it matches either or those we need to either take column 4 or null (if col4 matches c1toc4 we have already used it for new column 1)

    CROSS APPLY (

    SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,c1toc4),c3toc4),c1toc4),NULLIF(c4,c1toc4))

    ) cx2

    -- and this is for the third column

    CROSS APPLY (

    SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4,newc2),c1toc4),NULLIF(NULLIF(c4,newc2),c1toc4))

    ) cx3

    -- and this is for the fourth column

    CROSS APPLY (

    SELECT newc4 = NULLIF(NULLIF(NULLIF(c4,newc3),newc2),c1toc4)

    ) cx4

    I hope you like it and don't want even more columns!

    Using your excellent testdata i came up with

    select

    SubString(s, 1, 20),

    SubString(s, 21, 20),

    SubString(s, 41, 20),

    SubString(s, 61, 20)

    from (select

    IsNull(c1 + Space(20-Len(c1)), '') + IsNull(c2 + Space(20-Len(c2)), '') +

    IsNull(c3 + Space(20-Len(c3)), '') + IsNull(c4 + Space(20-Len(c4)), '') s

    from #temp) t

    Which would atleast make it fairly simple to add more columns to it.

    /T

  • tommyh (7/7/2011)


    Using your excellent testdata i came up with

    select

    SubString(s, 1, 20),

    SubString(s, 21, 20),

    SubString(s, 41, 20),

    SubString(s, 61, 20)

    from (select

    IsNull(c1 + Space(20-Len(c1)), '') + IsNull(c2 + Space(20-Len(c2)), '') +

    IsNull(c3 + Space(20-Len(c3)), '') + IsNull(c4 + Space(20-Len(c4)), '') s

    from #temp) t

    Which would atleast make it fairly simple to add more columns to it.

    /T

    Gosh! I feel so stupid now - nice lateral thinking!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

    Viewing 15 posts - 1 through 15 (of 92 total)

    You must be logged in to reply to this topic. Login to reply