Handle NULLs

  • opc.three (7/8/2011)


    mikes84 (7/8/2011)


    Regarding these two requirements mikes84:

    1) Do you want duplicates shown in the results?

    2) Does order matter?

    Using our two tables hobby1 and hobby2 here is an example that should answer both questions:

    If...

    hobby1.hobby1 = tennis

    hobby1.hobby2 = cricket

    hobby2.hobby1 is null

    hobby2.hobby2 = tennis

    ...what should the result set look like?

    1) I don't want duplicates in the results

    2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2

    If any hobbies are null, the next non-null hobby should be in its place.

    In the case of the sample you provided, it would be

    tennis, cricket, tennis, null

    I can't remember if your XML-based solution met both of these requirements.

    Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?

    From your description I would think you wanted:

    tennis, cricket, null, null

    Please confirm.

    PS I forgot to mention...you can ditch my XML solution for this problem case. I only threw it out there because I thought that making use of the // notation in XQuery was an interesting way to simulate the T-SQL UNION ALL but it would take someone with a lot more knowledge about XQuery than I to make that method viable...that thing will chew through CPU cycles as if it were it's only job in life the way it sits now 😀

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • tommyh (7/8/2011)


    Modding my code so it should do what you now want.

    SELECT t.id,

    RTRIM(SUBSTRING(t.s, 1, 36)),

    RTRIM(SUBSTRING(t.s, 37, 36)),

    RTRIM(SUBSTRING(t.s, 73, 36)),

    RTRIM(SUBSTRING(t.s, 109, 36))

    FROM (

    SELECT h1.id,

    ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

    case when h1.hobby2 in (h1.hobby1) then '' else ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') end +

    case when h2.hobby1 in (h1.hobby1, h1.hobby2) then '' else ISNULL(h2.hobby1 + SPACE(36 - LEN(h2.hobby1)),'') end +

    case when h2.hobby2 in (h1.hobby1, h1.hobby2, h2.hobby1) then '' else ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)),'') end s

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id

    ) t ;

    Think it works atleast 😀

    /T

    Thanks, tommy.

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • opc.three (7/8/2011)


    mikes84 (7/8/2011)


    Regarding these two requirements mikes84:

    1) Do you want duplicates shown in the results?

    2) Does order matter?

    Using our two tables hobby1 and hobby2 here is an example that should answer both questions:

    If...

    hobby1.hobby1 = tennis

    hobby1.hobby2 = cricket

    hobby2.hobby1 is null

    hobby2.hobby2 = tennis

    ...what should the result set look like?

    1) I don't want duplicates in the results

    2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2

    If any hobbies are null, the next non-null hobby should be in its place.

    In the case of the sample you provided, it would be

    tennis, cricket, tennis, null

    I can't remember if your XML-based solution met both of these requirements.

    Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?

    From your description I would think you wanted:

    tennis, cricket, null, null

    Please confirm.

    I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • mikes84 (7/8/2011)


    ...truncated quotes of quotes, etc....

    I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

    Thanks for confirming. If you said you did not want dups I was going to ask why you provided the PARSENAME solution 😀

    Here are the solutions presented so far that would get you there:

    SET STATISTICS TIME OFF ;

    GO

    SET NOCOUNT ON ;

    GO

    PRINT REPLICATE('-', 75) + '

    build temp tables'

    SELECT 'build temp tables'

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'hobby1')

    AND type IN ( N'U' ) )

    DROP TABLE hobby1 ;

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'hobby2')

    AND type IN ( N'U' ) )

    DROP TABLE hobby2 ;

    CREATE TABLE hobby1

    (

    id INT IDENTITY(1, 1) NOT NULL,

    hobby1 VARCHAR(36),

    hobby2 VARCHAR(36)

    ) ;

    CREATE TABLE hobby2

    (

    id INT IDENTITY(1, 1) NOT NULL,

    hobby1 VARCHAR(36),

    hobby2 VARCHAR(36)

    ) ;

    GO

    --===== Add PK's to both tables

    ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

    ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

    GO

    --=====================================================================================================================

    -- Create and populate the test tables. This is NOT a part of the solution.

    --=====================================================================================================================

    --===== Declare and preset some obviously named local variables

    DECLARE @RowsInFirstTable INT,

    @RowsInSecondTable INT ;

    SELECT @RowsInFirstTable = 200000,

    @RowsInSecondTable = 200000 ;

    IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL

    DROP TABLE #HobbyName;

    --===== Create a list of HobbyNames so we can randomly populate the test tables.

    -- The NULL's are simply part of the possible things that could be a hobby.

    SELECT HobbyNameID = IDENTITY( INT,1,1),

    HobbyName = CAST(d.HobbyName AS VARCHAR(15))

    INTO #HobbyName

    FROM (

    SELECT NULL

    UNION ALL

    SELECT NULL

    UNION ALL

    SELECT 'BaseBall'

    UNION ALL

    SELECT 'Basketball'

    UNION ALL

    SELECT 'Tennis'

    UNION ALL

    SELECT 'FootBall'

    UNION ALL

    SELECT 'Hockey'

    UNION ALL

    SELECT 'Rugby'

    UNION ALL

    SELECT 'Golf'

    UNION ALL

    SELECT 'Bowling'

    ) d ( HobbyName ) ;

    ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;

    --===== Create the "first" table and populate it on the fly

    INSERT INTO dbo.hobby1

    (

    hobby1,

    hobby2

    )

    SELECT TOP ( @RowsInFirstTable )

    Hobby1 = (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    ),

    Hobby2 = (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    )

    FROM sys.all_columns ac1,

    sys.all_columns ac2 ;

    --===== Create the "second" table and populate it on the fly

    INSERT INTO dbo.hobby2

    (

    hobby1,

    hobby2

    )

    SELECT TOP ( @RowsInSecondTable )

    (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    ),

    (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    )

    FROM sys.all_columns ac1,

    sys.all_columns ac2 ;

    GO

    ------------------------------------------------------------------------------------

    PRINT REPLICATE('-', 75) + '

    String manipulation (dups, ordered by column)'

    SELECT 'String manipulation (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    SELECT t.id,

    RTRIM(SUBSTRING(t.s, 1, 36)),

    RTRIM(SUBSTRING(t.s, 37, 36)),

    RTRIM(SUBSTRING(t.s, 73, 36)),

    RTRIM(SUBSTRING(t.s, 109, 36))

    FROM (

    SELECT h1.id,

    ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

    ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36

    - LEN(h2.hobby1)),'') + ISNULL(h2.hobby2

    + SPACE(36- LEN(h2.hobby2)),'') s

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id

    ) t ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    PRINT REPLICATE('-', 75) + '

    Unpivot/Pivot (dups, ordered by column)'

    SELECT 'Unpivot/Pivot (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    WITH OneTableCTE

    AS (

    SELECT a.id,

    a.hobby1 AS 'h1',

    a.hobby2 AS 'h2',

    b.hobby1 AS 'h3',

    b.hobby2 AS 'h4'

    FROM hobby1 a

    INNER JOIN hobby2 b ON a.id = b.id

    ),

    UnPivotCTE

    AS (

    SELECT id,

    hobby,

    ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

    AS fixedorder

    FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

    AS Normalized

    )

    SELECT OneTableCTE.id,

    [1],

    [2],

    [3],

    [4]

    FROM (

    SELECT id,

    hobby,

    fixedorder

    FROM UnPivotCTE

    ) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

    AS PivotTable

    RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    PRINT REPLICATE('-', 75) + '

    Union All/Cross tab (dups, ordered by column)'

    SELECT 'Union All/Cross tab (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    WITH cte(id, hobby, hobby_num)

    AS (

    SELECT h.id,

    h.hobby1,

    1

    FROM dbo.hobby1 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby2,

    2

    FROM dbo.hobby1 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby1,

    3

    FROM dbo.hobby2 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby1

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby2,

    4

    FROM dbo.hobby2 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    ),

    cte2(id, hobby, row_num)

    AS (

    SELECT id,

    hobby,

    ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

    FROM cte

    WHERE hobby IS NOT NULL

    )

    SELECT id,

    MAX(CASE WHEN row_num = 1 THEN hobby

    END) AS hobby1,

    MAX(CASE WHEN row_num = 2 THEN hobby

    END) AS hobby2,

    MAX(CASE WHEN row_num = 3 THEN hobby

    END) AS hobby3,

    MAX(CASE WHEN row_num = 4 THEN hobby

    END) AS hobby4

    FROM cte2

    GROUP BY id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    PRINT REPLICATE('-', 75) + '

    Parsename (dups, ordered by column)'

    SELECT 'Parsename (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    SELECT h1.id,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

    SET STATISTICS TIME OFF ;

    GO

    And some typical results on my machine for 100K rows:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 719 ms, elapsed time = 1717 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1188 ms, elapsed time = 1601 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 546 ms, elapsed time = 1571 ms.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1172 ms, elapsed time = 1507 ms.

    And 200K:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1625 ms, elapsed time = 2917 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2189 ms, elapsed time = 3343 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 1141 ms, elapsed time = 2932 ms.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2109 ms, elapsed time = 2973 ms.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

    Do you want to show 4 nulls if there is a record but no hobbies?

    If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).

    One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.

  • Nevyn (7/8/2011)


    I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

    Do you want to show 4 nulls if there is a record but no hobbies?

    If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).

    One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.

    I did not see a measurable performance difference when changing the JOIN type in the "Unpivot/Pivot". You can toggle whether to return rows with all NULL hobbies in the "Union All/Crosstab" by commenting out "WHERE hobby IS NOT NULL" in cte2. I did not see any noticeable difference in performance when toggling that option either.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc, thanks for laying all of that out.

    Nevyn, yes, I'd like to see all NULLs if wherever there aren't hobbies (even if there's 0/4 hobbies populated I'd like to see 4 NULLs).

    Good catch on the other possibility -- however, there will never be a record in one table and not in the other. At the very least, there will be a record with 2 NULL fields in each table.

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • It strikes me that the elapsed time of the tests is limited by the time it takes to display the results - consider using SELECT INTO to write the results to a temporary table.

    For fun, I played with the string solution a bit to generate a parallel plan. On my machine, the following code runs in 700ms for 200,000 rows (further improvements may be possible, I didn't touch Tommy's core algorithm).

    SELECT

    t.id,

    oa.h1,

    oa.h2,

    oa.h3,

    oa.h4

    INTO #dummy

    FROM

    (

    SELECT

    h.id,

    h1h1 = h.hobby1,

    h1h2 = h.hobby2,

    h2h1 = h2.hobby1,

    h2h2 = h2.hobby2

    FROM dbo.hobby1 AS h

    JOIN dbo.hobby2 AS h2 ON

    h2.id = h.id

    ) AS x

    CROSS APPLY

    (

    SELECT

    x.id,

    s =

    ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +

    CASE

    WHEN x.h1h2 = x.h1h1 THEN SPACE(0)

    ELSE ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0))

    END +

    CASE

    WHEN x.h2h1 IN (x.h1h1, x.h1h2) THEN SPACE(0)

    ELSE ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0))

    END +

    CASE

    WHEN x.h2h2 IN (x.h1h1, x.h1h2, x.h2h1) THEN SPACE(0)

    ELSE ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))

    END

    ) AS t

    OUTER APPLY

    (

    SELECT

    h1 = RTRIM(SUBSTRING(t.s, 1, 36)),

    h2 = RTRIM(SUBSTRING(t.s, 37, 36)),

    h3 = RTRIM(SUBSTRING(t.s, 73, 36)),

    h4 = RTRIM(SUBSTRING(t.s, 109, 36))

    ) AS oa

    OPTION (HASH JOIN, LOOP JOIN);

  • I love this site 😀

    SQLkiwi (7/8/2011)


    It strikes me that the elapsed time of the tests is limited by the time it takes to display the results -

    That's why I typically ignore elapsed time. CPU clock ticks is what always see considered the key indicator to go by.

    consider using SELECT INTO to write the results to a temporary table.

    For fun, I played with the string solution a bit to generate a parallel plan. On my machine, the following code runs in 700ms for 200,000 rows (further improvements may be possible, I didn't touch Tommy's core algorithm).

    It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below 😉

    I also added the "INTO #dummy" to each solution and here are the results:

    SET STATISTICS TIME OFF ;

    GO

    SET NOCOUNT ON ;

    GO

    PRINT REPLICATE('-', 75) + '

    build temp tables'

    SELECT 'build temp tables'

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'hobby1')

    AND type IN ( N'U' ) )

    DROP TABLE hobby1 ;

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'hobby2')

    AND type IN ( N'U' ) )

    DROP TABLE hobby2 ;

    CREATE TABLE hobby1

    (

    id INT IDENTITY(1, 1) NOT NULL,

    hobby1 VARCHAR(36),

    hobby2 VARCHAR(36)

    ) ;

    CREATE TABLE hobby2

    (

    id INT IDENTITY(1, 1) NOT NULL,

    hobby1 VARCHAR(36),

    hobby2 VARCHAR(36)

    ) ;

    GO

    --===== Add PK's to both tables

    ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

    ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

    GO

    --=====================================================================================================================

    -- Create and populate the test tables. This is NOT a part of the solution.

    --=====================================================================================================================

    --===== Declare and preset some obviously named local variables

    DECLARE @RowsInFirstTable INT,

    @RowsInSecondTable INT ;

    SELECT @RowsInFirstTable = 200000,

    @RowsInSecondTable = 200000 ;

    IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL

    DROP TABLE #HobbyName;

    --===== Create a list of HobbyNames so we can randomly populate the test tables.

    -- The NULL's are simply part of the possible things that could be a hobby.

    SELECT HobbyNameID = IDENTITY( INT,1,1),

    HobbyName = CAST(d.HobbyName AS VARCHAR(15))

    INTO #HobbyName

    FROM (

    SELECT NULL

    UNION ALL

    SELECT NULL

    UNION ALL

    SELECT 'BaseBall'

    UNION ALL

    SELECT 'Basketball'

    UNION ALL

    SELECT 'Tennis'

    UNION ALL

    SELECT 'FootBall'

    UNION ALL

    SELECT 'Hockey'

    UNION ALL

    SELECT 'Rugby'

    UNION ALL

    SELECT 'Golf'

    UNION ALL

    SELECT 'Bowling'

    ) d ( HobbyName ) ;

    ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;

    --===== Create the "first" table and populate it on the fly

    INSERT INTO dbo.hobby1

    (

    hobby1,

    hobby2

    )

    SELECT TOP ( @RowsInFirstTable )

    Hobby1 = (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    ),

    Hobby2 = (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    )

    FROM sys.all_columns ac1,

    sys.all_columns ac2 ;

    --===== Create the "second" table and populate it on the fly

    INSERT INTO dbo.hobby2

    (

    hobby1,

    hobby2

    )

    SELECT TOP ( @RowsInSecondTable )

    (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    ),

    (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    )

    FROM sys.all_columns ac1,

    sys.all_columns ac2 ;

    GO

    ------------------------------------------------------------------------------------

    ------------------------------------------------------------------------------------

    IF OBJECT_ID(N'tempdb..#dummy') > 0

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    String manipulation (dups, ordered by column)'

    SELECT 'String manipulation (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    SELECT t.id,

    RTRIM(SUBSTRING(t.s, 1, 36)) AS h1,

    RTRIM(SUBSTRING(t.s, 37, 36)) AS h2,

    RTRIM(SUBSTRING(t.s, 73, 36)) AS h3,

    RTRIM(SUBSTRING(t.s, 109, 36)) AS h4

    INTO #dummy

    FROM (

    SELECT h1.id,

    ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

    ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36

    - LEN(h2.hobby1)),'') + ISNULL(h2.hobby2

    + SPACE(36- LEN(h2.hobby2)),'') s

    FROM dbo.hobby1 h1

    LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id

    ) t ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    IF OBJECT_ID(N'tempdb..#dummy') > 0

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    Unpivot/Pivot (dups, ordered by column)'

    SELECT 'Unpivot/Pivot (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    WITH OneTableCTE

    AS (

    SELECT a.id,

    a.hobby1 AS 'h1',

    a.hobby2 AS 'h2',

    b.hobby1 AS 'h3',

    b.hobby2 AS 'h4'

    FROM hobby1 a

    LEFT JOIN hobby2 b ON a.id = b.id

    ),

    UnPivotCTE

    AS (

    SELECT id,

    hobby,

    ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

    AS fixedorder

    FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

    AS Normalized

    )

    SELECT OneTableCTE.id,

    [1],

    [2],

    [3],

    [4]

    INTO #dummy

    FROM (

    SELECT id,

    hobby,

    fixedorder

    FROM UnPivotCTE

    ) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

    AS PivotTable

    RIGHT JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    IF OBJECT_ID(N'tempdb..#dummy') > 0

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    Union All/Cross tab (dups, ordered by column)'

    SELECT 'Union All/Cross tab (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    WITH cte(id, hobby, hobby_num)

    AS (

    SELECT h.id,

    h.hobby1,

    1

    FROM dbo.hobby1 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby2,

    2

    FROM dbo.hobby1 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby1,

    3

    FROM dbo.hobby2 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby1

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby2,

    4

    FROM dbo.hobby2 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    ),

    cte2(id, hobby, row_num)

    AS (

    SELECT id,

    hobby,

    ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

    FROM cte

    --WHERE hobby IS NOT NULL

    )

    SELECT id,

    MAX(CASE WHEN row_num = 1 THEN hobby

    END) AS hobby1,

    MAX(CASE WHEN row_num = 2 THEN hobby

    END) AS hobby2,

    MAX(CASE WHEN row_num = 3 THEN hobby

    END) AS hobby3,

    MAX(CASE WHEN row_num = 4 THEN hobby

    END) AS hobby4

    INTO #dummy

    FROM cte2

    GROUP BY id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    IF OBJECT_ID(N'tempdb..#dummy') > 0

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    Parsename (dups, ordered by column)'

    SELECT 'Parsename (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    SELECT h1.id,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

    INTO #dummy

    FROM dbo.hobby1 h1

    LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    IF OBJECT_ID(N'tempdb..#dummy') > 0

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'

    SELECT 'String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'

    SET STATISTICS TIME ON ;

    SELECT

    t.id,

    oa.h1,

    oa.h2,

    oa.h3,

    oa.h4

    INTO #dummy

    FROM

    (

    SELECT

    h.id,

    h1h1 = h.hobby1,

    h1h2 = h.hobby2,

    h2h1 = h2.hobby1,

    h2h2 = h2.hobby2

    FROM dbo.hobby1 AS h

    JOIN dbo.hobby2 AS h2 ON

    h2.id = h.id

    ) AS x

    CROSS APPLY

    (

    SELECT

    x.id,

    s =

    ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +

    ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0)) +

    ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0)) +

    ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))

    ) AS t

    OUTER APPLY

    (

    SELECT

    h1 = RTRIM(SUBSTRING(t.s, 1, 36)),

    h2 = RTRIM(SUBSTRING(t.s, 37, 36)),

    h3 = RTRIM(SUBSTRING(t.s, 73, 36)),

    h4 = RTRIM(SUBSTRING(t.s, 109, 36))

    ) AS oa

    OPTION (HASH JOIN, LOOP JOIN);

    SET STATISTICS TIME OFF ;

    GO

    Results for 200K on my machine:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1359 ms, elapsed time = 1435 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2095 ms, elapsed time = 1642 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1344 ms, elapsed time = 1419 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2000 ms, elapsed time = 2174 ms.

    ---------------------------------------------------------------------------

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

    SQL Server Execution Times:

    CPU time = 2031 ms, elapsed time = 1132 ms.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (7/8/2011)


    It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below

    Yeah no worries - I just picked one of the faster routines from earlier, I haven't been following the discussion closely enough to know what the current requirement is - so thanks for the modification. Not sure I deserve much credit on the code (it's Tommy's) I just added parallelism.

    Results for 200K on my machine

    Nice. You need more cores though :laugh:

  • opc.three (7/8/2011)


    Nevyn (7/8/2011)


    I did misunderstand. I thought you were talking about duplicates in terms of records, not hobbies. Duplicate hobbies are allowed.

    Do you want to show 4 nulls if there is a record but no hobbies?

    If not, the outer join can be removed from the PIVOT/UNPIVOT (I'm guessing it wont make a huge speed difference though).

    One other case we haven't discussed, is it possible to have a record in the first table and no record in the second (as opposed to a record with null hobbies)? That requirement would break some of these solutions as they are.

    I did not see a measurable performance difference when changing the JOIN type in the "Unpivot/Pivot". You can toggle whether to return rows with all NULL hobbies in the "Union All/Crosstab" by commenting out "WHERE hobby IS NOT NULL" in cte2. I did not see any noticeable difference in performance when toggling that option either.

    To be clear, its not changing join type, you can take the join right out (it joined back to the first CTE only to catch the all null rows.

    But even then, my guess is no performance boost.

  • SQLkiwi (7/8/2011)


    opc.three (7/8/2011)


    It looks like your solution is removing dups. I modified it to leave them in which should make it a bit more efficient..hopefully I did not mess it up. I gave you full credit for the original code in the comments below

    Yeah no worries - I just picked one of the faster routines from earlier, I haven't been following the discussion closely enough to know what the current requirement is - so thanks for the modification. Not sure I deserve much credit on the code (it's Tommy's) I just added parallelism.

    Results for 200K on my machine

    Nice. You need more cores though :laugh:

    No argument there 😛

    Re: the machine I'm running on...it depends on the day but ATM I'm running everything on a single proc XP machine with HT :sick:

    Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?

    I just ran it on some multi-core server HW and got this for 200K rows:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2110 ms, elapsed time = 2116 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 3980 ms, elapsed time = 1306 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1844 ms, elapsed time = 1840 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 3016 ms, elapsed time = 3012 ms.

    ---------------------------------------------------------------------------

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

    SQL Server Execution Times:

    CPU time = 2859 ms, elapsed time = 453 ms.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (7/8/2011)


    Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?

    16-core test machine:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1046 ms, elapsed time = 1138 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2994 ms, elapsed time = 849 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1294 ms, elapsed time = 1329 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1716 ms, elapsed time = 1792 ms.

    ---------------------------------------------------------------------------

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

    SQL Server Execution Times:

    CPU time = 2795 ms, elapsed time = 269 ms.

  • Nevyn (7/8/2011)


    Cool stuff, Jeff!

    So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?

    I don't use PIVOT because Cross-Tabs are (usually) both faster and easier to read especially if multiple values need to be aggregated and then pivoted. Since Paul taught me the neat trick with Cross-Apply, I feel the same way about UNPIVOT even though the differences in performance are less pronounced. Heh... I also try to avoid words like "always" and "never". 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mikes84 (7/8/2011)


    Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?

    I suppose but let me ask... what do you want done with the following data since you also don't want dupes?

    Table1

    1,'BasketBall','Tennis'

    Table2

    1,'Tennis','BasketBall'

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 46 through 60 (of 92 total)

You must be logged in to reply to this topic. Login to reply