Handle NULLs

  • Paul White

    SSC Guru

    Points: 150442

    mister.magoo (7/8/2011)


    My latest submission is based on the string manipulation, but simplified... on my QUAD core PC I like the results, but let's see how it does on yours.

    Very creative. Shame about this: SET CONCAT_NULL_YIELDS_NULL OFF;

    In a future version of SQL Server CONCAT_NULL_YIELDS_NULL will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
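For anyone who hasn't hit this setting before, a quick sketch of what the deprecation means in practice (simple literals used for illustration): with CONCAT_NULL_YIELDS_NULL ON, which is the default and will be the only behavior, NULL propagates through the + operator, so concatenations need ISNULL wrappers or the CONCAT function (available from SQL Server 2012), which treats NULL arguments as empty strings.

```sql
-- With CONCAT_NULL_YIELDS_NULL ON (the default, and the only behavior
-- going forward), NULL propagates through the + operator:
SELECT 'a' + CAST(NULL AS VARCHAR(10));                          -- yields NULL

-- Forward-compatible alternatives that do not rely on the deprecated setting:
SELECT ISNULL('a', '') + ISNULL(CAST(NULL AS VARCHAR(10)), '');  -- yields 'a'
SELECT CONCAT('a', NULL);                                        -- yields 'a'
```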

  • The Dixie Flatline

    SSC Guru

    Points: 53253

How should I format my CASE solution to get in on the testing?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    The Dixie Flatline (7/8/2011)


    Fun problem. 😀

This solution might be a little less CPU intensive because it does no string manipulation at all; the columns are shuffled with nothing but ISNULL and CASE tests.

    The trick is to squeeze out the nulls from right-to-left, instead of left-to-right.

;WITH cte1 AS
 (SELECT C1,
         C2,
         ISNULL(C3, C4) AS C3,
         CASE WHEN C3 IS NULL THEN NULL ELSE C4 END AS C4
  FROM #temp),
cte2 AS
 (SELECT C1,
         ISNULL(C2, C3) AS C2,
         CASE WHEN C2 IS NULL THEN C4 ELSE C3 END AS C3,
         CASE WHEN C2 IS NULL THEN NULL ELSE C4 END AS C4
  FROM cte1),
cte3 AS
 (SELECT ISNULL(C1, C2) AS C1,
         CASE WHEN C1 IS NULL THEN C3 ELSE C2 END AS C2,
         CASE WHEN C1 IS NULL THEN C4 ELSE C3 END AS C3,
         CASE WHEN C1 IS NULL THEN NULL ELSE C4 END AS C4
  FROM cte2)
SELECT * FROM cte3;

I only gave the results a once-over, but the result set looks correct. Blazing fast!

    The entire set of code with The Dixie Flatline solution modified to work with the concrete test tables:

    SET STATISTICS TIME OFF ;

    GO

    SET NOCOUNT ON ;

    GO

    PRINT REPLICATE('-', 75) + '

    build temp tables'

    SELECT 'build temp tables'

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.hobby1')

    AND type IN ( N'U' ) )

    DROP TABLE dbo.hobby1 ;

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.hobby2')

    AND type IN ( N'U' ) )

    DROP TABLE dbo.hobby2 ;

    CREATE TABLE dbo.hobby1

    (

    id INT IDENTITY(1, 1) NOT NULL,

    hobby1 VARCHAR(36),

    hobby2 VARCHAR(36)

    ) ;

    CREATE TABLE dbo.hobby2

    (

    id INT IDENTITY(1, 1) NOT NULL,

    hobby1 VARCHAR(36),

    hobby2 VARCHAR(36)

    ) ;

    GO

    --===== Add PK's to both tables

    ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

    ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

    GO

    --=====================================================================================================================

    -- Create and populate the test tables. This is NOT a part of the solution.

    --=====================================================================================================================

    --===== Declare and preset some obviously named local variables

    DECLARE @RowsInFirstTable INT,

    @RowsInSecondTable INT ;

    SELECT @RowsInFirstTable = 200000,

    @RowsInSecondTable = 200000 ;

    IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL

    DROP TABLE #HobbyName;

    --===== Create a list of HobbyNames so we can randomly populate the test tables.

    -- The NULL's are simply part of the possible things that could be a hobby.

    SELECT HobbyNameID = IDENTITY( INT,1,1),

    HobbyName = CAST(d.HobbyName AS VARCHAR(15))

    INTO #HobbyName

    FROM (

    SELECT NULL

    UNION ALL

    SELECT NULL

    UNION ALL

    SELECT 'BaseBall'

    UNION ALL

    SELECT 'Basketball'

    UNION ALL

    SELECT 'Tennis'

    UNION ALL

    SELECT 'FootBall'

    UNION ALL

    SELECT 'Hockey'

    UNION ALL

    SELECT 'Rugby'

    UNION ALL

    SELECT 'Golf'

    UNION ALL

    SELECT 'Bowling'

    ) d ( HobbyName ) ;

    ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;

    --===== Create the "first" table and populate it on the fly

    INSERT INTO dbo.hobby1

    (

    hobby1,

    hobby2

    )

    SELECT TOP ( @RowsInFirstTable )

    Hobby1 = (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    ),

    Hobby2 = (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    )

    FROM sys.all_columns ac1,

    sys.all_columns ac2 ;

    --===== Create the "second" table and populate it on the fly

    INSERT INTO dbo.hobby2

    (

    hobby1,

    hobby2

    )

    SELECT TOP ( @RowsInSecondTable )

    (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    ),

    (

    SELECT TOP 1

    HobbyName

    FROM #HobbyName

    WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

    )

    FROM sys.all_columns ac1,

    sys.all_columns ac2 ;

    GO

    ------------------------------------------------------------------------------------

    ------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    String manipulation (dups, ordered by column)'

    SELECT 'String manipulation (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    SELECT t.id,

    RTRIM(SUBSTRING(t.s, 1, 36)) AS h1,

    RTRIM(SUBSTRING(t.s, 37, 36)) AS h2,

    RTRIM(SUBSTRING(t.s, 73, 36)) AS h3,

    RTRIM(SUBSTRING(t.s, 109, 36)) AS h4

    --INTO #dummy

    FROM (

    SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') +

ISNULL(h2.hobby1 + SPACE(36 - LEN(h2.hobby1)), '') +

ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)), '') AS s

    FROM dbo.hobby1 h1

    LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id

    ) t ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    Unpivot/Pivot (dups, ordered by column)'

    SELECT 'Unpivot/Pivot (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    WITH OneTableCTE

    AS (

    SELECT a.id,

    a.hobby1 AS 'h1',

    a.hobby2 AS 'h2',

    b.hobby1 AS 'h3',

    b.hobby2 AS 'h4'

    FROM hobby1 a

    LEFT JOIN hobby2 b ON a.id = b.id

    ),

    UnPivotCTE

    AS (

    SELECT id,

    hobby,

    ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

    AS fixedorder

    FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

    AS Normalized

    )

    SELECT OneTableCTE.id,

    [1],

    [2],

    [3],

    [4]

    --INTO #dummy

    FROM (

    SELECT id,

    hobby,

    fixedorder

    FROM UnPivotCTE

    ) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

    AS PivotTable

    RIGHT JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    Union All/Cross tab (dups, ordered by column)'

    SELECT 'Union All/Cross tab (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    WITH cte(id, hobby, hobby_num)

    AS (

    SELECT h.id,

    h.hobby1,

    1

    FROM dbo.hobby1 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby2,

    2

    FROM dbo.hobby1 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby2

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby1,

    3

    FROM dbo.hobby2 h

    WHERE EXISTS ( SELECT *

    FROM dbo.hobby1

    WHERE id = h.id )

    UNION ALL

    SELECT h.id,

    h.hobby2,

    4

    FROM dbo.hobby2 h

    WHERE EXISTS ( SELECT *

FROM dbo.hobby1

    WHERE id = h.id )

    ),

    cte2(id, hobby, row_num)

    AS (

    SELECT id,

    hobby,

    ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

    FROM cte

    WHERE hobby IS NOT NULL

    )

    SELECT id,

    MAX(CASE WHEN row_num = 1 THEN hobby

    END) AS hobby1,

    MAX(CASE WHEN row_num = 2 THEN hobby

    END) AS hobby2,

    MAX(CASE WHEN row_num = 3 THEN hobby

    END) AS hobby3,

    MAX(CASE WHEN row_num = 4 THEN hobby

    END) AS hobby4

    --INTO #dummy

    FROM cte2

    GROUP BY id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    Parsename (dups, ordered by column)'

    SELECT 'Parsename (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    SELECT h1.id,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

    REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

    CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

    CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

    CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

    CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

    --INTO #dummy

    FROM dbo.hobby1 h1

    LEFT JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy;

    GO

    PRINT REPLICATE('-', 75) + '

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'

    SELECT 'String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups'

    SET STATISTICS TIME ON ;

    SELECT

    t.id,

    oa.h1,

    oa.h2,

    oa.h3,

    oa.h4

    --INTO #dummy

    FROM

    (

    SELECT

    h.id,

    h1h1 = h.hobby1,

    h1h2 = h.hobby2,

    h2h1 = h2.hobby1,

    h2h2 = h2.hobby2

    FROM dbo.hobby1 AS h

    JOIN dbo.hobby2 AS h2 ON

    h2.id = h.id

    ) AS x

    CROSS APPLY

    (

    SELECT

    x.id,

    s =

    ISNULL(x.h1h1 + SPACE(36 - LEN(x.h1h1)), SPACE(0)) +

    ISNULL(x.h1h2 + SPACE(36 - LEN(x.h1h2)), SPACE(0)) +

    ISNULL(x.h2h1 + SPACE(36 - LEN(x.h2h1)), SPACE(0)) +

    ISNULL(x.h2h2 + SPACE(36 - LEN(x.h2h2)), SPACE(0))

    ) AS t

    OUTER APPLY

    (

    SELECT

    h1 = RTRIM(SUBSTRING(t.s, 1, 36)),

    h2 = RTRIM(SUBSTRING(t.s, 37, 36)),

    h3 = RTRIM(SUBSTRING(t.s, 73, 36)),

    h4 = RTRIM(SUBSTRING(t.s, 109, 36))

    ) AS oa

    OPTION (HASH JOIN, LOOP JOIN);

    GO

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

IF OBJECT_ID(N'tempdb..#dummy') IS NOT NULL

    DROP TABLE #dummy ;

    GO

    PRINT REPLICATE('-', 75) + '

    CTEs - The Dixie Flatline (dups, ordered by column)'

    SELECT 'CTEs - The Dixie Flatline (dups, ordered by column)'

    SET STATISTICS TIME ON ;

    WITH cte1

    AS (

    SELECT h1.id,

    h1.hobby1 AS C1,

    h1.hobby2 AS C2,

    ISNULL(h2.hobby1, h2.hobby2) AS C3,

    CASE WHEN h2.hobby1 IS NULL THEN NULL

    ELSE h2.hobby2

    END AS C4

    FROM dbo.hobby1 h1

    JOIN dbo.hobby2 h2 ON h1.id = h2.id

    ),

    cte2

    AS (

    SELECT id,

    C1,

    ISNULL(C2, C3) AS C2,

    CASE WHEN C2 IS NULL THEN C4

    ELSE C3

    END AS C3,

    CASE WHEN C2 IS NULL THEN NULL

    ELSE C4

    END AS C4

    FROM cte1

    ),

    cte3

    AS (

    SELECT id,

    ISNULL(C1, C2) AS C1,

    CASE WHEN C1 IS NULL THEN C3

    ELSE C2

    END AS C2,

    CASE WHEN C1 IS NULL THEN C4

    ELSE C3

    END AS C3,

    CASE WHEN C1 IS NULL THEN NULL

    ELSE C4

    END AS C4

    FROM cte2

    )

    SELECT *

    --INTO #dummy

    FROM cte3 ;

    SET STATISTICS TIME OFF ;

    GO

    ------------------------------------------------------------------------------------

    Results on a dual-core machine:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1716 ms, elapsed time = 4077 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2916 ms, elapsed time = 4727 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 1700 ms, elapsed time = 3982 ms.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2683 ms, elapsed time = 4183 ms.

    ---------------------------------------------------------------------------

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

    SQL Server Execution Times:

    CPU time = 2449 ms, elapsed time = 4031 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    ---------------------------------------------------------------------------

    CTEs - The Dixie Flatline (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 561 ms, elapsed time = 3903 ms.

    __________________________________________________________________________________________________
    There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

  • Jeff Moden

    SSC Guru

    Points: 996452

    opc.three (7/9/2011)


    The entire set of code with The Dixie Flatline solution modified to work with the concrete test tables:

    I have to ask... is there a reason why you avoid the use of SELECT/INTO?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Forum FAQ

  • Jeff Moden

    SSC Guru

    Points: 996452

    tfifield (7/8/2011)


    Jeff Moden (7/8/2011)


    Nevyn (7/8/2011)


    Cool stuff, Jeff!

    So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?

I don't use PIVOT because Cross-Tabs are (usually) both faster and easier to read, especially if multiple values need to be aggregated and then pivoted. Since Paul taught me the neat trick with CROSS APPLY, I feel the same way about UNPIVOT, even though the differences in performance are less pronounced. Heh... I also try to avoid words like "always" and "never". 😀

    Jeff,

    Is there a link to Paul's CROSS APPLY method for UNPIVOT? I'd love to see it.

    Todd Fifield

    I don't have one for UNPIVOT but here's one I have on PIVOT...

    http://www.sqlservercentral.com/articles/T-SQL/63681/

    "Pre-aggregated Cross Tabs" pretty much blows the doors off of PIVOT. Regular "Cross Tabs" don't do so bad, either. Paul's explanation tells why.
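For anyone unfamiliar with the pattern Jeff describes, here is a minimal cross-tab sketch (the #sales table and its columns are hypothetical illustration, not from this thread): a cross tab is just a GROUP BY with CASE expressions inside the aggregates, which produces the same rotated result as PIVOT while staying flexible about multiple aggregates per row.

```sql
-- Cross tab: rotate Amount by Qtr using plain GROUP BY + CASE
-- (#sales with columns Yr, Qtr, Amount is hypothetical test data)
SELECT Yr,
       SUM(CASE WHEN Qtr = 1 THEN Amount ELSE 0 END) AS Q1,
       SUM(CASE WHEN Qtr = 2 THEN Amount ELSE 0 END) AS Q2,
       SUM(CASE WHEN Qtr = 3 THEN Amount ELSE 0 END) AS Q3,
       SUM(CASE WHEN Qtr = 4 THEN Amount ELSE 0 END) AS Q4
FROM #sales
GROUP BY Yr;
```

Unlike PIVOT, nothing stops you from adding a second aggregate (say, COUNT alongside SUM) to the same pass over the data.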

    --Jeff Moden



  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    Jeff Moden (7/9/2011)


    opc.three (7/9/2011)


    The entire set of code with The Dixie Flatline solution modified to work with the concrete test tables:

    I have to ask... is there a reason why you avoid the use of SELECT/INTO?

    Old habits die hard I guess 😀

I was on a thread recently where Craig showed that SELECT...INTO was faster for creating heaps in tempdb, but that creating a table with a clustered index in place and then doing the insert was faster for creating clustered tables. The tests were not exhaustive, but he concluded he would start creating his temp tables ahead of time if a clustered index was needed.

In my earlier posts on this thread I was not using a clustered index on the hobby1 and hobby2 tables, so I would have been better served using SELECT...INTO to create them. However, when you introduced your test data routine and included a clustered index on each table, creating the tables ahead of time and using INSERT...INTO became the more appropriate practice.
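The two patterns under discussion, sketched against the thread's dbo.hobby1 table (the #work table names are just for illustration):

```sql
-- Pattern 1: SELECT...INTO builds a heap on the fly (and can be minimally
-- logged, depending on the recovery model); cluster it afterwards if needed.
SELECT id, hobby1, hobby2
INTO   #work
FROM   dbo.hobby1;
-- ALTER TABLE #work ADD PRIMARY KEY CLUSTERED (id);  -- separate index build

-- Pattern 2: pre-create the table with its clustered index, then insert,
-- so there is no second pass to build the index after loading.
CREATE TABLE #work2
(
    id     INT NOT NULL PRIMARY KEY CLUSTERED,
    hobby1 VARCHAR(36),
    hobby2 VARCHAR(36)
);
INSERT INTO #work2 (id, hobby1, hobby2)
SELECT id, hobby1, hobby2
FROM   dbo.hobby1;
```

Craig's numbers below suggest Pattern 1 for plain heaps and Pattern 2 when a clustered index is needed.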


  • Jeff Moden

    SSC Guru

    Points: 996452

    opc.three (7/10/2011)


    However when you introduced your test data routine and included a clustered index on each table that's when the practice of creating the tables ahead of time and using INSERT...INTO became more appropriate.

    Heh.. I won't give that an "It Depends". I guess I'll give that a "Whatever works" at this point but only because it's test data and not production data. It's more important that someone is actually using some representative test data. I'll even (almost) tolerate someone making a loop to build the data so long as they're making the effort to build more than the usual small handful of rows. 🙂

    I use the SELECT/INTO on test data only because if a data requirement changes as I'm writing code, I don't have to change both the data and the test table... the SELECT/INTO changes it for me. Other than that, "It Depends" comes into play for production building of large tables and it's sometimes very well worth testing both as Craig did.

    As a side bar, when I get a little time, I'll have to try some general testing like Craig did (I remember the post but not sure I can find it easily). I always like to double check.

    --Jeff Moden



  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns etc. so I go with Craig's conclusion 😀

    From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx:

    -- Skipping the gory details, the fully pre-built STILL wins.

    -- Even though it takes on average about 40ms longer to insert all the data,

    -- the subsequent clustered index build takes about 180ms.

    -- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.


  • Revenant

    SSC-Forever

    Points: 42467

    opc.three (7/11/2011)


    Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns etc. so I go with Craig's conclusion 😀

    From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx:

    -- Skipping the gory details, the fully pre-built STILL wins.

    -- Even though it takes on average about 40ms longer to insert all the data,

    -- the subsequent clustered index build takes about 180ms.

    -- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.

These results will vary with the version of SQL Server and with the family of processors it runs on (Sandy Bridge vs. Nehalem, typically).

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    Revenant (7/11/2011)


    opc.three (7/11/2011)


    Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns etc. so I go with Craig's conclusion 😀

    From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx:

    -- Skipping the gory details, the fully pre-built STILL wins.

    -- Even though it takes on average about 40ms longer to insert all the data,

    -- the subsequent clustered index build takes about 180ms.

    -- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.

These results will vary with the version of SQL Server and with the family of processors it runs on (Sandy Bridge vs. Nehalem, typically).

    :unsure: How would CPU architecture figure in? Got stats?


  • Revenant

    SSC-Forever

    Points: 42467

    opc.three (7/11/2011)


    . . . :unsure: How would CPU architecture figure in? Got stats?

    Yes, I do have hard measurements. I did comparisons between Nehalem and Sandy Bridge, and at a nominally same clock speed Sandy Bridge is 10 or more percent faster.

    As the study was done on client's penny, I have to ask my boss for (formal) permission to release the original data. I will ask him on Wednesday during our weekly one-on-one and mail it to your private address.

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    Revenant (7/11/2011)


    opc.three (7/11/2011)


    . . . :unsure: How would CPU architecture figure in? Got stats?

    Yes, I do have hard measurements. I did comparisons between Nehalem and Sandy Bridge, and at a nominally same clock speed Sandy Bridge is 10 or more percent faster.

Thanks...but 10% better at what? I assume you mean SELECT...INTO, so as to contradict Craig's and my conclusions?

    So does this mean we're back to "It Depends" 😛

    As the study was done on client's penny, I have to ask my boss for (formal) permission to release the original data. I will ask him on Wednesday during our weekly one-on-one and mail it to your private address.

    Any insight into "How would CPU architecture figure in?"? Cache possibly?


  • Revenant

    SSC-Forever

    Points: 42467

    opc.three (7/11/2011)


    Any insight into "How would CPU architecture figure in?"? Cache possibly?

Sandy Bridge has on-chip support for parallelization and is able to share load more efficiently, without being told how to go about it.

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    Thanks a lot Jeff...another thread completely derailed!

    Anyone have thoughts on The Dixie Flatline's fine solution? Hopefully conclusions won't depend on a video GPU or sound card or anything like that :hehe:

    Revenant (7/11/2011)


    opc.three (7/11/2011)


    Any insight into "How would CPU architecture figure in?"? Cache possibly?

Sandy Bridge has on-chip support for parallelization and is able to share load more efficiently, without being told how to go about it.

    I am being dense again Revenant...sorry. I would say it's because it's Monday morning but that would be wishful thinking on my part in trying to isolate it to one morning of one day of the week.

    Can you please clarify your comparisons?

    You said:

    I did comparisons between Nehalem and Sandy Bridge, and at a nominally same clock speed Sandy Bridge is 10 or more percent faster.

    Are you saying that SELECT...INTO is 10% faster on Sandy Bridge than CREATE TABLE + INSERT INTO on Nehalem?

Or is it that SELECT...INTO on Sandy Bridge is 10% faster than SELECT...INTO on Nehalem?

    Given that you opened the discussion, hopefully clarifying your conclusions will not violate anything regarding your client's penny.


  • Paul White

    SSC Guru

    Points: 150442

    opc.three (7/11/2011)


    Craig's testing showed it for a single-column narrow PK (INT). I wondered whether wider clustering keys would have SELECT...INTO start outperforming the CREATE/INSERT method, and while I did some basic tests using 3-column INT keys showing CREATE/INSERT was still faster, I did not wonder enough to do any exhaustive testing including character-based columns etc. so I go with Craig's conclusion 😀

    From the post http://www.sqlservercentral.com/Forums/FindPost1130873.aspx

    I added my opinion to that thread.

Viewing 15 posts - 76 through 90 (of 93 total)
