Home Forums SQL Server 2005 T-SQL (SS2K5) Primary Key creation - before and after bulk insert RE: Primary Key creation - before and after bulk insert

  • Hey Arjun,

    There are a few important things here that determine the fastest method:

    First, you must avoid an explicit sort operation if possible. In your first script (where the clustered index exists before the load) you are loading the result of a ROW_NUMBER ranking function into an INTEGER column. ROW_NUMBER returns BIGINT; the type mismatch is enough to confuse the optimiser in thinking it needs to sort the results before inserting into the table.

    Second, try to achieve a minimally-logged load. In SQL Server 2005, an INSERT...SELECT statement is always fully logged. The minimally-logged alternative is SELECT...INTO. This does mean creating the clustered index as a second operation, but the benefit of minimal logging more than makes up for this overhead. Just to confuse matters, SQL Server 2008 can do a minimally-logged INSERT...SELECT in many circumstances.

    Finally, try not to use master.dbo.spt_values - use a documented system table instead. The reason is that the internals of spt_values changed significantly in SQL Server 2008, making it a very slow way to grab rows.

    The fastest method (for me) depends on which version of SQL Server I use. The lack of a minimally-logged INSERT in SQL Server 2005 means that SELECT...INTO followed by a clustered index build is faster. The improvements in SQL Server 2008 make INSERT...SELECT (with an existing clustered index) faster.

    So, my 'improved' version of the first query (pre-existing index) is:

    --

    -- Load into an existing index

    -- (fastest on 2008)

    --

    IF OBJECT_ID(N'tempdb..#Tally', N'U')

    IS NOT NULL

    BEGIN

    DROP TABLE #Tally;

    END;

    GO

    DECLARE @StartTime DATETIME;

    SET @StartTime = GETDATE();

    CREATE TABLE #Tally

    (

    N BIGINT NOT NULL PRIMARY KEY CLUSTERED

    );

    INSERT #Tally

    (N)

    SELECT TOP (10000)

    ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM master.sys.allocation_units A,

    master.sys.allocation_units B;

    SELECT [INSERT...SELECT] = DATEDIFF(MILLISECOND, @StartTime, GETDATE());

    GO

    DROP TABLE #Tally;

    'Improved' version of the second query (load then index) is:

    --

    -- Load then create index

    -- (fastest on 2005)

    --

    IF OBJECT_ID(N'tempdb..#Tally', N'U')

    IS NOT NULL

    BEGIN

    DROP TABLE #Tally;

    END;

    GO

    DECLARE @StartTime DATETIME;

    SET @StartTime = GETDATE();

    SELECT TOP (10000)

    ISNULL(ROW_NUMBER() OVER(ORDER BY (SELECT 1)), -1) AS N

    INTO #Tally

    FROM master.sys.allocation_units A,

    master.sys.allocation_units B;

    ALTER TABLE #Tally

    ADD PRIMARY KEY CLUSTERED (N)

    WITH

    FILLFACTOR = 100;

    SELECT [SELECT...INTO] = DATEDIFF(MILLISECOND, @StartTime, GETDATE());

    GO

    DROP TABLE #Tally;

    Typical run times on 2008:

    Query 1: 13ms

    Query 2: 23ms

    Typical run times on 2005:

    Query 1: 96ms

    Query 2: 20ms

    Paul