Hey Arjun,
There are a few important things here that determine the fastest method:
First, you must avoid an explicit sort operation if possible. In your first script (where the clustered index exists before the load) you are loading the result of a ROW_NUMBER ranking function into an INTEGER column. ROW_NUMBER returns BIGINT; the type mismatch is enough to confuse the optimiser in thinking it needs to sort the results before inserting into the table.
Second, try to achieve a minimally-logged load. In SQL Server 2005, an INSERT...SELECT statement is always fully logged. The minimally-logged alternative is SELECT...INTO. This does mean creating the clustered index as a second operation, but the benefit of minimal logging more than makes up for this overhead. Just to confuse matters, SQL Server 2008 can do a minimally-logged INSERT...SELECT in many circumstances.
Finally, try not to use master.dbo.spt_values - use a documented system table instead. The reason is that the internals of spt_values changed significantly in SQL Server 2008, making it a very slow way to grab rows.
The fastest method (for me) depends on which version of SQL Server I use. The lack of a minimally-logged INSERT in SQL Server 2005 means that SELECT...INTO followed by a clustered index build is faster. The improvements in SQL Server 2008 make INSERT...SELECT (with an existing clustered index) faster.
So, my 'improved' version of the first query (pre-existing index) is:
--
-- Load into an existing index
-- (fastest on 2008)
--
IF OBJECT_ID(N'tempdb..#Tally', N'U')
IS NOT NULL
BEGIN
DROP TABLE #Tally;
END;
GO
DECLARE @StartTime DATETIME;
SET @StartTime = GETDATE();
CREATE TABLE #Tally
(
N BIGINT NOT NULL PRIMARY KEY CLUSTERED
);
INSERT #Tally
(N)
SELECT TOP (10000)
ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM master.sys.allocation_units A,
master.sys.allocation_units B;
SELECT [INSERT...SELECT] = DATEDIFF(MILLISECOND, @StartTime, GETDATE());
GO
DROP TABLE #Tally;
'Improved' version of the second query (load then index) is:
--
-- Load then create index
-- (fastest on 2005)
--
IF OBJECT_ID(N'tempdb..#Tally', N'U')
IS NOT NULL
BEGIN
DROP TABLE #Tally;
END;
GO
DECLARE @StartTime DATETIME;
SET @StartTime = GETDATE();
SELECT TOP (10000)
ISNULL(ROW_NUMBER() OVER(ORDER BY (SELECT 1)), -1) AS N
INTO #Tally
FROM master.sys.allocation_units A,
master.sys.allocation_units B;
ALTER TABLE #Tally
ADD PRIMARY KEY CLUSTERED (N)
WITH
FILLFACTOR = 100;
SELECT [SELECT...INTO] = DATEDIFF(MILLISECOND, @StartTime, GETDATE());
GO
DROP TABLE #Tally;
Typical run times on 2008:
Query 1: 13ms
Query 2: 23ms
Typical run times on 2005:
Query 1: 96ms
Query 2: 20ms
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi