Home Forums SQL Server 2005 T-SQL (SS2K5) Primary Key creation - before and after bulk insert RE: Primary Key creation - before and after bulk insert

  • Hi Paul, it's good to hear from you. 🙂 And a great answer, thanks a lot.

    Paul White NZ (8/13/2010)


    First, you must avoid an explicit sort operation if possible. In your first script (where the clustered index exists before the load) you are loading the result of a ROW_NUMBER ranking function into an INTEGER column. ROW_NUMBER returns BIGINT; the type mismatch is enough to confuse the optimiser in thinking it needs to sort the results before inserting into the table.

    Finally, try not to use master.dbo.spt_values - use a documented system table instead. The reason is that the internals of spt_values changed significantly in SQL Server 2008, making it a very slow way to grab rows.

    I don't usually use undocumented stuff as they are bound to change or get dropped. I picked that spt_values part from another post. I was not aware that row_number returns bigint. It is really good to know that as I tend to use row_number function a lot.

    Second, try to achieve a minimally-logged load. In SQL Server 2005, an INSERT...SELECT statement is always fully logged. The minimally-logged alternative is SELECT...INTO. This does mean creating the clustered index as a second operation, but the benefit of minimal logging more than makes up for this overhead. Just to confuse matters, SQL Server 2008 can do a minimally-logged INSERT...SELECT in many circumstances.

    The fastest method (for me) depends on which version of SQL Server I use. The lack of a minimally-logged INSERT in SQL Server 2005 means that SELECT...INTO followed by a clustered index build is faster. The improvements in SQL Server 2008 make INSERT...SELECT (with an existing clustered index) faster.

    I was not aware of these benefits of SELECT.. INTO. It's really good to know this.

    I have another question for you Paul. Say, I have an SP which uses a temporary table. The number of rows in the temporary table depends on the SP parameters. So, for one set of parameters, the table has 100 rows and for another set, the table has 100, 000 rows. Will the execution plan be changed once the 100,000 rows are inserted? I've always assumed it to. Ref: http://support.microsoft.com/kb/243586

    - arjun

    https://sqlroadie.com/