Using Select into #Temp vs. Create #Temp then Insert into #Temp

  • We write a lot of stored procedures for reporting purposes and in those stored procedures we create a lot of temp tables. Our DBA insists that we create the temp table first and then do an insert into that temp table instead of doing a select into temp. I Google'd this online and a lot of documentation on this talks about SQL Server 6.5 and the issues that it had with Select into (we are running SQL Server 2005). Does it still create a lock on tempdb? Is there really any performance gains to had from doing it in this manner?

  • It hasn't been a problem in over a decade, but that doesn't stop lots of DBAs from thinking it is one.

    I've done a lot of load and speed testing with both methods, and Select Into is generally faster and creates less contention than Create Insert Select. Not always faster, but more often than not.

    The only place I've see it be slower is when you're dealing with very large amounts of undefined data. For example, Select Into from an OpenRowset on a text file or spreadsheet. With that, SQL Server has to scan the input to decide what data type to make a column. Usually, as per data on IMEX and the related registry entries, that's fast. But, if you have had to modify the registry to allow for full column scanning (mixed data types in Excel columns is the usual reason for that), then Select Into can become very slow compared to Create Insert Select.

    If you aren't dealing with that particular situation, then Select Into is almost certainly going to be faster and more efficient. It also has advantages with regards to avoiding column truncation and a few things like that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I think GSquared and I are having philosophical differences today. I generally don't do SELECT INTO because at design time I want to know exactly what the structure and all datatypes will be. As far as performance I really think in most cases we are talking fractions of fractions of seconds. But like many things, style may vary and you can test the variations to see what you like. Me, I like the Create/Insert method.

    CEWII

  • I've always forced our guys to define all temp tables at the beginning of the stored procedure. Interleaving DDL with DML causes the SPs to recompile. I was under the assumption that using SELECT...INTO...FROM also caused SP recompilations.

    Is this still true in SQL Server 2005? If so, it would be another arguement for defining your temp tables with the CREATE TABLE construct.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John Rowan (11/17/2009)


    I've always forced our guys to define all temp tables at the beginning of the stored procedure. Interleaving DDL with DML causes the SPs to recompile. I was under the assumption that using SELECT...INTO...FROM also caused SP recompilations.

    Is this still true in SQL Server 2005? If so, it would be another arguement for defining your temp tables with the CREATE TABLE construct.

    In SQL 2000, this was partially true. I say partially because if the data in the temp table changes significantly, it would have to recompile again. Having all the DDL at the begining of the proc only avoids recompiles if the stats in the temp tables don't change enough during runtime to require building new execution plans for later steps in the proc. Since a heavy insert or any update on an indexed column can cause stat changes, it's pretty hard to avoid this issue, regardless of sequence of DDL/DML. The main way to avoid this issue is to use table variables, since they don't involve recompiles on DDL nor do they keep stats.

    In SQL 2005, it can cause statement recompilation, but doesn't generally cause whole-proc recompilation. SQL 2005 will also hold onto the definition and first page of data for later runs of the proc if it calculates that this will save recompiles and execution time.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Elliott W (11/17/2009)


    I think GSquared and I are having philosophical differences today. I generally don't do SELECT INTO because at design time I want to know exactly what the structure and all datatypes will be. As far as performance I really think in most cases we are talking fractions of fractions of seconds. But like many things, style may vary and you can test the variations to see what you like. Me, I like the Create/Insert method.

    CEWII

    Correct. The speed difference only matters if your database will have to scale to heavy use.

    But you do mention one thing that I consider a drawback to Create Insert as opposed to Select Into: You want to know the exact structure and datatype at design time, whereas I see it as you must know the exact structure and data type for all future runtimes.

    I recently had a problem come up because a varchar column in a table had to be expanded by 5 characters to accommodate a new data case, and a proc with a temp table in it didn't have it's column also expanded by 5 characters. Developer overlooked the issue. Unit testing wasn't adequate, and the table change was taken live. Simple human error.

    Select Into avoids that issue. You can, as needed, change a base table, without rippling the change through every proc that uses a temp table on it.

    Of course, this isn't a big deal. It's a minor difference. There are advantages and drawbacks either way. I see the advantages for Select Into as outweighing the drawbacks, but that certainly doesn't mean I'm right and you're wrong, or vice versa, it just means we weigh these things differently.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (11/18/2009)


    Of course, this isn't a big deal. It's a minor difference. There are advantages and drawbacks either way. I see the advantages for Select Into as outweighing the drawbacks, but that certainly doesn't mean I'm right and you're wrong, or vice versa, it just means we weigh these things differently.

    And that is ok. I have always said there is more than one way to solve most problems, and often more than one good way. This is more style and outlook than hard performance and best practive. And again, that is ok..

    CEWII

  • If the table you are selecting into will later have a clustered index created on it, it is usually faster to create the table first and insert the data in order by the clustered index than to load the whole table and then reload it by creating a clustered index.

  • Michael Valentine Jones (11/18/2009)


    If the table you are selecting into will later have a clustered index created on it, it is usually faster to create the table first and insert the data in order by the clustered index than to load the whole table and then reload it by creating a clustered index.

    Yep. That can make a difference too. Matters more on tables that are large enough to overflow onto disk as opposed to staying in RAM, but definitely something to consider.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Elliott W (11/18/2009)


    GSquared (11/18/2009)


    Of course, this isn't a big deal. It's a minor difference. There are advantages and drawbacks either way. I see the advantages for Select Into as outweighing the drawbacks, but that certainly doesn't mean I'm right and you're wrong, or vice versa, it just means we weigh these things differently.

    And that is ok. I have always said there is more than one way to solve most problems, and often more than one good way. This is more style and outlook than hard performance and best practive. And again, that is ok..

    CEWII

    One main advantage to create first insert second is that you can document that much more easily and clearly.

    Another is that you can add constraints before you insert.

    It definitely goes both ways.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Just a couple of other points that spring to mind:

    SELECT INTO can be minimally logged if the recovery model is BULK_LOGGED or SIMPLE.

    SELECT INTO is limited to creating a table on the default file group.

    ISNULL() together with an explicit CAST or CONVERT can help document and enforce column types when creating a table with SELECT INTO:

    SELECT

    c1 = CONVERT(TINYINT, 1),

    c2 = CONVERT(VARCHAR(10), 'Fish'),

    c3 = CONVERT(NVARCHAR(5), N'Chips'),

    c4 = CONVERT(SMALLDATETIME, CURRENT_TIMESTAMP),

    c5 = CONVERT(MONEY, $5.99),

    c6 = CONVERT(UNIQUEIDENTIFIER, NEWID())

    INTO

    #Disposable;

    GO

    -- All columns are NULLable

    EXECUTE tempdb..sp_columns @table_name = '#Disposable';

    GO

    DROP TABLE #Disposable;

    GO

    SELECT

    c1 = ISNULL(CONVERT(TINYINT, 1), 0),

    c2 = ISNULL(CONVERT(VARCHAR(10), 'Fish'), ''),

    c3 = ISNULL(CONVERT(NVARCHAR(5), N'Chips'), N''),

    c4 = ISNULL(CONVERT(SMALLDATETIME, CURRENT_TIMESTAMP), {d '2000-01-01'}),

    c5 = ISNULL(CONVERT(MONEY, $5.99), $0.00),

    c6 = ISNULL(CONVERT(UNIQUEIDENTIFIER, NEWID()), {guid'00000000-0000-0000-0000-000000000000'})

    INTO

    #Disposable;

    GO

    -- All columns are NOT NULLable

    EXECUTE tempdb..sp_columns @table_name = '#Disposable';

    GO

    DROP TABLE #Disposable;

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply