SELECT INTO FROM.. or INSERT INTO..?

  • I am reviewing the existing TSQL code in a stored procedure.

    There are six tables that need to be updated.

    So what they do is drop tables,

    then issue lengthy SELECT INTO .. FROM statements

    to populate the tables with the new data.

    My question:

    Is there a difference in performance between using SELECT INTO..FROM

    and INSERT INTO?

  • take a look there: http://sql-server-performance.com/Community/forums/t/12040.aspx

  • 1) SELECT INTO ....... IS FASTER THAN INSERT INTO.

    2) SELECT INTO ....... CREATE NEW TABLE BUT INSERT INTO does'nt create a new table.

    IF YOU WANT INSERT DATA ONCE USE SELECT INTO ....... .

  • Dropping the tables every time and then recreating the entire data set will eventually become much more painful than joining between the tables to move only the new data. For scalability purposes, you might want to explore only moving the data you need to move rather than recreating everything every time.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If those are reporting tables, you can consider truncating them before inserting the data. That won't lock temdb for the duration of the insert.

  • SELECT INTO will often times be significantly faster. BUT it locks up the sysobjects table while it runs. So, if you have any other SPs or scripts(that need to DDL statements) running at the same time you will be waiting for the process to finish before they can complete.

    We had a process that would use this methodology and it worked great. Cut the time to complete the process from 14 hours down to 45 minutes. The caveat was that no one could use the DB while it was running.

    Gary Johnson
    Sr Database Engineer

  • There's no way in hell that Select into was the only contributing factor in this.

    Are you sure you didn't have other problems like blocking, index building, too much data processed at once (instead of doing smaller chunks inserts of 50 000 rows).

  • Gary Johnson (9/9/2008)


    SELECT INTO will often times be significantly faster. BUT it locks up the sysobjects table while it runs. So, if you have any other SPs or scripts(that need to DDL statements) running at the same time you will be waiting for the process to finish before they can complete.

    We had a process that would use this methodology and it worked great. Cut the time to complete the process from 14 hours down to 45 minutes. The caveat was that no one could use the DB while it was running.

    The locking sysobjects thing stopped being an issue back in 6.5. Not sure what you ran into, but I'd have to agree - this is not the issue.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ninja's_RGR'us (9/9/2008)


    There's no way in hell that Select into was the only contributing factor in this.

    Are you sure you didn't have other problems like blocking, index building, too much data processed at once (instead of doing smaller chunks inserts of 50 000 rows).

    When you are denormalizing multiple tables of 150+ MILLION records, it definitely IS a possibility. And yes, we tried the chunking of the data (first way we wrote it actually). It was a total shock to me that the SELECT INTO was so much faster. And believe me, I would rather loop through in chunks so that I can also error check, restart, and give a running report of the data. But not at the cost of that much time!

    BTW we found that for most of our chunking of data, that somewhere between 700K and 1.2 million is the sweet spot. Go less than that and it is slower, go more and it is slower. It all depends on what we are doing so testing is paramount!

    And yes, we use SQL 2k5 and the locking of sysobjects is still very much a factor with SELECT INTO. A long running SELECT INTO like we were doing would literally lock up the database.

    Gary Johnson
    Sr Database Engineer

  • Nice topic.

    Just two additional links (Note: The information is mostly for SQL Server 2008, but also relevant for 2005 as well):

    http://www.pythian.com/blogs/895/minimal-logging-basics-and-sql-server-2008-enhancements

    http://msdn.microsoft.com/en-us/library/ms177445.aspx

    Best Regards,

    Chris Büttner

  • Great news on the new minimal logging ability in 2008! I'm sure as we move data around quite often we will look for that feature.

    This might just be one of the things that pushes us to go with 08.

    Gary Johnson
    Sr Database Engineer

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply