SELECT INTO vs INSERT INTO on Columnstore

  • Comments posted to this topic are about the item SELECT INTO vs INSERT INTO on Columnstore

  • Thank you. It was a Great one.

  • Nice Information. Really useful...

  • What about locking system resources when using SELECT INTO?

    SELECT INTO was sometimes disadvised...

    Patrick SIMONS, MCP

  • Nicely explained.

  • It is very timely that you posted this article. I just got dinged by my dba and told to remove all my INSERT INTO # and change them to SELECT INTO. When I asked why, he merely said it's faster. Your explanation makes sense, but I'm curious if this is only because of columnstore indexing in 2014.

    Are there performance benefits to doing SELECT INTO versus INSERT INTO on a SQL 2008R2 instance or SQL2012 with tables not using columnstore index at all? I would love to understand more.

  • I'm sorry if I am being too critical, but this seemed like more of an article to provide a lot of screen shots, and not a lot of substance. SELECT INTO and then creating ANY index is almost always going to be faster than INSERT unless it is a clustered index and data is being inserted in the correct order, whether it is being done in SQL Server 2000 or 2014. SELECT INTO is minimally logged, so the log growth will always be smaller, again regardless of the version of SQL Server. These are facts that are almost always true whether dealing with a CS index, a clustered index or is just a heap.

    It was kind of sucky lduvall that the DBA didn't go into any explanation of why it is faster to use SELECT INTO. That really didn't teach you anything. But the main reason is that it is minimally logged, where INSERT logs every row being inserted into the table.

  • Thanks for the great article.

  • If you meet the criteria for minimal logging (https://msdn.microsoft.com/en-us/ms190422.aspx) you can use INSERT without the same amount of log file usage

    --
    Scott

  • Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.

    I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

  • I agree 100%. Especially when you SELECT INTO inside a transaction to a temporary table (##) = worst case....

    Patrick SIMONS, MCP

  • I think you missed a VERY important type of test in your examples: 1M row inserts at a time, in discreet transactions. Please try that (primarily for the insert to the table that already has the CCI on it) and report back.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Irish Flyer (6/9/2015)


    Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.

    I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

    Gosh... I've never experienced such a thing except for the schema locks that it puts on the source table or if someone does it across a linked server. There's also the long leftover perception that it will also lock TempDB and make it useless if the target table is in TempDB and that hasn't been true since SQL Server 6.5 SP1 (they actually had a hot fix for 6.5 RTM, as well).

    Guess it's time for me to do another test on it since several revs of SQL Server had experience many revs since the last time I did such a test. I'lll see what I can come up with over the weekend.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The scenario descriptions in the final chart of the article also don't seem to match the actual scenario descriptions in the text of the article. That also means (I haven't done the check) that the data in that chart might also be incorrect.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • So for those of us who aren't DBA experts does this mean that choosing between a SELECT INTO and INSERT INTO is one of those "It Depends" choices? Using a SELECT INTO maybe OK for a small table or a DB that isn't heavily used but when you have a large DB or heavily used table take the hit and choose INSERT INTO so as to not cause potential blocking issues. Does that sound reasonable?

    Thanks

Viewing 15 posts - 1 through 15 (of 36 total)

You must be logged in to reply to this topic. Login to reply