SELECT INTO vs INSERT INTO on Columnstore

  • ramya.sqlservr

    Ten Centuries

    Points: 1021

    Comments posted to this topic are about the item SELECT INTO vs INSERT INTO on Columnstore

  • satnam.singh 44358

    Ten Centuries

    Points: 1152

    Thank you. It was a Great one.

  • ravikiran.siddamshetty

    SSC Rookie

    Points: 33

    Nice Information. Really useful...

  • PatrickSimons

    SSChasing Mays

    Points: 606

    What about locking system resources when using SELECT INTO?

    SELECT INTO was sometimes disadvised...


    Patrick SIMONS, MCP

  • sibasish.sengupta

    SSC Rookie

    Points: 26

    Nicely explained.

  • lduvall

    SSCarpal Tunnel

    Points: 4772

    It is very timely that you posted this article. I just got dinged by my dba and told to remove all my INSERT INTO # and change them to SELECT INTO. When I asked why, he merely said it's faster. Your explanation makes sense, but I'm curious if this is only because of columnstore indexing in 2014.

    Are there performance benefits to doing SELECT INTO versus INSERT INTO on a SQL 2008R2 instance or SQL2012 with tables not using columnstore index at all? I would love to understand more.

  • dbishop

    Mr or Mrs. 500

    Points: 559

    I'm sorry if I am being too critical, but this seemed like more of an article to provide a lot of screen shots, and not a lot of substance. SELECT INTO and then creating ANY index is almost always going to be faster than INSERT unless it is a clustered index and data is being inserted in the correct order, whether it is being done in SQL Server 2000 or 2014. SELECT INTO is minimally logged, so the log growth will always be smaller, again regardless of the version of SQL Server. These are facts that are almost always true whether dealing with a CS index, a clustered index or is just a heap.

    It was kind of sucky lduvall that the DBA didn't go into any explanation of why it is faster to use SELECT INTO. That really didn't teach you anything. But the main reason is that it is minimally logged, where INSERT logs every row being inserted into the table.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Thanks for the great article.

  • Scott-144766

    SSCarpal Tunnel

    Points: 4215

    If you meet the criteria for minimal logging (https://msdn.microsoft.com/en-us/ms190422.aspx) you can use INSERT without the same amount of log file usage

    --
    Scott

  • Irish Flyer

    SSCrazy

    Points: 2245

    Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.

    I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

  • PatrickSimons

    SSChasing Mays

    Points: 606

    I agree 100%. Especially when you SELECT INTO inside a transaction to a temporary table (##) = worst case....


    Patrick SIMONS, MCP

  • TheSQLGuru

    SSC Guru

    Points: 134017

    I think you missed a VERY important type of test in your examples: 1M row inserts at a time, in discreet transactions. Please try that (primarily for the insert to the table that already has the CCI on it) and report back.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff Moden

    SSC Guru

    Points: 993788

    Irish Flyer (6/9/2015)


    Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.

    I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

    Gosh... I've never experienced such a thing except for the schema locks that it puts on the source table or if someone does it across a linked server. There's also the long leftover perception that it will also lock TempDB and make it useless if the target table is in TempDB and that hasn't been true since SQL Server 6.5 SP1 (they actually had a hot fix for 6.5 RTM, as well).

    Guess it's time for me to do another test on it since several revs of SQL Server had experience many revs since the last time I did such a test. I'lll see what I can come up with over the weekend.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Jeff Moden

    SSC Guru

    Points: 993788

    The scenario descriptions in the final chart of the article also don't seem to match the actual scenario descriptions in the text of the article. That also means (I haven't done the check) that the data in that chart might also be incorrect.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • JoeS 3024

    SSC-Addicted

    Points: 473

    So for those of us who aren't DBA experts does this mean that choosing between a SELECT INTO and INSERT INTO is one of those "It Depends" choices? Using a SELECT INTO maybe OK for a small table or a DB that isn't heavily used but when you have a large DB or heavily used table take the hit and choose INSERT INTO so as to not cause potential blocking issues. Does that sound reasonable?

    Thanks

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply