SELECT INTO vs INSERT INTO on Columnstore

  • I haven't worked with SQL on this level for the last 3 years so some things may have changed, BUT isn't the biggest difference between "Select into" and "insert into "

    is that second stmt requires the table to exists where first stmtm will auto-create one for you?

    obviously, "Select into" will move the data first and than create index. that is how it have always worked by default (as far as I know).

    you can do the same for "Insert Into" if you choose,

    create the table as is, no index. run "Insert" than create index on full table.

    in my last encounter with SQL admin, I was doing the opposite of

    what "SSC-Enthusiastic" did, I was replacing "Select Into" with "Insert Into"

    as the customer specs was revised post-factum(after initial code review) and emphasized that ALL tables and supporting objects, MUST exist before any data is placed in the database at any time and by any user. even DBA.

    go figure 🙂

  • It would be interesting to update the article to insert into a table while using minimal logging (i.e. with TABLOCK plus a few other conditions). That should compare more favourably to SELECT INTO. Also it would be useful to compare SELECT INTO and inserting to an uncompressed heap while using minimal logging (and then indexing).

    See https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx.


    When in doubt - test, test, test!

    Wayne

  • Irish Flyer (6/9/2015)


    Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.

    I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

    I second this. Unless you are doing one-time, temporary work, you will likely experience locking / blocking for other users all for the sake of a lazy, slightly faster query that ends up using more space.

  • This articale is pretty good, but... One point. Almost every time when you load data into clean table without any index) and when you load data into clean table where index is already in place the first way will be faster then the second one. It is related to SQL Server data load process.

    When you load data into a clean table and you create an index after SQL Server can take a look at all data you have in your table and build an index based on such data.

    When you load data into a table when index is already on place the SQL Server have to modified the index accordingly to the new data which entering the table.

    That's way a good practice when you load a data in a new table is to drop/disable indexes before the load, load the data and then recreate/rebuild the indexes.

    Good test will be to load data by using SELECT INSERT and INSERT INTO but for both cases create Clustered Columnstore Index (CCI) on these target tables after data will be loaded (not before)

  • Good article, thank you.

  • Joseph M. Steinbrunner (8/8/2016)


    Irish Flyer (6/9/2015)


    Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.

    I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

    I second this. Unless you are doing one-time, temporary work, you will likely experience locking / blocking for other users all for the sake of a lazy, slightly faster query that ends up using more space.

    I'll see if I can find the proof to the contrary that I used to demo to folks when they claimed that it would also lock up TempDB until complete. Some of this definitely used to be true way back in 6.5 prior to SP 1 (they came out with a hot fix in 6.5 RTM) but the only time I've run into such a problem is when trying to do such a thing across a linked server. I'll also state that I've seen it block the heck out of Object Explorer but, to date, no where else other than those two places.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I impressed greatly that the way you have written such great article. I got many reference about this topic but your one is wonderful.

    Thanks !!

Viewing 7 posts - 31 through 36 (of 36 total)

You must be logged in to reply to this topic. Login to reply