SELECT INTO vs INSERT INTO on Columnstore

Question

Post reply

SELECT INTO vs INSERT INTO on Columnstore

ramya.sqlservr

Ten Centuries

Points: 1068
More actions
June 8, 2015 at 10:16 pm

#318762

Comments posted to this topic are about the item SELECT INTO vs INSERT INTO on Columnstore

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply

satnam.singh 44358 Ten Centuries Points: 1305 More actions · Answer 1

satnam.singh 44358

Ten Centuries

Points: 1305

June 9, 2015 at 2:56 am

#1803889

Thank you. It was a Great one.

ravikiran.siddamshetty SSC Rookie Points: 33 More actions · Answer 2

ravikiran.siddamshetty

SSC Rookie

Points: 33

June 9, 2015 at 3:14 am

#1803891

Nice Information. Really useful...

PatrickSimons Right there with Babe Points: 787 More actions · Answer 3

What about locking system resources when using SELECT INTO?

SELECT INTO was sometimes disadvised...

Patrick SIMONS, MCP

sibasish.sengupta SSC Rookie Points: 26 More actions · Answer 4

sibasish.sengupta

SSC Rookie

Points: 26

June 9, 2015 at 4:50 am

#1803896

Nicely explained.

lduvall SSCarpal Tunnel Points: 4869 More actions · Answer 5

It is very timely that you posted this article. I just got dinged by my dba and told to remove all my INSERT INTO # and change them to SELECT INTO. When I asked why, he merely said it's faster. Your explanation makes sense, but I'm curious if this is only because of columnstore indexing in 2014.

Are there performance benefits to doing SELECT INTO versus INSERT INTO on a SQL 2008R2 instance or SQL2012 with tables not using columnstore index at all? I would love to understand more.

dbishop Mr or Mrs. 500 Points: 565 More actions · Answer 6

I'm sorry if I am being too critical, but this seemed like more of an article to provide a lot of screen shots, and not a lot of substance. SELECT INTO and then creating ANY index is almost always going to be faster than INSERT unless it is a clustered index and data is being inserted in the correct order, whether it is being done in SQL Server 2000 or 2014. SELECT INTO is minimally logged, so the log growth will always be smaller, again regardless of the version of SQL Server. These are facts that are almost always true whether dealing with a CS index, a clustered index or is just a heap.

It was kind of sucky lduvall that the DBA didn't go into any explanation of why it is faster to use SELECT INTO. That really didn't teach you anything. But the main reason is that it is minimally logged, where INSERT logs every row being inserted into the table.

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 7

akljfhnlaflkj

SSC Guru

Points: 76202

June 9, 2015 at 7:05 am

#1803936

Thanks for the great article.

Scott-144766 SSCarpal Tunnel Points: 4333 More actions · Answer 8

If you meet the criteria for minimal logging (https://msdn.microsoft.com/en-us/ms190422.aspx) you can use INSERT without the same amount of log file usage

--
Scott

Irish Flyer SSCrazy Points: 2245 More actions · Answer 9

Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.

I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

PatrickSimons Right there with Babe Points: 787 More actions · Answer 10

I agree 100%. Especially when you SELECT INTO inside a transaction to a temporary table (##) = worst case....

Patrick SIMONS, MCP

TheSQLGuru SSC Guru Points: 134015 More actions · Answer 11

I think you missed a VERY important type of test in your examples: 1M row inserts at a time, in discreet transactions. Please try that (primarily for the insert to the table that already has the CCI on it) and report back.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 12

Irish Flyer (6/9/2015)
Your article is incomplete without adding that a SELECT INTO is considered a DDL statement and completely locks both source and target tables for the duration of its operation. In any busy instance there are usually a number of "hot" tables that are used by many users simultaneously. The use of a SELECT INTO using one of these will cause a work stoppage until it finishes. As a DBA with a lot of experience and the administrator of a 140TB data warehouse complex, I do not allow the use of SELECT INTO except on the smallest and least significant tables. A few of our tables use more than a TB on their own and take a long time to copy.
I can enforce this practice because I, or one of my staff, perform code reviews on all code performing any write operations that will run on my warehouse or datamarts.

Gosh... I've never experienced such a thing except for the schema locks that it puts on the source table or if someone does it across a linked server. There's also the long leftover perception that it will also lock TempDB and make it useless if the target table is in TempDB and that hasn't been true since SQL Server 6.5 SP1 (they actually had a hot fix for 6.5 RTM, as well).

Guess it's time for me to do another test on it since several revs of SQL Server had experience many revs since the last time I did such a test. I'lll see what I can come up with over the weekend.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 13

The scenario descriptions in the final chart of the article also don't seem to match the actual scenario descriptions in the text of the article. That also means (I haven't done the check) that the data in that chart might also be incorrect.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

JoeS 3024 Mr or Mrs. 500 Points: 533 More actions · Answer 14

So for those of us who aren't DBA experts does this mean that choosing between a SELECT INTO and INSERT INTO is one of those "It Depends" choices? Using a SELECT INTO maybe OK for a small table or a DB that isn't heavily used but when you have a large DB or heavily used table take the hit and choose INSERT INTO so as to not cause potential blocking issues. Does that sound reasonable?

Thanks