Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Which one is better select * into # temp from tableA Vs create #temp table insert into... Expand / Collapse
Author
Message
Posted Tuesday, June 21, 2011 2:03 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, December 16, 2014 9:01 AM
Points: 1,299, Visits: 3,003
Can any one suggest based on there experience which one of these are preferred.

i) select * into #temp from tableA


ii) create table #temp (col 1)

insert into #temp
select col 1 from tableA

The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks
Post #1129308
Posted Tuesday, June 21, 2011 2:08 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Thursday, December 18, 2014 9:58 AM
Points: 13,872, Visits: 9,600
I prefer Select Into, because in performance tests I've done it's slightly faster, and because it will preserve DDL changes in the source Select between iterations.

E.g.: If you have to expand one of the columns from varchar(25) to varchar(50) in one of the tables you are selecting from, Select Into will automatically create the temp table with the expanded column, but Create...Insert...Select will require changing the Create statement. That's usually a benefit, but it can be a problem if you're expecting an error in that case.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1129313
Posted Tuesday, June 21, 2011 3:40 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, December 16, 2014 9:01 AM
Points: 1,299, Visits: 3,003
GSquared (6/21/2011)
I prefer Select Into, because in performance tests I've done it's slightly faster, and because it will preserve DDL changes in the source Select between iterations.

E.g.: If you have to expand one of the columns from varchar(25) to varchar(50) in one of the tables you are selecting from, Select Into will automatically create the temp table with the expanded column, but Create...Insert...Select will require changing the Create statement. That's usually a benefit, but it can be a problem if you're expecting an error in that case.



Thanks for your reply..Hmm..i thought first creating a temp table then populating data would be preferred in my case. I guess i will test it out and see the performance
Post #1129364
Posted Tuesday, June 21, 2011 4:22 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, December 18, 2014 8:51 PM
Points: 7,140, Visits: 12,763
sqldba_icon (6/21/2011)
GSquared (6/21/2011)
I prefer Select Into, because in performance tests I've done it's slightly faster, and because it will preserve DDL changes in the source Select between iterations.

E.g.: If you have to expand one of the columns from varchar(25) to varchar(50) in one of the tables you are selecting from, Select Into will automatically create the temp table with the expanded column, but Create...Insert...Select will require changing the Create statement. That's usually a benefit, but it can be a problem if you're expecting an error in that case.



Thanks for your reply..Hmm..i thought first creating a temp table then populating data would be preferred in my case. I guess i will test it out and see the performance

It depends on the scenario. GSquared just conveyed a personal preference backed up by some performance findings and some nicetys offered by the technique.

In the case of SELECT...*...INTO (in your post title), sometimes it's a good thing to leave columns added to the base schema after your code was originally written out of the temp tables. Sometimes you may want that behavior in which case SELECT * INTO would be a good thing.

Whether supplying an explicit column list or a * SELECT...INTO generates a table with no clustered index, i.e. a HEAP. If you're managing large quantities of data in temp tables and you want a clustered index on the table you may be better off creating the table ahead of time and then doing the insert.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1129375
Posted Tuesday, June 21, 2011 5:58 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Monday, December 15, 2014 2:26 PM
Points: 5,466, Visits: 7,647
opc.three (6/21/2011)

Whether supplying an explicit column list or a * SELECT...INTO generates a table with no clustered index, i.e. a HEAP. If you're managing large quantities of data in temp tables and you want a clustered index on the table you may be better off creating the table ahead of time and then doing the insert.


In either case, when the clustered index component mattered for the #tbl (not just the optimizer detecting better joining methods) due to data volume, I've almost always found better performance applying the clustered index after shoving all the data in.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1129401
Posted Tuesday, June 21, 2011 10:09 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, December 18, 2014 8:51 PM
Points: 7,140, Visits: 12,763
Craig Farrell (6/21/2011)
opc.three (6/21/2011)

Whether supplying an explicit column list or a * SELECT...INTO generates a table with no clustered index, i.e. a HEAP. If you're managing large quantities of data in temp tables and you want a clustered index on the table you may be better off creating the table ahead of time and then doing the insert.


In either case, when the clustered index component mattered for the #tbl (not just the optimizer detecting better joining methods) due to data volume, I've almost always found better performance applying the clustered index after shoving all the data in.

Which method performs better in the end will depend on the data volume, recovery model, data set shape, probably too many variables to definitively say one method will perform better than the other 100% of the time. If it really matters, exhaustive testing with representative data sets in a prod or prod-like environment will be the only way to know for sure. I was just adding some additional information to the thread post, I am not disputing anything said so far. To be fair, I did say "may", because I have seen each way perform better than the other. That said, it's good to know both you guys see one way as generally preferable over the other. I had not run it down to the point of coming to a conclusion so I'll take your lead and make SELECT INTO my default option.

I was doing some looking around to see what I could find in the way of documented performance testing done on the two methods and I stumbled on this connect item. The response from Eric@MS was very enlightening. I was not aware of the technique but I will keep it in mind for down the line tasks. It could be useful to the original poster very soon:

http://connect.microsoft.com/SQLServer/feedback/details/254716/clustered-index-on-select-into


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1129440
Posted Wednesday, June 22, 2011 2:39 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Monday, December 15, 2014 2:26 PM
Points: 5,466, Visits: 7,647
opc.three (6/21/2011)

Which method performs better in the end will depend on the data volume, recovery model, data set shape, probably too many variables to definitively say one method will perform better than the other 100% of the time. If it really matters, exhaustive testing with representative data sets in a prod or prod-like environment will be the only way to know for sure.


Yeeps, sorry Orlando, I fear you may have taken me the wrong way. I meant to add to your commentary, not abuse it. Apologies. Allow me to rephrase:

In my personal experience I almost always build the Clustered Index after the data insert in either case due to better performance during testing I did on the concept. This testing was primarily done in 2k and I haven't personally seen anything that would optimize it better in the upgrade information since. However, testing will always be necessary for your specific scenario to confirm it holds true.

I had not run it down to the point of coming to a conclusion so I'll take your lead and make SELECT INTO my default option.


In honesty that opinion is mostly due to the work of previous gurus whos word I've taken at face value. If I need a restricted #tbl I'll create it first, or if I'm concerned about inheritance by weaker developers. If I need a generic dump I'll use SELECT INTO because of previous timetests done by others.

The response from Eric@MS was very enlightening. I was not aware of the technique but I will keep it in mind for down the line tasks.


You'll be able to create an empty table with a clustered index, then load it this way and only write the data once (no need to write all the data to the log). You have to use a table lock (TABLOCK) hint on the target of the insert to get this to kick in. This works in CTP4.

The rules in the Books Online topic "Prerequisites for Minimal Logging in Bulk Import" apply.


Write once I'll agree with, but either version would 'write once'. I'm curious as to the thrash on the insert into with clustered existing vs. the heap dump with a clustered after. I may build a new test case to test that.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1129511
Posted Wednesday, June 22, 2011 4:07 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, March 4, 2014 10:03 AM
Points: 1,096, Visits: 1,334
Hi there,

My scripting level is way below you guys but I was thinking about your requirement.
It sounds like you want the table indexed so that when the data is inserted it doesnt have to re-jig on index creation. If im right how about this:


SELECT {column} into #test FROM [prod_table} WHERE 1=0
SELECT * FROM #test --should be nothing but blank table with columns
CREATE CLUSTERED INDEX test_index ON #test({column})
INSERT #test SELECT name FROM {prod_table}
SELECT * FROM #test --data should be in table indexed
--DROP TABLE #test


Of cource if i have completely wrong please let me down gently


Adam Zacks

-------------------------------------------

Be Nice, Or Leave
Post #1129537
Posted Wednesday, June 22, 2011 5:12 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Monday, December 15, 2014 9:03 AM
Points: 851, Visits: 5,596
Another thing to consider is collation.

SELECT INTO will always get the collation of character columns correct.

If you do use the CREATE TABLE method then the collation should always be specified, normally by COLLATE DATABASE_DEFAULT, just in case the DB has a different collation to tempdb.
Post #1129561
Posted Wednesday, June 22, 2011 5:59 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, December 18, 2014 8:51 PM
Points: 7,140, Visits: 12,763
Craig Farrell (6/22/2011)
Yeeps, sorry Orlando, I fear you may have taken me the wrong way. I meant to add to your commentary, not abuse it. Apologies.

No way Craig. I re-read my post and it's me who got defensive, sorry about that.

Your experiences are very valuable to hear because I have seen it be better both ways. It is surprising however that SELECT...INTO may be preferred in most cases. HEAP to HEAP I could see SELECT...INTO being faster than CREATE TABLE followed by INSERT...SELECT in most cases, it's when there is a CI involved where I think it gets more interesting. Intuition and what I know of internals (admittedly not a lot) leads me to think that having a CI in place ahead of time and asking for an ordered insert from the INSERT...SELECT would be preferred in most cases because I am not writing a heap just to tear it down and rebuild it as a clustered table...I may do some testing too.


I like the script you provided Schadenfreude-Mei. It solves the issue Ken raised by remaining dynamic but it also allows you to create a CI ahead of time. Very creative. Thanks for sharing.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1129577
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse