Which one is better select * into # temp from tableA Vs create #temp table insert into...

  • Can any one suggest based on there experience which one of these are preferred.

    i) select * into #temp from tableA

    ii) create table #temp (col 1)

    insert into #temp

    select col 1 from tableA

    The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks

  • I prefer Select Into, because in performance tests I've done it's slightly faster, and because it will preserve DDL changes in the source Select between iterations.

    E.g.: If you have to expand one of the columns from varchar(25) to varchar(50) in one of the tables you are selecting from, Select Into will automatically create the temp table with the expanded column, but Create...Insert...Select will require changing the Create statement. That's usually a benefit, but it can be a problem if you're expecting an error in that case.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (6/21/2011)


    I prefer Select Into, because in performance tests I've done it's slightly faster, and because it will preserve DDL changes in the source Select between iterations.

    E.g.: If you have to expand one of the columns from varchar(25) to varchar(50) in one of the tables you are selecting from, Select Into will automatically create the temp table with the expanded column, but Create...Insert...Select will require changing the Create statement. That's usually a benefit, but it can be a problem if you're expecting an error in that case.

    Thanks for your reply..Hmm..i thought first creating a temp table then populating data would be preferred in my case. I guess i will test it out and see the performance

  • sqldba_icon (6/21/2011)


    GSquared (6/21/2011)


    I prefer Select Into, because in performance tests I've done it's slightly faster, and because it will preserve DDL changes in the source Select between iterations.

    E.g.: If you have to expand one of the columns from varchar(25) to varchar(50) in one of the tables you are selecting from, Select Into will automatically create the temp table with the expanded column, but Create...Insert...Select will require changing the Create statement. That's usually a benefit, but it can be a problem if you're expecting an error in that case.

    Thanks for your reply..Hmm..i thought first creating a temp table then populating data would be preferred in my case. I guess i will test it out and see the performance

    It depends on the scenario. GSquared just conveyed a personal preference backed up by some performance findings and some nicetys offered by the technique.

    In the case of SELECT...*...INTO (in your post title), sometimes it's a good thing to leave columns added to the base schema after your code was originally written out of the temp tables. Sometimes you may want that behavior in which case SELECT * INTO would be a good thing.

    Whether supplying an explicit column list or a * SELECT...INTO generates a table with no clustered index, i.e. a HEAP. If you're managing large quantities of data in temp tables and you want a clustered index on the table you may be better off creating the table ahead of time and then doing the insert.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/21/2011)


    Whether supplying an explicit column list or a * SELECT...INTO generates a table with no clustered index, i.e. a HEAP. If you're managing large quantities of data in temp tables and you want a clustered index on the table you may be better off creating the table ahead of time and then doing the insert.

    In either case, when the clustered index component mattered for the #tbl (not just the optimizer detecting better joining methods) due to data volume, I've almost always found better performance applying the clustered index after shoving all the data in.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (6/21/2011)


    opc.three (6/21/2011)


    Whether supplying an explicit column list or a * SELECT...INTO generates a table with no clustered index, i.e. a HEAP. If you're managing large quantities of data in temp tables and you want a clustered index on the table you may be better off creating the table ahead of time and then doing the insert.

    In either case, when the clustered index component mattered for the #tbl (not just the optimizer detecting better joining methods) due to data volume, I've almost always found better performance applying the clustered index after shoving all the data in.

    Which method performs better in the end will depend on the data volume, recovery model, data set shape, probably too many variables to definitively say one method will perform better than the other 100% of the time. If it really matters, exhaustive testing with representative data sets in a prod or prod-like environment will be the only way to know for sure. I was just adding some additional information to the thread post, I am not disputing anything said so far. To be fair, I did say "may", because I have seen each way perform better than the other. That said, it's good to know both you guys see one way as generally preferable over the other. I had not run it down to the point of coming to a conclusion so I'll take your lead and make SELECT INTO my default option.

    I was doing some looking around to see what I could find in the way of documented performance testing done on the two methods and I stumbled on this connect item. The response from Eric@MS was very enlightening. I was not aware of the technique but I will keep it in mind for down the line tasks. It could be useful to the original poster very soon:

    http://connect.microsoft.com/SQLServer/feedback/details/254716/clustered-index-on-select-into

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/21/2011)


    Which method performs better in the end will depend on the data volume, recovery model, data set shape, probably too many variables to definitively say one method will perform better than the other 100% of the time. If it really matters, exhaustive testing with representative data sets in a prod or prod-like environment will be the only way to know for sure.

    Yeeps, sorry Orlando, I fear you may have taken me the wrong way. I meant to add to your commentary, not abuse it. Apologies. Allow me to rephrase:

    In my personal experience I almost always build the Clustered Index after the data insert in either case due to better performance during testing I did on the concept. This testing was primarily done in 2k and I haven't personally seen anything that would optimize it better in the upgrade information since. However, testing will always be necessary for your specific scenario to confirm it holds true.

    I had not run it down to the point of coming to a conclusion so I'll take your lead and make SELECT INTO my default option.

    In honesty that opinion is mostly due to the work of previous gurus whos word I've taken at face value. If I need a restricted #tbl I'll create it first, or if I'm concerned about inheritance by weaker developers. If I need a generic dump I'll use SELECT INTO because of previous timetests done by others.

    The response from Eric@MS was very enlightening. I was not aware of the technique but I will keep it in mind for down the line tasks.

    You'll be able to create an empty table with a clustered index, then load it this way and only write the data once (no need to write all the data to the log). You have to use a table lock (TABLOCK) hint on the target of the insert to get this to kick in. This works in CTP4.

    The rules in the Books Online topic "Prerequisites for Minimal Logging in Bulk Import" apply.

    Write once I'll agree with, but either version would 'write once'. I'm curious as to the thrash on the insert into with clustered existing vs. the heap dump with a clustered after. I may build a new test case to test that.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Hi there,

    My scripting level is way below you guys but I was thinking about your requirement.

    It sounds like you want the table indexed so that when the data is inserted it doesnt have to re-jig on index creation. If im right how about this:

    SELECT {column} into #test FROM [prod_table} WHERE 1=0

    SELECT * FROM #test --should be nothing but blank table with columns

    CREATE CLUSTERED INDEX test_index ON #test({column})

    INSERT #test SELECT name FROM {prod_table}

    SELECT * FROM #test --data should be in table indexed

    --DROP TABLE #test

    Of cource if i have completely wrong please let me down gently 🙂

    Adam Zacks-------------------------------------------Be Nice, Or Leave

  • Another thing to consider is collation.

    SELECT INTO will always get the collation of character columns correct.

    If you do use the CREATE TABLE method then the collation should always be specified, normally by COLLATE DATABASE_DEFAULT, just in case the DB has a different collation to tempdb.

  • Craig Farrell (6/22/2011)


    Yeeps, sorry Orlando, I fear you may have taken me the wrong way. I meant to add to your commentary, not abuse it. Apologies.

    No way Craig. I re-read my post and it's me who got defensive, sorry about that.

    Your experiences are very valuable to hear because I have seen it be better both ways. It is surprising however that SELECT...INTO may be preferred in most cases. HEAP to HEAP I could see SELECT...INTO being faster than CREATE TABLE followed by INSERT...SELECT in most cases, it's when there is a CI involved where I think it gets more interesting. Intuition and what I know of internals (admittedly not a lot) leads me to think that having a CI in place ahead of time and asking for an ordered insert from the INSERT...SELECT would be preferred in most cases because I am not writing a heap just to tear it down and rebuild it as a clustered table...I may do some testing too.

    I like the script you provided Schadenfreude-Mei. It solves the issue Ken raised by remaining dynamic but it also allows you to create a CI ahead of time. Very creative. Thanks for sharing.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Craig Farrell (6/21/2011)


    opc.three (6/21/2011)


    Whether supplying an explicit column list or a * SELECT...INTO generates a table with no clustered index, i.e. a HEAP. If you're managing large quantities of data in temp tables and you want a clustered index on the table you may be better off creating the table ahead of time and then doing the insert.

    In either case, when the clustered index component mattered for the #tbl (not just the optimizer detecting better joining methods) due to data volume, I've almost always found better performance applying the clustered index after shoving all the data in.

    If it matches the clustered index on the source, having it there before-hand can make a big difference in performance. Should only matter on very large temp tables, and potentially not then.

    The only way to tell which is better is test your specific scenario, and understand that the test is only valid with current data and may be invalidated over time as data changes. But that's what monitoring performance is for.

    Honestly, the performance issues are usually minor. The bigger issue is do you want changes in the underlying schema to carry forward automatically or only explicitly?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I really appreciate and thank all of you for taking some time to reply to my post. I think i will try creating a temp table first, then an index(probably clustered) and then insert data into temp table. Now the amount of data being inserted into temp table is pretty constant about 170K - 200 K records.

  • I was always taught to use the create table, insert method. not because it is faster, but because it blocks the sysobjects while the table is being created and the data is inserted. Don't know oif this is still true, but old habits die hard. You could do a select into where 1=2, and then an insert. then the select into would create the table, not insert data, and be very quick.

    Leonard

  • sqldba_icon (6/22/2011)


    I really appreciate and thank all of you for taking some time to reply to my post. I think i will try creating a temp table first, then an index(probably clustered) and then insert data into temp table. Now the amount of data being inserted into temp table is pretty constant about 170K - 200 K records.

    Please tell me why you want to create an index on the temp table. In 15 years of doing SQL Server work I can probably count on 1 or maybe 2 hands the number of times when an index on a temp table was appropriate for it's given use.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/23/2011)


    sqldba_icon (6/22/2011)


    I really appreciate and thank all of you for taking some time to reply to my post. I think i will try creating a temp table first, then an index(probably clustered) and then insert data into temp table. Now the amount of data being inserted into temp table is pretty constant about 170K - 200 K records.

    Please tell me why you want to create an index on the temp table. In 15 years of doing SQL Server work I can probably count on 1 or maybe 2 hands the number of times when an index on a temp table was appropriate for it's given use.

    Kevin,

    I've found that if the temp table is used to join to the main table on pretty much the same index (especially if the index on the main table is covering), I can get a MERGE instead of HASH JOIN. This sometimes will greatly boost performance.

    However, for the most part I don't bother since creating the index on the temp table takes away more than I get back on the JOIN.

    Todd Fifield

Viewing 15 posts - 1 through 15 (of 59 total)

You must be logged in to reply to this topic. Login to reply