Which one is better select * into # temp from tableA Vs create #temp table insert into...

  • lrutkowski (6/23/2011)


    I was always taught to use the create table, insert method. not because it is faster, but because it blocks the sysobjects while the table is being created and the data is inserted. Don't know oif this is still true, but old habits die hard. You could do a select into where 1=2, and then an insert. then the select into would create the table, not insert data, and be very quick.

    Leonard

    That myth should have died in sql 7. Sql 11 is coming out and it's not anywhere as near as dead as I'd like to see it!

  • tfifield (6/23/2011)


    TheSQLGuru (6/23/2011)


    sqldba_icon (6/22/2011)


    I really appreciate and thank all of you for taking some time to reply to my post. I think i will try creating a temp table first, then an index(probably clustered) and then insert data into temp table. Now the amount of data being inserted into temp table is pretty constant about 170K - 200 K records.

    Please tell me why you want to create an index on the temp table. In 15 years of doing SQL Server work I can probably count on 1 or maybe 2 hands the number of times when an index on a temp table was appropriate for it's given use.

    Kevin,

    I've found that if the temp table is used to join to the main table on pretty much the same index (especially if the index on the main table is covering), I can get a MERGE instead of HASH JOIN. This sometimes will greatly boost performance.

    However, for the most part I don't bother since creating the index on the temp table takes away more than I get back on the JOIN.

    Todd Fifield

    That could be a valid use. I note that I have seen the optimizer actually SORT data in a much smaller table to utilize a MERGE join in a larger one.

    I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index. Oh, and obviously you want said index created with explicit 100% fill factor too.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/23/2011)


    I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index.

    Kevin, Please help me understand this concept (note my earlier comment about internals). Links, RTFM and lmgtfy.com are fine if it's a can of worms but I figured I would ask. For this pseudocode, why would fragmentation occur? and why would it be faster?

    1. Create table.

    2. Create proper unique clustered index.

    3. Ask for INSERT...SELECT where the SELECT includes an ORDER BY naming the same column order as the unique clustered index.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • That myth should have died in sql 7. Sql 11 is coming out and it's not anywhere as near as dead as I'd like to see it!

    yep, like I said, old habits die hard. Right after I posted I googled it, and saw it was fixed after 7. So I guess you can teach an old dog new tricks.:-)

  • opc.three (6/23/2011)


    TheSQLGuru (6/23/2011)


    I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index.

    Kevin, Please help me understand this concept (note my earlier comment about internals). Links, RTFM and lmgtfy.com are fine if it's a can of worms but I figured I would ask. For this pseudocode, why would fragmentation occur? and why would it be faster?

    1. Create table.

    2. Create proper unique clustered index.

    3. Ask for INSERT...SELECT where the SELECT includes an ORDER BY naming the same column order as the unique clustered index.

    You are giving a very specific case here that was not what I understood the OP was intending nor is it a generic case. But here goes: even for your exact case the optimizer and storage engine too would have to be CODED to KNOW that the SELECT statement doing the population of the INSERT exactly matches the clustered index of the table. I do not know that it is (although logically it SHOULD be). I also wonder if the UNIQUENESS of your particular index might not monkey with things in that the extra check for uniqueness might not get in the way of allowing a good data flow from the SELECT into the table. If everything is set up in the engine to allow direct ordered insert your example may well be faster than populating the table and then building the index afterwards. But in the generic case of non-sorted data it would not nor would it be faster or especially result in less fragmentation.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • opc.three,

    On your question to Kevin.

    I've bench marked CREATE TABLE/INSERT INTO vs SELECT INTO many times. What I come up with is that the SELECT INTO is always faster - sometimes a little and sometimes by a lot. Jeff Moden has come up with similar results.

    My understanding of ORDER BY is that it is applied to the completed record set, so it would seem that it is the CREATE TABLE/INSERT part that is taking longer than SELECT INTO since the ORDER BY is only being applied to the completed record set anyhow.

    Most of my bench marks don't involve an index on the temp table - just the creation of it. I always create the index after the SELECT INTO as Kevin suggested, since the create time is better. If the temp table is large, then I use the 100% fill factor and it seems to help performance in most cases.

    Some day I'll bench mark indexing the temp table and possibly write an article about it.

    Todd Fifield

  • tfifield (6/23/2011)


    . . . I always create the index after the SELECT INTO as Kevin suggested, since the create time is better. If the temp table is large, then I use the 100% fill factor and it seems to help performance in most cases.

    . . .

    Can I inquire about what percentage of the time you use a temp table that you actually do create one or more indexes on it? And do you routinely examine the query plans and performance implications to see if said index(es) actually improve performance? Again I have seen VERY few real-world cases where they were appropriate.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for taking the time, I appreciate your responses. You're correct Kevin, the thread definitely took on a slightly different personality once indexes were added to the mix and so my question may have appeared to come out of left field, but I swear there was a technical train of thought there. Also, I could not help myself, with the you and the other good folks on this thread I had to throw it out there because of the learning op 🙂 I accept that SELECT...INTO is preferred over CREATE TABLE followed by INSERT...SELECT in the generic case. What little I know of internals still lands me on "it will depend on the scenario" as to which method will be faster when indexes are involved, but it's not like I have a breadth of experience in this area on my side helping me land there so I definitely look forward to reading your article Todd. If you need a proof-reader or tester drop me a PM.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • sqldba_icon (6/21/2011)


    Can any one suggest based on there experience which one of these are preferred.

    i) select * into #temp from tableA

    ii) create table #temp (col 1)

    insert into #temp

    select col 1 from tableA

    The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks

    Why do you need to select so many rows into a temporary table?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • This conversation resparked a curiousity on my part.

    The following was tested in a 2k8 R1 environment off my local drives. End result, and if you read the notes you'll see why: No indexing, SELECT INTO. If indexing, fully prebuild it. Test yourself of course, but that's now my rule of thumb.

    -- DROP TABLE CoreTable

    -- DROP TABLE AggregateMe

    CREATE TABLE CoreTable (ctID INT IDENTITY( 1, 1) NOT NULL,

    ctNCID INT NULL,

    ctAttr1 INT NOT NULL,

    ctAttr2 INT NOT NULL,

    CONSTRAINT PK_CoreTable PRIMARY KEY CLUSTERED (ctID)

    )

    GO

    CREATE INDEX idx_CoreTable_Attr ON CoreTable (ctAttr1, ctAttr2)

    GO

    CREATE TABLE AggregateMe (ctID INT NOT NULL,

    ctNCID INT NOT NULL,

    amSumUpVal DECIMAL(20,4) NULL,

    amAvgUpVal DECIMAL(20,4) NULL)

    GO

    CREATE CLUSTERED INDEX cidx_AggregateMe ON AggregateMe (ctID)

    GO

    -- Generate some test data:

    INSERT INTO CoreTable (ctAttr1, ctAttr2)

    SELECT

    rn%10 AS Attr1,

    rn%20 AS Attr2

    FROM

    (SELECT TOP 100000

    Row_number() OVER (ORDER BY RAND()) AS rn

    FROM

    master..syscolumns AS s1

    CROSS JOIN

    master..syscolumns AS s2

    CROSS JOIN

    master..syscolumns AS s3

    ) AS drv

    UPDATE CoreTable SET ctNCID = 100005 - ctID --(we want it in a different order)

    -- TRUNCATE TABLE AggregateMe

    INSERT INTO AggregateMe

    SELECT

    ct.ctID,

    ct.ctNCID,

    CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) / CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) AS amSumUpVal,

    CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) / CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) AS amAvgUpVal

    FROM

    CoreTable AS ct

    CROSS JOIN

    (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5

    UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 0)

    AS drvMultiplier

    GO

    select top 10 * from CoreTable

    SELECT TOP 50 * FROM AggregateMe

    GO

    -- So, we have our baseline data in place, now, for testing. Run twice.

    SET STATISTICS IO, TIME ON

    PRINT 'Create Table'

    CREATE TABLE #PreBuild (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

    PRINT 'Insert Into'

    INSERT INTO #PreBuild

    SELECT

    ctID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    FROM

    AggregateMe AS am

    GROUP BY

    ctID

    HAVING

    SUM( AmSumUpVal) > 0

    PRINT 'SELECT INTO'

    SELECT

    ctID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    INTO

    #SelectInto

    FROM

    AggregateMe AS am

    GROUP BY

    ctID

    HAVING

    SUM( AmSumUpVal) > 0

    SET STATISTICS TIME, IO OFF

    IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL

    DROP TABLE #PreBuild

    IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL

    DROP TABLE #SelectInto

    GO

    -- So, in general, no indexes and whatnot on the tables, the SELECT INTO runs at about 10-15ms faster.

    -- This is insignificant in general, especially in context of this build.

    -- This is only important in hyper-optimization.

    -- NEXT!

    -- Index discussions.

    -- Note the ctNCID is at the exact same selectivity level as ctID on purpose.

    -- This is a functionality optimization test, not a real-world scenario. The idea is to prove

    -- the underlying theory, without having to include a heuristics discussion.

    -- First, building off the clustered index in AggregateMe

    SET NOCOUNT ON

    PRINT 'CREATE TABLE - CLUSTERED INDEX BEFORE INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE TABLE #PreBuild (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

    SET STATISTICS TIME OFF

    PRINT 'CREATE INDEX - CLUSTERED INDEX BEFORE INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild (ctID)

    SET STATISTICS TIME OFF

    PRINT 'INSERT INTO - CLUSTERED INDEX BEFORE INSERT, Using ctID'

    SET STATISTICS TIME ON

    INSERT INTO #PreBuild

    SELECT

    ctID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    FROM

    AggregateMe AS am

    GROUP BY

    ctID

    HAVING

    SUM( AmSumUpVal) > 0

    SET STATISTICS TIME OFF

    PRINT 'CREATE TABLE - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE TABLE #PreBuild2 (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

    SET STATISTICS TIME OFF

    PRINT 'INSERT INTO - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    INSERT INTO #PreBuild2

    SELECT

    ctID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    FROM

    AggregateMe AS am

    GROUP BY

    ctID

    HAVING

    SUM( AmSumUpVal) > 0

    SET STATISTICS TIME OFF

    PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild2 (ctID)

    SET STATISTICS TIME OFF

    PRINT 'SELECT INTO'

    SET STATISTICS TIME ON

    SELECT

    ctID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    INTO

    #SelectInto

    FROM

    AggregateMe AS am

    GROUP BY

    ctID

    HAVING

    SUM( AmSumUpVal) > 0

    SET STATISTICS TIME OFF

    PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE CLUSTERED INDEX cidx_SelectInto ON #SelectInto (ctID)

    SET STATISTICS TIME OFF

    IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL

    DROP TABLE #PreBuild

    IF OBJECT_ID( 'tempdb..#PreBuild2') IS NOT NULL

    DROP TABLE #PreBuild2

    IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL

    DROP TABLE #SelectInto

    GO

    -- The reason I wrapped the ON/OFF this way is to emphasize the CREATE INDEX having

    -- three unique time components involved that must be summed to have the actual full

    -- time it took to build the index. In this case ~150 milliseconds, much higher than

    -- the extra ~20 milliseconds that it takes to insert into the table if it's pre-built.

    -- Since SELECT INTO can't have the clustered pre-built, it takes less time then the

    -- prebuilt table with post-built clustered, but is still more time consuming then the

    -- full prebuilt.

    -- With this test, we've proven that building the clustered index on the temp table

    -- before inserting data that is grouped on the clustered index from the source table

    -- is faster. Let's test the ctNCID, which we'll index equivalently.

    --This code is the exact same as above, just using the ctNCID column instead.

    SET NOCOUNT ON

    PRINT 'CREATE TABLE - CLUSTERED INDEX BEFORE INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE TABLE #PreBuild (ctNCID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

    SET STATISTICS TIME OFF

    PRINT 'CREATE INDEX - CLUSTERED INDEX BEFORE INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild (ctNCID)

    SET STATISTICS TIME OFF

    PRINT 'INSERT INTO - CLUSTERED INDEX BEFORE INSERT, Using ctID'

    SET STATISTICS TIME ON

    INSERT INTO #PreBuild

    SELECT

    ctNCID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    FROM

    AggregateMe AS am

    GROUP BY

    ctNCID

    HAVING

    SUM( AmSumUpVal) > 0

    SET STATISTICS TIME OFF

    PRINT 'CREATE TABLE - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE TABLE #PreBuild2 (ctNCID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

    SET STATISTICS TIME OFF

    PRINT 'INSERT INTO - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    INSERT INTO #PreBuild2

    SELECT

    ctNCID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    FROM

    AggregateMe AS am

    GROUP BY

    ctNCID

    HAVING

    SUM( AmSumUpVal) > 0

    SET STATISTICS TIME OFF

    PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild2 (ctNCID)

    SET STATISTICS TIME OFF

    PRINT 'SELECT INTO'

    SET STATISTICS TIME ON

    SELECT

    ctNCID,

    SUM( amSumUpVal) AS SumVal,

    AVG( amAvgUpVal) AS AvgVal

    INTO

    #SelectInto

    FROM

    AggregateMe AS am

    GROUP BY

    ctNCID

    HAVING

    SUM( AmSumUpVal) > 0

    SET STATISTICS TIME OFF

    PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

    SET STATISTICS TIME ON

    CREATE CLUSTERED INDEX cidx_SelectInto ON #SelectInto (ctNCID)

    SET STATISTICS TIME OFF

    IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL

    DROP TABLE #PreBuild

    IF OBJECT_ID( 'tempdb..#PreBuild2') IS NOT NULL

    DROP TABLE #PreBuild2

    IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL

    DROP TABLE #SelectInto

    GO

    -- Skipping the gory details, the fully pre-built STILL wins.

    -- Even though it takes on average about 40ms longer to insert all the data,

    -- the subsequent clustered index build takes about 180ms.

    -- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Gee Craig, I don't know what to say, except wow! I got pulled into an SSIS mess yesterday afternoon so did not have time to do any testing. Even if I did I am not sure I would have produced anything that quick, nicely done. Using your script my findings were similar on a 2008R2 Standard Edition instance.

    I tried adding UNIQUE to all clustered indexes as well as ORDER BY to the INSERT...SELECT to see if it added any time but changes to results were insignificant. I may try this with a wider clustering key but I expect I will adopt the default approach in your conclusion.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Craig Farrell (6/23/2011)


    This conversation resparked a curiousity on my part.

    When I run that code as a single unit, it keeps returning the following error...

    There is already an object named '#PreBuild' in the database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ever thought of the Query Analyzer?

    the first option is best.

  • Eric M Russell (6/23/2011)


    sqldba_icon (6/21/2011)


    Can any one suggest based on there experience which one of these are preferred.

    i) select * into #temp from tableA

    ii) create table #temp (col 1)

    insert into #temp

    select col 1 from tableA

    The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks

    Why do you need to select so many rows into a temporary table?

    Eric not sure what do u mean by that question but that is not under my control. There could be a way to reduce the number of records but there are bigger problems like the one mentioned in this topic. thanks

  • SQL 2008 come with query execution plan and estimation. try testing this queries and see the estimated results.

Viewing 15 posts - 16 through 30 (of 59 total)

You must be logged in to reply to this topic. Login to reply