Which one is better: SELECT * INTO #temp FROM tableA vs. CREATE TABLE #temp then INSERT INTO...?

  • Jeff Moden (6/24/2011)


    opc.three (6/24/2011)


Be careful with table variables. They always optimize with an assumed rowcount of 1, so they can produce very bad execution plans... even with small actual rowcounts.

A statement-level OPTION(RECOMPILE) will fix that a lot of times, but I still avoid Table Variables because there's still zero chance of statistics being created, and they don't persist in SSMS like Temp Tables do, which makes troubleshooting a whole lot easier.

    Oooh, I forgot about OPTION(RECOMPILE). I could not agree more on all points re: avoiding them 😎
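
For anyone following along, a minimal sketch of the statement-level hint (the join against sys.objects is just a stand-in example):

declare @t table (id int primary key);

insert into @t (id)
select object_id from sys.objects;

-- Without the hint, the optimizer assumes @t holds 1 row; OPTION (RECOMPILE)
-- lets it plan with the actual rowcount at execution time.
select o.name
from sys.objects as o
join @t as t on t.id = o.object_id
option (recompile);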

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

• The decision of temp table vs. table variable vs. CTE, etc. depends on how this intermediate resultset will be leveraged within the stored procedure.

    Where I've mostly used table variables or temp tables in the past, back in version 2000, was to hold a smallish resultset (a few thousand rows max) obtained by joining several large tables, where that resultset then needs to be referenced in multiple locations in the stored procedure. These would be complex, end-of-month reporting style procedures.

    That's why I said earlier that since 2005 I've tended to use common table expressions or indexed views instead, which provide proper statistics and a static query plan. If the resultset needs to be referenced by multiple selects in the same procedure, then I may still go with a table variable over a CTE. If the resultset is referenced by multiple stored procedures, then I'll go with an indexed view.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (6/24/2011)


The decision of temp table vs. table variable vs. CTE, etc. depends on how this intermediate resultset will be leveraged within the stored procedure.

    Where I've mostly used table variables or temp tables in the past, back in version 2000, was to hold a smallish resultset (a few thousand rows max) obtained by joining several large tables, where that resultset then needs to be referenced in multiple locations in the stored procedure. These would be complex, end-of-month reporting style procedures.

    That's why I said earlier that since 2005 I've tended to use common table expressions or indexed views instead, which provide proper statistics and a static query plan. If the resultset needs to be referenced by multiple selects in the same procedure, then I may still go with a table variable over a CTE. If the resultset is referenced by multiple stored procedures, then I'll go with an indexed view.

Just to confirm, you're familiar with the fact that a CTE is nothing more than a subquery, and has no optimization value? It's merely there to allow for more readable code unless you're working with recursive CTEs, in which case it has additional functionality.

Also, if you're getting timeouts or bad plans from the optimizer due to complexity, midpoint #tmp tables can ease the optimizer load and get you much better plans, especially if they allow for better row reduction by using two plans. YMMV per query, of course, but they can be used quite effectively even in single-use scenarios. Depending on your transactional logic, however, you're right that an indexed view may be leveraged better overall. It'll depend on how much concurrency blocking you're willing to have during insert/update/delete operations.
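
To illustrate the midpoint idea, a sketch with hypothetical table names (dbo.Orders and dbo.Customer are stand-ins):

-- One monster statement becomes two simpler ones: materialize the reduced
-- midpoint set first, then join it. Each statement gets its own plan, and
-- #midpoint gets real statistics.
select o.CustomerID, sum(o.Amount) as Total
into #midpoint
from dbo.Orders as o
where o.OrderDate >= '20110101'
group by o.CustomerID;

select c.CustomerName, m.Total
from #midpoint as m
join dbo.Customer as c on c.CustomerID = m.CustomerID;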

    @jeff: That's good, because today's been one heck of a day. :hehe:


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

• I ran the test script after adding the TABLOCK hint to the target # table on INSERT, and the end result was similar: for heaps, SELECT...INTO is still faster than pre-building the table; and when a CI is in place, a pre-built table & CI followed by INSERT...SELECT is still the fastest of the three methods attempted. I don't expect much to change, but for completeness I still plan to construct a test case where wider clustering keys are used.
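
    For reference, the hint was applied along these lines (a sketch; the table and column names are placeholders):

    -- TABLOCK on the target table is one of the documented prerequisites for a
    -- minimally logged INSERT...SELECT on 2008, though it matters less for a
    -- #temp table that only one session can see.
    insert into #target with (tablock) (col1, col2)
    select col1, col2
    from dbo.SourceTable;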

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Craig Farrell (6/24/2011)


    ...

Just to confirm, you're familiar with the fact that a CTE is nothing more than a subquery, and has no optimization value? It's merely there to allow for more readable code unless you're working with recursive CTEs, in which case it has additional functionality.

    ...

Temp tables, CTEs, and indexed views are different approaches to solving basically the same problem: joining a persisted or interim resultset. In the case we're discussing, the resultset is about 100,000 rows, so perhaps a CTE would not be a good fit, but if given a shot it might surprise us. The thing about a CTE is that it's going against the base tables, so it requires no intermediate indexing. It also depends on the schema and what the developer is doing with these 100,000 rows, which we don't know.

In some cases referencing a CTE will provide better stored procedure performance overall than if we had chosen to create, load, and join a temp table. A CTE would be more useful for something like a procedure that accepts customer_id as a parameter and then must return a resultset containing the customer's home, cell, and work phone # in separate columns, where these attributes are contained in a customer_phone table keyed on customer_id + phone_type. If this procedure is called 10,000 times a day, and it's important to persist the query plans, then a CTE may well prove the better choice over a temp table. However, in that example I provided, creating an indexed view that is keyed on customer_id and denormalizes the home, cell, work, and emergency contact numbers into separate columns would perhaps be the optimal choice.

The original question was how to create the optimal temp table, but I'm just saying that there are perhaps even better options than a temp table.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (6/25/2011)


    Craig Farrell (6/24/2011)


    ...

Just to confirm, you're familiar with the fact that a CTE is nothing more than a subquery, and has no optimization value? It's merely there to allow for more readable code unless you're working with recursive CTEs, in which case it has additional functionality.

    ...

    Temp tables, CTEs, and indexed views are different approaches to solving basically the same problem: joining a persisted or interim resultset.

    No, a CTE is not a persisted or interim result set. There is no difference between the following two statements:

;WITH cte AS (SELECT id, abc FROM tbl WHERE x = 'y')
    SELECT a.col2, cte.abc FROM tbl2 AS a JOIN cte ON a.id = cte.id

    AND

    SELECT a.col2, cte.abc FROM tbl2 AS a JOIN (SELECT id, abc FROM tbl WHERE x = 'y') AS cte ON a.id = cte.id

    Both are optimized directly into the query plan as subqueries.

In the case we're discussing, the resultset is about 100,000 rows, so perhaps a CTE would not be a good fit, but if given a shot it might surprise us.

    As much as a subquery will, I'll agree with that.

    The thing about a CTE is that it's going against the base tables, so it requires no intermediate indexing. It also depends on the schema and what the developer is doing with these 100,000 rows, which we don't know.

So does my #tmp table build, but the secondary joins are another story, and those would need the indexing.

    In some cases referencing a CTE will provide better stored procedure performance overall than if we had chosen to create, load, and join a temp table. A CTE would be more useful for something like a procedure that accepts customer_id as a parameter and then must return a resultset containing the customer's home, cell, and work phone # in separate columns, where these attributes are contained in a customer_phone table keyed on customer_id + phone_type.

    Why?

    If this procedure is called 10,000 times a day, and it's important to persist the query plans, then a CTE may well prove the better choice over a temp table.

True, though there are workarounds to this, and I can't argue with it; I don't want to constantly recompile a highly tuned query either.

However, in that example I provided, creating an indexed view that is keyed on customer_id and denormalizes the home, cell, work, and emergency contact numbers into separate columns would perhaps be the optimal choice.

That would depend heavily on how much churn there was on your base table(s), which would require upkeep of the view as well. However, I've used them when my selects were more important than my transactional speed.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

• The optimization issue is what keeps me away from table variables. I had heard about the recompile option, but it did not sway my view on things given the expense and the other benefits offered by temp tables. Speaking of views, indexed views are useful in some situations, but they have baggage too. The schemabinding requirement brings the pain sometimes, and the lack of first-class citizenship in the Standard Edition optimizer keeps me away, because unfortunately I don't have the luxury of being on Enterprise Edition everywhere I go.
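
    For anyone who hasn't run into that baggage, a minimal sketch (reusing the customer_phone table from later in the thread; on Standard Edition the view's index is only used when it is referenced WITH (NOEXPAND)):

    -- SCHEMABINDING is mandatory, forces two-part names, and blocks changes to
    -- the underlying table while the view exists.
    create view dbo.v_customer_phone
    with schemabinding
    as
    select customer_id, phone_type, phone_number
    from dbo.customer_phone;
    go

    create unique clustered index ix_v_customer_phone
    on dbo.v_customer_phone (customer_id, phone_type);
    go

    -- On Standard Edition the optimizer will not match the view automatically;
    -- the NOEXPAND hint is needed to hit the view's index.
    select customer_id, phone_number
    from dbo.v_customer_phone with (noexpand)
    where customer_id = 42;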

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

• I absolutely agree, Orlando. The ONLY times I use Table Variables are when I have to, and there are only two places where I have to...

    1. In User Defined Functions.

    2. In places where I'm collecting information for logging and I don't want that information rolled back if an error occurs. Of course, there's a TRY/CATCH involved in all that.

    Yes, I agree... there are places where a Table Variable is sometimes faster than a Temp Table. It's just not worth the hassle when troubleshooting because they don't persist even in SSMS.
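
    A skeletal version of point 2 (the logging table and messages are hypothetical; the key property is that table variables are not affected by ROLLBACK):

    declare @Log table (LogTime datetime not null, Msg nvarchar(2048) not null);

    begin try
        begin transaction;
        insert into @Log values (getdate(), N'Step 1 starting');
        -- ... real work would go here ...
        raiserror(N'Simulated failure', 16, 1);
        commit transaction;
    end try
    begin catch
        if @@trancount > 0 rollback transaction;
        insert into @Log values (getdate(), error_message());
    end catch;

    select LogTime, Msg from @Log;  -- both rows survive the rollback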

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

• I don't know if this applies to your situation, but for me, I will nearly always choose SELECT...INTO when writing ad hoc queries, because I'm selecting from tables in a third-party-vendor database. If they change the column definitions behind my back -- which isn't common, but it does happen -- then I'm covered.

    So, for example, say I write and save some code using CREATE TABLE/INSERT that queries a VARCHAR(20) column. Sometime later and without my knowledge, the vendor changes that VARCHAR(20) to a VARCHAR(40), and users start putting bigger strings into the column. By hard-coding the column definition, my code will now break with an error "String or binary data would be truncated.":

if object_id('tempdb.dbo.#vendortable','U') is not null drop table #vendortable;
    if object_id('tempdb.dbo.#t','U') is not null drop table #t;

    create table #vendortable (mytext varchar(10));

    -- populate "vendor's" table
    insert into #vendortable (mytext) values ('abcdefghi');

    -- create a temp table, as though running an ad hoc query, using the CREATE/INSERT approach
    create table #t (shorttext varchar(5));

    -- This fails: String or binary data would be truncated.
    insert into #t (shorttext) select mytext from #vendortable;

    Using SELECT INTO automatically handles this. All I really care about for these kinds of ad hoc queries is that I'm getting some text data back, and I don't really care (within reason) how big it is.
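
    Continuing the same demo, here's the SELECT...INTO version for contrast:

    -- #t2.mytext is created as varchar(10), inherited from #vendortable, so the
    -- copy keeps working even if the vendor widens the column later.
    select mytext into #t2 from #vendortable;
    drop table #t2;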

    Again, this point is just for ad hoc queries I'm running. For production, I'd be testing for performance and thinking about durability.

    HTH,

    Rich

  • rmechaber (6/26/2011)


I don't know if this applies to your situation, but for me, I will nearly always choose SELECT...INTO when writing ad hoc queries, because I'm selecting from tables in a third-party-vendor database. If they change the column definitions behind my back -- which isn't common, but it does happen -- then I'm covered.

...

I agree with you; if the schema changes, then it is better to create the table on the fly, but in my case this will be used inside a sproc. Thanks.

  • Craig Farrell (6/25/2011)


    Eric M Russell (6/25/2011)


    Craig Farrell (6/24/2011)


    ...

Just to confirm, you're familiar with the fact that a CTE is nothing more than a subquery, and has no optimization value? It's merely there to allow for more readable code unless you're working with recursive CTEs, in which case it has additional functionality.

    ...

    Temp tables, CTEs, and indexed views are different approaches to solving basically the same problem: joining a persisted or interim resultset.

    No, a CTE is not a persisted or interim result set. There is no difference between the following two statements:

;WITH cte AS (SELECT id, abc FROM tbl WHERE x = 'y')
    SELECT a.col2, cte.abc FROM tbl2 AS a JOIN cte ON a.id = cte.id

    AND

    SELECT a.col2, cte.abc FROM tbl2 AS a JOIN (SELECT id, abc FROM tbl WHERE x = 'y') AS cte ON a.id = cte.id

    Both are optimized directly into the query plan as subqueries.

You are right, the result of a CTE is not persisted; it's just a way to make multiple references to a derived table, each with its own separate execution path. However, it can be substituted for other techniques that would persist a resultset, like a temp table for example. Even though it is not persisted, it can still outperform a joined temp table, assuming the base tables are indexed properly, as in my earlier mention of a query that returns the home, cell, and work number for a "small range of customers". In the example below, I'm consistently getting lower CPU and read counts for the CTE version of the query versus the temp table version.

    With the CTE version, it's more of a straight shot with clustered index seeks and a simple nested loop. It probably wouldn't scale very well past a few dozen customers, but most CRM applications are not interested in displaying more than that many at a time on screen.

    The temp table version takes a hit when the table is inserted, and then the query wants to join using a hash table. However, the temp table technique would probably be best if all customers are to be dumped out into a more complex report.

create table customer
    ( constraint pk_customer primary key (customer_id),
      customer_id int not null);

    create table customer_phone
    ( constraint pk_customer_phone primary key (customer_id, phone_type),
      customer_id int not null,
      phone_type char(1) not null,
      phone_number varchar(30) not null );

    -- populate 100 customers with home, cell, and work numbers
    declare @customer_id int;
    select @customer_id = 1;
    while @customer_id <= 100
    begin
        insert into customer ( customer_id )
        values ( @customer_id );
        insert into customer_phone ( customer_id, phone_type, phone_number )
        values ( @customer_id, 'h', left(newid(),12) );
        insert into customer_phone ( customer_id, phone_type, phone_number )
        values ( @customer_id, 'c', left(newid(),12) );
        insert into customer_phone ( customer_id, phone_type, phone_number )
        values ( @customer_id, 'w', left(newid(),12) );
        select @customer_id = @customer_id + 1;
    end;

    -- temp table version
    select customer_id, phone_type, phone_number
    into #cp
    from customer_phone
    where customer_id between 40 and 50;

    select
        c.customer_id,
        cph.phone_number as home_phone,
        cpc.phone_number as cell_phone,
        cpw.phone_number as work_phone
    from customer c
    join #cp as cph on cph.customer_id = c.customer_id and cph.phone_type = 'h'
    join #cp as cpc on cpc.customer_id = c.customer_id and cpc.phone_type = 'c'
    join #cp as cpw on cpw.customer_id = c.customer_id and cpw.phone_type = 'w'
    where c.customer_id between 40 and 50;

    drop table #cp;

    -- CTE version
    ;with cp as (select customer_id, phone_type, phone_number from customer_phone)
    select
        c.customer_id,
        cph.phone_number as home_phone,
        cpc.phone_number as cell_phone,
        cpw.phone_number as work_phone
    from customer c
    join cp as cph on cph.customer_id = c.customer_id and cph.phone_type = 'h'
    join cp as cpc on cpc.customer_id = c.customer_id and cpc.phone_type = 'c'
    join cp as cpw on cpw.customer_id = c.customer_id and cpw.phone_type = 'w'
    where c.customer_id between 40 and 50;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

• Thanks, everyone. One thing I have noticed in the SELECT...INTO cases is occasional tempdb blockage. After doing some research on the web, it looks like SELECT...INTO locks the entire tempdb while creating the table. I will send this as feedback to the dev group to change the way temp tables are created.

  • Eric M Russell (6/27/2011)


You are right, the result of a CTE is not persisted; it's just a way to make multiple references to a derived table, each with its own separate execution path. However, it can be substituted for other techniques that would persist a resultset, like a temp table for example. Even though it is not persisted, it can still outperform a joined temp table, assuming the base tables are indexed properly, as in my earlier mention of a query that returns the home, cell, and work number for a "small range of customers".

Now that we're on the same page, I agree with you. It was your phrasing I was concerned about, and other newbies who might follow along and believe a CTE would persist for optimization. 🙂

    In the example below, I'm consistently getting lower CPU and Read counts for the CTE version of the query versus the temp table version.

    With the CTE version, it's more of a straight shot with clustered index seeks and a simple nested loop. It probably wouldn't scale very well past a few dozen customers, but most CRM applications are not interested in displaying more than that many at a time on screen.

I agree, and it sounds like someone used a temp table for logical ease rather than for optimization concerns. Moving to temp tables of any form (not CTEs) needs to be a conscious decision for optimization, not ease of logical use. We're now in complete agreement on what you're saying. 😀


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Craig Farrell (6/27/2011)

    ...

I agree, and it sounds like someone used a temp table for logical ease rather than for optimization concerns. Moving to temp tables of any form (not CTEs) needs to be a conscious decision for optimization, not ease of logical use. We're now in complete agreement on what you're saying. 😀

Prior to SQL Server 2005, many developers (including myself) over-used temporary tables, often in scenarios where it simply made the T-SQL cleaner looking. If a derived table was referenced multiple times in a query, it just seemed right to first load the resultset into a temp table. However, it often had no performance optimization benefit and even degraded performance, especially when scaling to hundreds of users. In the case of a frequently called OLTP procedure, a derived table was a better choice, and today a CTE can make the same thing look cleaner. Hopefully a future release of the SQL Server engine will even add some performance optimization features for common table expressions referenced in multiple joins.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Craig Farrell (6/23/2011)


    End result, and if you read the notes you'll see why: No indexing, SELECT INTO. If indexing, fully prebuild it. Test yourself of course, but that's now my rule of thumb.

Be careful here. While it is true that minimal logging for INSERTs in 2008 and above can make INSERT competitive with SELECT INTO, in practice you may still prefer SELECT INTO. My reasons? Glad you asked:

    1. SQL 2008 is required to achieve minimal logging. (Without minimal logging, INSERT will be slower.)

    2. Whether you actually get minimal logging at run time depends on plan shape - specifically, the Index Insert must have the DMLRequestSort attribute set to true, and other conditions (such as a table lock) might have to be met too (a table lock is unnecessary on a #temp table, of course). See http://msdn.microsoft.com/en-us/library/dd425070.aspx for the full set of circumstances and conditions.

    3. The database must be in SIMPLE or BULK_LOGGED recovery for INSERT to be minimally logged. SELECT INTO always uses bulk loading optimizations, and is minimally logged in SIMPLE and BULK_LOGGED like INSERT, but also fully logged at the PAGE level in FULL recovery - unlike INSERT which logs each row in FULL recovery.

    4. For more complex INSERT sub-trees, the estimated cardinality before the sort is crucial. If the estimate is too low, the sort will very likely spill to physical tempdb disk. When this happens, INSERT performance will suck.

    5. The separate index build has the advantage of knowing exactly how many rows it will encounter, so sorts are much less likely to spill.

    6. The Index Insert in the INSERT plan always runs on a single thread. The separate index build may use parallelism.

    So, the practical issues are that you have to (a) ensure you get minimal logging with the INSERT; and (b) ensure that sort cardinality estimation is accurate. The second of these can be extremely problematic in practical situations (complex plans tend to have less accurate cardinality estimates higher up the query tree, regardless of FULLSCAN statistics or anything else).

    Bottom line, SELECT INTO has great performance, but more importantly, it has predictable performance. For that reason alone, I usually prefer SELECT INTO + index build in production code.
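
    In code, that pattern looks something like this (a sketch; table and column names are placeholders):

    -- Step 1: bulk-load the heap. SELECT...INTO always gets the bulk-load path.
    select col1, col2
    into #work
    from dbo.BigTable
    where col2 > 0;

    -- Step 2: build the index afterwards. The build knows the exact rowcount, so
    -- its sorts are much less likely to spill, and it can run in parallel.
    create clustered index cx_work on #work (col1);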
