Is it "better" to specify the definition of a temporary table, or just define it on the fly?

  • The subject line says it all, but just to make it clearer...

    I could use a CREATE TABLE #TempTable to define the definition of a table, but I'm wondering if it even matters? Most code I've seen, using temporary tables, do something like this:

    INSERT INTO #TempTable

    SELECT CustomerID, LastName, FirstName FROM

    SomeTable

    So, does it really matter?

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Hi,

    For known column I use create table and then insert

    For creating a table from another table schema I use insert into.

    Have used both, but honestly i prefer 'insert into' because then you dont have to know all columns before insertion.

    Here, I guess it doesnt matter which one you use.

    🙂

  • There's a small confusion.

    INSERT INTO, won't create a table, SELECT INTO will. Example:

    --This will create a table

    SELECT column1, column2, column3

    INTO #TempTable

    FROM MyTable

    --This won't

    INSERT INTO #TempTable

    SELECT column1, column2, column3

    FROM MyTable

    Now, to answer the question. IMHO, It depends. If you don't care on the structure of your table and need something easy, you can use SELECT INTO. However, creating the table before inserting data will give you more flexibility and control on it's structure.

    What happens if you need more columns than the ones that you'll have available in the select? What happens if you want an exact copy of the columns used in a query?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (10/30/2012)


    What happens if you need more columns than the ones that you'll have available in the select? What happens if you want an exact copy of the columns used in a query?

    That's an easy thing to do and you have much more control over what the columns may be like than you might imagine.

    For example, here's the final SELECT from a CTE that I use to begin the process of converting an Adjacency List to Nested Sets (article comes out Nov 13th, 2012).

    SELECT EmployeeID = ISNULL(sorted.EmployeeID,0),

    sorted.ManagerID,

    HLevel = ISNULL(sorted.HLevel,0),

    LeftBower = ISNULL(CAST(0 AS INT),0), --Place holder

    RightBower = ISNULL(CAST(0 AS INT),0), --Place holder

    NodeNumber = ISNULL(ROW_NUMBER() OVER (ORDER BY sorted.SortPath),0),

    NodeCount = ISNULL(CAST(0 AS INT),0), --Place holder

    SortPath = ISNULL(sorted.SortPath,sorted.SortPath)

    INTO dbo.Hierarchy

    FROM cteBuildPath AS sorted

    First, all of the ISNULL's make for NOT NULL columns in the final table. That's especially important for the EmployeeID column because it will become the PK for this table using a separate piece of code.

    The other thing to notice is that it also creates 3 "place holder" columns in the table for which I no data for at this point in the process. It even controls what the datatype will be for those columns using CAST.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I knew it was doable, but I believe is better to have an explicit CREATE TABLE statement for this situations.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I did a bit of testing on this a while ago, and could probably recreate the scenarios, but here's where I ended up.

    SELECT INTO #tmp

    and

    CREATE TABLE #Tmp INSERT INTO #tmp SELECT

    were functionally equivalent timewise at a million row test with varying degrees of spacer columns ( (REPLICATE('@',200) for example). The concern comes in after that, where any temp that size almost requires an index.

    CREATE TABLE #Tmp CREATE INDEX idx1 INSERT INTO #tmp SELECT

    runs about 1.5x - 2x faster (depending on the index) as

    SELECT INTO #tmp CREATE INDEX idx1

    This is voodoo on a large scale as I can't give you exact reasons why, but I get it generally. The heap being indexed after is more expensive than the pre-sort into the index.

    You can make either version do whatever you want, but in my personal preference and practice is for permanent procs I always declare the #TMP directly, then insert into it. If I'm doing one-off adhocs trying to research up some data, I'll swap between the two depending on how many columns are involved until I nail down my requirements.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Luis Cazares (10/30/2012)


    I knew it was doable, but I believe is better to have an explicit CREATE TABLE statement for this situations.

    Why?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/30/2012)


    Luis Cazares (10/30/2012)


    I knew it was doable, but I believe is better to have an explicit CREATE TABLE statement for this situations.

    Why?

    Why? Because I get paid by line of code. Ok, that's not true.

    The reason is much similar to comments, it makes code more readable.

    I won't use Craig post but that would have been a good reason if I knew it before.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I have to say that it truly depends. For example, I typically only put into a Temp Table exactly what I need. Since I usually end up using the whole table, the best an index would do would be a SCAN instead of a SEEK. With that thought in mind, I frequently don't put any kind of an index on TempTables... not even a PK or CI. In fact, adding an index will sometimes cause queries against the Temp Table to run slower... a lot slower. I just went through such a thing with the code for an article coming out soon. With the Clustered Index that seemed totally logical to have, the code ran in just over 2 minutes. Without the index, the code ran in 54 seconds... nearly twice as fast.

    To wit, on a million row table on my humble i5/6GB laptop, the SELECT/INTO edges out the CREATE TABLE by a couple of hundred milliseconds when I don't create an index. That's not the important part to me because it's such a very small difference. What's important to me is that I don't have to write the code to create the table to begin with and that code can sometimes be fairly extensive. If the table is a #Temp Table, then I'm not normally doing any datatyping or making columns NOT NULL with ISNULL. It's all very fast to type because I only need to add two words to a query to get it to build the table... INTO tablename.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My apologies. I forgot to post the code for my performance claim.

    --http://www.sqlservercentral.com/Forums/Topic1378984-149-1.aspx?Update=1

    DECLARE @StartTime DATETIME;

    RAISERROR('===============================================================',0,1) WITH NOWAIT;

    RAISERROR('========== SELECT/INTO',0,1) WITH NOWAIT;

    RAISERROR('===============================================================',0,1) WITH NOWAIT;

    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS ;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS ;

    SELECT @StartTime = GETDATE();

    SELECT TOP (1000000)

    SomeID = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10',

    SomeMoney = ABS(CHECKSUM(NEWID()))%10000 /100.0,

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex09 = RIGHT(NEWID(),09)

    INTO #JBMTest

    FROM sys.all_columns t1

    CROSS JOIN sys.all_columns t2

    ;

    --===== Add the PK

    -- Takes about 1 second to execute.

    -- ALTER TABLE #JBMTest

    -- ADD PRIMARY KEY CLUSTERED (SomeID)

    --;

    --===== Duration and HouseKeeping

    PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    DROP TABLE #JBMTest

    ;

    GO 5

    DECLARE @StartTime DATETIME;

    RAISERROR('===============================================================',0,1) WITH NOWAIT;

    RAISERROR('========== Precreate Table with Index',0,1) WITH NOWAIT;

    RAISERROR('===============================================================',0,1) WITH NOWAIT;

    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS ;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS ;

    SELECT @StartTime = GETDATE();

    CREATE TABLE [dbo].[#JBMTest](

    [SomeID] [int] IDENTITY(1,1) NOT NULL,

    [SomeInt] [int] NULL,

    [SomeLetters2] [char](2) NULL,

    [SomeCSV] [varchar](80) NULL,

    [SomeMoney] [money] NULL,

    [SomeDate] [datetime] NULL,

    [SomeHex09] [char](9) NULL,

    --PRIMARY KEY CLUSTERED

    --(

    --[SomeID] ASC

    --)

    )

    ;

    INSERT INTO #JBMTest

    SELECT TOP (1000000)

    --SomeID = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10',

    SomeMoney = ABS(CHECKSUM(NEWID()))%10000 /100.0,

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex09 = RIGHT(NEWID(),09)

    FROM sys.all_columns t1

    CROSS JOIN sys.all_columns t2

    ;

    --===== Duration and HouseKeeping

    PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    DROP TABLE #JBMTest

    ;

    GO 5

    RAISERROR('===============================================================',0,1) WITH NOWAIT;

    Here are the run results in my i5 laptop. 5 runs each. The test code clears cache at the system level so make sure you don't run it on a production box.

    Beginning execution loop

    ===============================================================

    ========== SELECT/INTO

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:03:460

    ===============================================================

    ========== SELECT/INTO

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:03:480

    ===============================================================

    ========== SELECT/INTO

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:03:647

    ===============================================================

    ========== SELECT/INTO

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:03:467

    ===============================================================

    ========== SELECT/INTO

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:03:497

    Batch execution completed 5 times.

    Beginning execution loop

    ===============================================================

    ========== Precreate Table with Index

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:04:070

    ===============================================================

    ========== Precreate Table with Index

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:04:177

    ===============================================================

    ========== Precreate Table with Index

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:04:113

    ===============================================================

    ========== Precreate Table with Index

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:04:123

    ===============================================================

    ========== Precreate Table with Index

    ===============================================================

    (1000000 row(s) affected)

    Duration = 00:00:03:997

    Batch execution completed 5 times.

    ===============================================================

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WOW, lots of great feedback here, thank you everyone!

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Jeff Moden,

    Please clarify this subject for me:

    I'm almost certain that I read somewhere that a "SELECT .... INTO #Table" would automatically create the target table, then lock the TempDB database for the duration of the data insertions.

    Is this true or is it incorrect information?

    Thanks,

  • Completely false.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lee Crain (11/6/2012)


    Jeff Moden,

    Please clarify this subject for me:

    I'm almost certain that I read somewhere that a "SELECT .... INTO #Table" would automatically create the target table, then lock the TempDB database for the duration of the data insertions.

    Is this true or is it incorrect information?

    Thanks,

    Used to be partially true in SQL 7.5 and before. Schema locks on tempdb were needed in order to create temp tables back then. In any version of SQL Server from SQL 2000 or later, that is no longer true. But even in the older versions, the locks only lasted as long as it took to create the table, not for the duration of the inserts, if I remember correctly.

    On the main subject, I use Select Into when the structure of the temp table is subject to change without notice, or when I'm trying to debug truncation issues and so on. I generally use Create...Insert...Select for temp tables where I want to control the structure of the temp table. In either case, I usually add a conditional Drop Table before the create step.

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T;

    CREATE TABLE #T

    (ID INT IDENTITY

    PRIMARY KEY,

    MyColumn1 INT NOT NULL);

    INSERT INTO #T

    (MyColumn1)

    VALUES (1),

    (2),

    (5);

    Like that.

    Makes it so the script can be run repeatedly for debugging/testing/refactoring.

    Breaking it up that way makes it more "self-documenting", it my opinion. I find it much easier to overlook an "INTO" in a Select statement, than to overlook a "CREATE TABLE" with some whitespace around it.

    Plus habits from SQL 2000 and before, where mixing DDL and DML could cause performance issues - not as important any more, but it's a habit and I haven't bothered to change it.

    So I tend towards explicitly creating temp tables.

    One very useful application of Select Into, though, is when you get a truncation error from an ETL process that was working before. Use Select Into, pull the structure of the resulting table, compare it to your target table, and the column with a longer max_length will usually leap right out. If you name the columns the same as the target table, you can even do an Outer Join on the two sets of column definitions and find where the temp table has a bigger scope. Speeds up debugging that kind of thing marvelously when you have a dozen or more varchar columns, and all you get from the SQL engine is "string or binary data would be truncated", with no indication of what column/row you need to look at.

    There are lots of other little shortcuts like that where Select Into can save a lot of time and work.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks, Gail, and thanks, G-Squared, especially for the details regarding debugging SSIS problems.

    SSIS is not one of my strong points. I'm going to tuck that advice away for future reference.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply