Temp Tables in SQL Server

  • Jeff,

    Excellent comments and references. Thanks for that!!

  • I believed that, as in Sybase, SELECT INTO avoids logging: the rows created by SELECT INTO are not written to the tempdb log. If the same holds in SQL Server, that is the true advantage of SELECT INTO, not anything else.

  • Good point... but you have to know what the Recovery Model is set to.  If it's set to FULL, even SELECT INTO is logged.  Again, don't take my word for it... here's a snippet from Books OnLine on the subject of "Recovery Models, Full Recovery"...

    Full Recovery

    The Full Recovery model uses database backups and transaction log backups to provide complete protection against media failure. If one or more data files is damaged, media recovery can restore all committed transactions. In-process transactions are rolled back.
    Full Recovery provides the ability to recover the database to the point of failure or to a specific point in time. To guarantee this degree of recoverability, all operations, including bulk operations such as SELECT INTO, CREATE INDEX, and bulk loading data, are fully logged.

    In SQL Server 2000, you can temporarily change the recovery model to "BULK LOGGED".  In both SQL Server 2000 and 7.x, you can temporarily change the SELECT INTO/BULKCOPY database option using sp_DBOption... in fact, in 6.5 and 7.x, you must enable this setting for SELECT INTO to even work... again, don't take my word for it... here's a snippet from Microsoft and the URL where it came from...

    In releases before SQL Server 2000, creating a permanent table with SELECT INTO is allowed only if select into/bulkcopy is set. In SQL Server 2000 and later, the select into/bulkcopy database option no longer affects whether you can create a permanent table by using SELECT INTO.

    http://msdn2.microsoft.com/en-us/library/ms188029.aspx
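
    If you want to check for yourself, here's a quick sketch (the database name "MyDatabase" is just a placeholder... substitute your own) of how to look at the recovery model and flip the bulk-logging settings mentioned above...

    --===== Check the current recovery model (SQL Server 2000 and later)
     SELECT DATABASEPROPERTYEX('MyDatabase','Recovery') AS RecoveryModel

    --===== SQL Server 2000: temporarily switch to minimally logged bulk operations
      ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED
         -- ...run the SELECT INTO / bulk load here...
      ALTER DATABASE MyDatabase SET RECOVERY FULL

    --===== SQL Server 6.5 and 7.x: enable the SELECT INTO/BULKCOPY option instead
       EXEC sp_dboption 'MyDatabase', 'select into/bulkcopy', 'true'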

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One reason temporary tables can be quicker than table variables is that SQL Server may use parallelism in the query plan generated for a temp table, so if you have a large temp table, your query may perform noticeably better.

  • Does anyone have a Microsoft reference or test code to support that one?  I'm kinda in the "Myth Buster" mode at work about Temp Tables and Table Variables... anything you could post would sure be appreciated.

    --Jeff Moden



  • I can't believe it when I read "SELECT INTO is BAD".

    SELECT INTO is one of the BULK operations.

    With large datasets there is no alternative to using one BULK operation or another.

    (Don't forget that DTS is also a BULK operation... I mean Fast Load =)

  • I think creating "temp" tables in user databases is the wrong approach, especially if the database is in the FULL recovery model, because transactional coherency is lost (you must perform a full backup afterwards).

    Tempdb was designed to hold temporary tables, i.e. throwaway tables that can be lost without any risk to applications. So every temp table should be created in tempdb, where the recovery model is SIMPLE, so there is no logging and strong performance, better than DDL + INSERT/SELECT in any case.
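
    If you want to verify that for yourself, a one-liner against the standard system procedure will do it (nothing specific to any particular setup is assumed here):

    --===== Confirm that tempdb really does use the SIMPLE recovery model
       EXEC sp_helpdb 'tempdb' --< the status column includes Recovery=SIMPLE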

  • To avoid misleading the readers of this article, I think it can be clearly stated that, in any case, SELECT INTO is the best way to create a temp table quickly.

  • It's also important to note that temp tables are a sign of ineffective design and workarounds (usually performance related). In an ideally designed database (sitting on an ideally designed DB server) there would be absolutely no need for temporary tables.

    Don't get me wrong; I use them when I have to, but I don't like them. Kind of like cursors.

    Signature is NULL

  • The operations you are speaking of are for permanent tables.  Select into a temp table should cause no logging to occur.  That is what we are speaking about in this thread, right?

  • temp table >>> cursor

    Cursors are the method of last resort.

  • I'm thinking that's likely true because, as someone else pointed out, the TEMPDB database is set to the "SIMPLE" recovery mode.  However, most other operations on temp tables are logged... they're quickly truncated from the log but they are logged... again, don't take my word for it... found this snippet on http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx


    Space required for tempdb logging

    Most operations in tempdb are not logged. Operations on temporary tables, table variables, table-valued functions, and user-defined tables are logged. These are called user objects. Sort operations in tempdb also require logging for activities related to page allocation. The log size requirement depends on two factors: how long it is necessary to keep the log and how many log records are generated during this time.

    Estimating the log space for tempdb is similar to estimating the log file size for other databases. However, when you estimate the tempdb log size, you pay attention only to operations that generate log records. For example, the longest running transaction in tempdb is an index build operation that uses sort. This transaction ran for five hours. An estimated 1,000 temporary table records were fully overwritten per minute by updates. The average record size is 1 KB. The calculation for the log space required is 5*60*1000*1KB = 300MB.  

    Since log truncation can become a contention point in a heavily loaded system with a small log file, make the tempdb log file large enough to avoid truncating logs frequently. For example, if 50 MB of the log file can be generated per second in the system, the log size should be at least 500 MB so that a log truncation is performed about every 10 seconds.
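
    If you want to keep an eye on it on your own server, a couple of stock commands will show you how much of the tempdb log is actually being used (nothing fancy here, just the standard DBCC command and system proc)...

    --===== See how full each database log is, including tempdb (percent used)
       DBCC SQLPERF(LOGSPACE)

    --===== Or, from inside tempdb, list its data and log files with their sizes
        USE TempDB
       EXEC sp_helpfile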

    --Jeff Moden



  • I have to admit... I've never seen an ideally designed database  (usually, too many cooks in the design with too little knowledge, you know how it goes) so I can't argue that point either pro or con.

    What I have found, though, is that many folks will join 10 or 12 tables trying to get all of the data for a large update all at once... dividing the problem into smaller, more manageable set-based pieces using temp tables has allowed me to convert other folks' 8-10 hour runs to 8-10 minute runs.  Is it a work-around for poor design?  Dunno for sure (probably)... all I know is that storing only the data I need to work with in a temp table, doing 1 or more updates on the (possibly denormalized) data, and then firing one quick update with only one join between the temp table and the target table has shaved many hours from many sprocs and, because the update runs so fast, has actually contributed to substantial decreases in the number of deadlocks we were experiencing.
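
    In skeleton form, the pattern looks something like this (all of the table and column names below are made up just to show the shape of it... it's a sketch, not a recipe)...

    --===== 1. Stage only the rows and columns needed for the update
     SELECT o.OrderID, c.Region, d.Amount
       INTO #Work
       FROM dbo.Orders o
      INNER JOIN dbo.OrderDetail d ON d.OrderID    = o.OrderID
      INNER JOIN dbo.Customer    c ON c.CustomerID = o.CustomerID
      WHERE o.OrderDate >= '20070101' --< just the slice being worked on

    --===== 2. Do any intermediate massaging against the small temp table
     UPDATE #Work
        SET Amount = Amount * 1.05 --< whatever calculations the job calls for

    --===== 3. Fire one quick update with a single join to the target table
     UPDATE tgt
        SET tgt.Region = w.Region,
            tgt.Amount = w.Amount
       FROM dbo.OrderSummary tgt
      INNER JOIN #Work w ON w.OrderID = tgt.OrderID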

    --Jeff Moden



  • Just following up... here's a cool problem that I just ran across that I was able to answer because I'd done something like it before...

    Here's the problem...

    You have a payment table that accumulates payment records for a week at a time... the agreement you have with the bank clearing house is that you can send as many files as you want with as many records as you want, but you can only send the files once a week and no file shall have any more than 1,000,000 dollars worth of payments in it.  Because business has been good, this week's payment table has 200,000 records in it ranging from 1 dollar to 100 dollars.  Determine which rows you will send as the first file by PK (an INT IDENTITY column for this example).  Additionally, you must provide the PK, the payment amount, and a running total in sorted order because the bank wants it that way in order for your company to be able to actually get paid.

    There are only two restrictions... you cannot use an external app because there isn't one and the company doesn't want to spend the resources to build and test one (YOU will get fired if you make one, even on your own time, because your boss is a jerk).  The other is that you cannot modify the payment table, because the 3rd party vendor that wrote it says it will break their app if you add a column to it; they used some type of schema binding in their compiled code to prevent tampering with the tables they made.

    I'll even provide the test data...

    --============================================================================
    -- Create a test table
    --============================================================================
    --===== If the test table exists, drop it so we can rerun if we need to
         IF OBJECT_ID('dbo.jbmRunTotTest') IS NOT NULL
            DROP TABLE dbo.jbmRunTotTest

    --===== Suppress auto-display of rowcounts for appearance and speed
        SET NOCOUNT ON

    --===== Create and populate the jbmRunTotTest table on the fly
     SELECT TOP 200000
            IDENTITY(INT,1,5) AS PK,
            CAST(RAND(CAST(NEWID() AS VARBINARY))*99+1 AS DECIMAL(7,2)) AS ColA
       INTO dbo.jbmRunTotTest
       FROM Master.dbo.SysColumns sc1,
            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance
      ALTER TABLE dbo.jbmRunTotTest
        ADD CONSTRAINT PK_jbmRunTotTest_PK PRIMARY KEY CLUSTERED (PK)

    --===== Allow the general public to use it
      GRANT ALL ON dbo.jbmRunTotTest TO PUBLIC

    So, how to solve this one?  A self-joined correlated sub-query would take about a month (it took 29 seconds on only 9,999 records and gets exponentially worse).  Hmmmm.... maybe we'll have to resort to a RBAR WHILE loop or even a Cursor... nah...
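
    Just so you can see what I mean, the correlated sub-query version looks something like this... don't run it against all 200,000 rows unless you have a month to spare...

    --===== The triangular join version (for comparison only)
     SELECT t1.PK,
            t1.ColA,
            (SELECT SUM(t2.ColA)
               FROM dbo.jbmRunTotTest t2
              WHERE t2.PK <= t1.PK) AS RunTotal
       FROM dbo.jbmRunTotTest t1
      ORDER BY t1.PK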

    Again, we just want to return one batch of ID's that contain no more than 1,000,000 dollars worth of payments (we'll worry about the multiple batches some other time)... here's a nasty fast way to do it with a temp table and a little help from SQL Server's proprietary UPDATE statement...

    --============================================================================
    -- Return all PK's (other cols returned for bank proof) that have a running
    -- total less than some predetermined number (1 batch)
    --============================================================================
    --===== If the working temp table already exists, drop it for reruns
         IF OBJECT_ID('TempDB..#Temp') IS NOT NULL
            DROP TABLE #Temp

    --===== Clear any cached data so we can get a true measure of how fast this is
       DBCC DROPCLEANBUFFERS
       DBCC FREEPROCCACHE

    --===== Declare local variables
    DECLARE @StartTime DATETIME --< Just to measure run duration
        SET @StartTime = GETDATE()
    DECLARE @MaxVal INT
        SET @MaxVal = 1000000 --< This is the predetermined max running total
    DECLARE @RunTot DECIMAL(38,2) --< This lets us calc the running total
        SET @RunTot = 0

    --===== Create and populate the temporary working table on the fly
     SELECT IDENTITY(INT,1,1) AS RowNum,
            CAST(PK AS INT) AS PK, --< The Primary Key col from your table
            ColA, --< The column that contains the value from your table
            CAST(0 AS DECIMAL(38,2)) AS RunTotal --< To be calculated
       INTO #Temp --< The new temporary working table
       FROM dbo.jbmRunTotTest t1 --< Our test table (sub your table name here)

    --===== Add a new clustered Primary Key because every table should have one
         -- and because UPDATE doesn't like an ORDER BY
      ALTER TABLE #Temp
        ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Update the temporary working table with the running total in RowNum order
     UPDATE t1
        SET @RunTot = RunTotal = @RunTot + t1.ColA
       FROM #Temp t1

    --===== Select a batch of rows where the running total is less than the predetermined max
     SELECT *
       FROM #Temp
      WHERE RunTotal <= @MaxVal

    --===== All done, display the duration
      PRINT CONVERT(VARCHAR(12),DATEADD(ms,DATEDIFF(ms,@StartTime,GETDATE()),0),114)
          + ' Run Duration (hh:mm:ss:mil)'

    Is it bad design that I had to do a workaround?  I don't think so... it's just everyday normal stuff.  The use of the temp table allows this process to find the first batch of ID's and the other required info in only 3.36 seconds on a single 1.8 GHz CPU with 1 IDE hard drive and only 2 gig of memory.  Now I don't know about you, but I'm thinking that, because of the restrictions, a temp table was the only way to go.  And I'll go a little further and say that I think it's going to be tough for anyone to beat the performance the temp table allowed the code to run with.

    But that could be just a temporary thing

    --Jeff Moden



  • I'm reading this from a "best of last year" posting, and looking at the number of replies and their dates, this article has been around for a while. So if you're reading this, you're a very detail-oriented person.

    My reply to all of this:

    Good news: This is a nice summary with helpful thoughts.

    Bad news: What's the point of using temp tables if all of these thoughts are true? Half of the point of using them is that the system deletes them when you're done, and you don't wind up with thousands of xxxx_temp tables gumming up your system. If you have to delete them yourself, why not just create a standard table and then delete it when you're done?

    Also, half the fun of temp tables is "select into". Again, the point is that they're supposed to be quick. If you have to create tables using DDL, you might as well create a standard table somewhere with _temp on the end of the name, and just be careful to document why you're using it (and/or give it a meaningful name, no "123ABC_temp" nonsense) so the system doesn't get gummed up with useless junk. Just delete rows when you're done. If "select into" uses a lot of memory for large recordsets, DON'T USE IT WITH LARGE RECORDSETS. And while you're at it, maybe temp tables just aren't a good idea with large recordsets. If you have to deal with that amount of data on a regular basis, a properly named and documented standard table will be much more efficient.

    ___________________________________________________
    “Politicians are like diapers. They both need changing regularly and for the same reason.”
