Temp Tables in SQL Server
Posted Wednesday, July 19, 2006 1:14 PM
temp table >>> cursor

Cursors are the method of last resort.
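
To put a quick sketch behind that (a hypothetical example with made-up dbo.Orders and dbo.OrderLines tables, not anything from this thread): the cursor touches one order at a time, while the temp table stages the aggregates once and finishes with a single joined UPDATE.

--===== Cursor approach: one correlated UPDATE per order, row by row
DECLARE @OrderID INT
DECLARE cur CURSOR FOR SELECT OrderID FROM dbo.Orders
   OPEN cur
  FETCH NEXT FROM cur INTO @OrderID
  WHILE @@FETCH_STATUS = 0
  BEGIN
         UPDATE dbo.Orders
            SET OrderTotal = (SELECT SUM(LineAmount)
                                FROM dbo.OrderLines
                               WHERE OrderID = @OrderID)
          WHERE OrderID = @OrderID
          FETCH NEXT FROM cur INTO @OrderID
  END
  CLOSE cur
DEALLOCATE cur

--===== Set-based approach: stage the aggregates in a temp table once,
     -- then fire one joined UPDATE against the target
 SELECT OrderID,
        SUM(LineAmount) AS OrderTotal
   INTO #OrderTotals
   FROM dbo.OrderLines
  GROUP BY OrderID

 UPDATE o
    SET o.OrderTotal = t.OrderTotal
   FROM dbo.Orders o
   JOIN #OrderTotals t ON t.OrderID = o.OrderID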




Post #295639
Posted Wednesday, July 19, 2006 6:29 PM



I'm thinking that's likely true because, as someone else pointed out, the TempDB database is set to the "SIMPLE" recovery mode. However, most other operations on temp tables are logged... they're quickly truncated from the log, but they are logged... again, don't take my word for it... I found this snippet at http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx:

 

Space required for tempdb logging

Most operations in tempdb are not logged. Operations on temporary tables, table variables, table-valued functions, and user-defined tables are logged. These are called user objects. Sort operations in tempdb also require logging for activities related to page allocation. The log size requirement depends on two factors: how long it is necessary to keep the log and how many log records are generated during this time.

Estimating the log space for tempdb is similar to estimating the log file size for other databases. However, when you estimate the tempdb log size, you pay attention only to operations that generate log records. For example, the longest running transaction in tempdb is an index build operation that uses sort. This transaction ran for five hours. An estimated 1,000 temporary table records were fully overwritten per minute by updates. The average record size is 1 KB. The calculation for the log space required is 5*60*1000*1KB = 300MB.  

Since log truncation can become a contention point in a heavily loaded system with a small log file, make the tempdb log file large enough to avoid truncating logs frequently. For example, if 50 MB of the log file can be generated per second in the system, the log size should be at least 500 MB so that a log truncation is performed about every 10 seconds.
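
Two quick ways to ground that (my addition, not part of the quoted article): DBCC SQLPERF(LOGSPACE) shows the current log size and percent used for every database, TempDB included, and the article's worked estimate is plain arithmetic.

--===== Current log size and percent-used for all databases, including TempDB
   DBCC SQLPERF(LOGSPACE)

--===== The article's estimate: 5 hours * 60 minutes * 1,000 overwritten rows/minute * 1 KB/row
 SELECT (5 * 60 * 1000) / 1024.0 AS EstimatedLogMB --< about 293 MB, rounded up to 300 MB above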



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

Post #295734
Posted Wednesday, July 19, 2006 7:01 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 11:20 PM
Points: 35,552, Visits: 32,148

I have to admit... I've never seen an ideally designed database (usually there are too many cooks in the design with too little knowledge, you know how it goes), so I can't argue that point either pro or con.

What I have found, though, is that many folks will join 10 or 12 tables trying to get all of the data for a large update all at once... dividing the problem into smaller, more manageable set-based pieces using temp tables has allowed me to convert other folks' 8-10 hour runs into 8-10 minute runs. Is it a work-around for poor design? Dunno for sure (probably)... all I know is that storing only the data I need to work with in a temp table, doing one or more updates on the (possibly denormalized) data, and then firing one quick update with only one join between the temp table and the target table has shaved many hours from many sprocs and, because the update runs so fast, has actually contributed to substantial decreases in the number of deadlocks we were experiencing.
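
Here's a bare-bones sketch of that pattern (table and column names like dbo.SourceA, dbo.SourceB, and dbo.BigTarget are hypothetical stand-ins, not from any real system):

--===== 1. Pull only the rows and columns the update actually needs
 SELECT a.TargetID,
        a.SomeValue,
        b.OtherValue
   INTO #Work
   FROM dbo.SourceA a
   JOIN dbo.SourceB b ON b.TargetID = a.TargetID
  WHERE a.NeedsUpdate = 1

--===== 2. Do one or more updates on the (possibly denormalized) staged data
 UPDATE #Work
    SET SomeValue = SomeValue + OtherValue

--===== 3. Fire one quick update with a single join to the target table
 UPDATE tgt
    SET tgt.SomeValue = w.SomeValue
   FROM dbo.BigTarget tgt
   JOIN #Work w ON w.TargetID = tgt.TargetID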



--Jeff Moden
Post #295743
Posted Thursday, July 20, 2006 10:58 PM



Just following up... here's a cool problem that I just ran across that I was able to answer because I'd done something like it before...

Here's the problem...

You have a payment table that accumulates payment records for a week at a time... the agreement you have with the bank clearing house is that you can send as many files as you want with as many records as you want, but you can only send the files once a week and no file shall have more than 1,000,000 dollars worth of payments in it. Because business has been good, this week's payment table has 200,000 records in it ranging from 1 dollar to 100 dollars. Determine which rows you will send as the first file by PK (an INT IDENTITY column for this example). Additionally, you must provide the PK, the payment amount, and a running total in sorted order, because the bank wants it that way in order for your company to actually get paid.

There are only two restrictions... you cannot use an external app, because there isn't one and the company doesn't want to spend the resources to build/test one (YOU will get fired if you do make one, even on your own time, because your boss is a jerk). The other is that you cannot mod the payment table, because the 3rd party vendor that wrote it says it will break their app if you add a column to it; they used some type of schema binding in their compiled code to prevent tampering with the tables they made.

I'll even provide the test data...

--============================================================================
-- Create a test table
--============================================================================
--===== If the test table exists, drop it so we can rerun if we need to
     IF OBJECT_ID('dbo.jbmRunTotTest') IS NOT NULL
        DROP TABLE dbo.jbmRunTotTest

--===== Suppress auto-display of rowcounts for appearance and speed
    SET NOCOUNT ON

--===== Create and populate the jbmRunTotTest table on the fly
 SELECT TOP 200000 
        IDENTITY(INT,1,5) AS PK,
        CAST(RAND(CAST(NEWID() AS VARBINARY))*99+1 AS DECIMAL(7,2)) AS ColA
   INTO dbo.jbmRunTotTest
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.jbmRunTotTest
    ADD CONSTRAINT PK_jbmRunTotTest_PK PRIMARY KEY CLUSTERED (PK)

--===== Allow the general public to use it
  GRANT ALL ON dbo.jbmRunTotTest TO PUBLIC

So, how to solve this one?  A self-joined correlated sub-query would take about a month (it took 29 seconds on only 9,999 records and gets exponentially worse).  Hmmmm.... maybe we'll have to resort to a RBAR WHILE loop or even a Cursor... nah...

Again, we just want to return one batch of IDs that contains no more than 1,000,000 dollars worth of payments (we'll worry about the multiple batches some other time)... here's a nasty fast way to do it with a temp table and a little help from SQL Server's proprietary UPDATE syntax (SET @variable = column = expression)...

--============================================================================
-- Return all PK's (other cols returned for bank proof) that have a running
-- total less than some predetermined number (1 batch)
--============================================================================

--===== If the working temp table already exists, drop it for reruns
     IF OBJECT_ID('TempDB..#Temp') IS NOT NULL
        DROP TABLE #Temp

--===== Clear any cached data so we can get a true measure of how fast this is
   DBCC DROPCLEANBUFFERS
   DBCC FREEPROCCACHE

--===== Declare local variables
DECLARE @StartTime DATETIME --< Just to measure run duration
    SET @StartTime = GETDATE()
DECLARE @MaxVal INT
    SET @MaxVal = 1000000 --< This is the predetermined max running total
DECLARE @RunTot DECIMAL(38,2)   --< This lets us calc the running total
    SET @RunTot = 0

--===== Create and populate the temporary working table on the fly
 SELECT IDENTITY(INT,1,1) AS RowNum,
        CAST(PK AS INT) AS PK, --< The Primary Key Col from your table
        ColA, --< The column that contains the value from your table
        CAST(0 AS DECIMAL(38,2)) AS RunTotal --< To be calculated
   INTO #Temp --< The new temporary working table
   FROM jbmRunTotTest t1 --< Our test table (sub your table name here)

--===== Add a clustered Primary Key so the UPDATE walks the rows in RowNum order
     -- (UPDATE doesn't allow an ORDER BY, so the clustered index drives the order)
  ALTER TABLE #Temp
    ADD PRIMARY KEY CLUSTERED (RowNum)

--===== Update the temporary working table with the running total in RowNum order
 UPDATE t1
    SET @RunTot = RunTotal = @RunTot+t1.ColA
   FROM #Temp t1

--===== Select a batch of rows where the running total is less than the predetermined max
 SELECT *
   FROM #Temp
  WHERE RunTotal <= @MaxVal

--===== All done, display the duration
  PRINT CONVERT(VARCHAR(12),DATEADD(ms,DATEDIFF(ms,@StartTime,GETDATE()),0),114)
      + ' Run Duration (hh:mm:ss:mil)'
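
A quick sanity check you could tack on the end (not part of the original script): confirm the batch really stays under the cap and see how many rows made the cut.

--===== Sanity check: batch row count and final running total (cap = 1,000,000)
 SELECT COUNT(*)      AS RowsInBatch,
        MAX(RunTotal) AS BatchTotal
   FROM #Temp
  WHERE RunTotal <= 1000000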

Is it bad design where I had to do a work-around?  I don't think so... it's just everyday normal stuff.  The use of the temp table allows this process to find the first batch of IDs and the other required info in only 3.36 seconds on a single 1.8GHz CPU with 1 IDE hard drive and only 2 GB of memory.  Now, I don't know about you, but I'm thinking that, because of the restrictions, a temp table was the only way to go.  And I'll go a little further and say that I think it's going to be tough for anyone to beat the performance the temp table allowed the code to run with.

But that could be just a temporary thing



--Jeff Moden
Post #296140
Posted Monday, July 16, 2007 6:59 AM



I'm reading this from a "best of last year" posting, and looking at the number of replies and the dates of same, this article has been around for a while. So if you're reading this, you're a very thorough reader.

My reply to all of this:

Good news: This is a nice summary with helpful thoughts.

Bad news: What's the point of using temp tables if all of these thoughts are true? Half of the point of using them is that the system deletes them when you're done, and you don't wind up with thousands of xxxx_temp tables gumming up your system. If you have to delete them yourself, why not just create a standard table and then delete it when you're done?

Also, half the fun of temp tables is "select into". Again, the point is that they're supposed to be quick. If you have to create tables using DDL, you might as well create a standard table somewhere with _temp on the end of the name, and just be careful to document why you're using it (and/or give it a meaningful name, no "123ABC_temp" nonsense) so the system doesn't get gummed up with useless junk. Just delete rows when you're done. If "select into" uses a lot of memory for large recordsets, DON'T USE IT WITH LARGE RECORDSETS. And while you're at it, maybe temp tables just aren't a good idea with large recordsets. If you have to deal with that amount of data on a regular basis, a properly named and documented standard table will be much more efficient.



___________________________________________________
“Politicians are like diapers. They both need changing regularly and for the same reason.”
Post #381920
Posted Monday, July 16, 2007 5:43 PM



Heh... I don't know where they came up with the idea that SELECT INTO is slow... especially if the target table has no triggers or keys, and especially when the target is a temp table that lives in TempDB with the SIMPLE recovery mode set...

SELECT INTO enjoys the same minimally logged and non-logged benefits in TempDB (again, SIMPLE recovery mode) as do Bulk Insert and BCP, provided the following are true (from Books Online)...

  • The recovery model is simple or bulk-logged.
  • The target table is not being replicated.
  • The target table does not have any triggers.
  • The target table has either 0 rows or no indexes.
  • The TABLOCK hint is specified.

I will agree that a very large SELECT INTO will keep a lock on one of the system tables and will keep similar ops from running until it is complete. But, even for a million rows, if the query is quick, there won't be much interference (it should be scant seconds).
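
For what it's worth, here's a minimal sketch of the kind of load being discussed (the dbo.Payment source table is a hypothetical stand-in): the brand-new temp table has 0 rows, no indexes, no triggers, and isn't replicated, so it meets the conditions above.

--===== If the staging temp table already exists, drop it for reruns
     IF OBJECT_ID('TempDB..#PaymentStage') IS NOT NULL
        DROP TABLE #PaymentStage

--===== Time a SELECT INTO against a brand new temp table in TempDB
DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()

 SELECT PaymentID, PaymentAmount
   INTO #PaymentStage
   FROM dbo.Payment

  PRINT CONVERT(VARCHAR(12),DATEADD(ms,DATEDIFF(ms,@StartTime,GETDATE()),0),114)
      + ' SELECT INTO duration (hh:mm:ss:mil)'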



--Jeff Moden
Post #382157