Just following up... here's a cool problem that I just ran across that I was able to answer because I'd done something like it before...
Here's the problem...
You have a payment table that accumulates payment records for a week at a time... the agreement you have with the bank clearing house is that you can send as many files as you want with as many records as you want, but you can only send the files once a week and no file shall have more than 1,000,000 dollars worth of payments in it. Because business has been good, this week's payment table has 200,000 records in it ranging from 1 dollar to 100 dollars each. Determine which rows you will send as the first file by PK (an INT IDENTITY column for this example). Additionally, you must provide the PK, the payment amount, and a running total in sorted order, because the bank wants it that way in order for your company to actually get paid.
There are only two restrictions... you cannot use an external app, because there isn't one and the company doesn't want to spend the resources to build/test one (YOU will get fired if you do make one, even on your own time, because your boss is a jerk). The other is that you cannot mod the payment table, because the 3rd party vendor that wrote it says it will break their app if you add a column to it; they used some type of schema binding in their compiled code to prevent tampering with the tables they made.
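Before touching any SQL, it helps to pin down exactly what "first batch" means: walk the rows in PK order, keep a running total, and the batch is every leading row whose running total stays at or under the cap. Here's a minimal Python sketch of just that rule (it's NOT a solution to the problem, since an external app is off the table; the toy data and the small cap are made up purely for illustration):

```python
# Sketch of the batching rule: walk payments in PK order, carry a
# running total, and keep every row whose running total stays at or
# under the cap. Toy data and cap value are illustrative only; the
# real problem uses 200,000 rows and a $1,000,000 cap.

def first_batch(payments, cap):
    """payments: list of (pk, amount) tuples, already sorted by pk."""
    batch, run_total = [], 0
    for pk, amount in payments:
        run_total += amount
        if run_total > cap:
            break
        batch.append((pk, amount, run_total))  # PK, amount, running total
    return batch

rows = [(1, 40), (2, 65), (3, 30), (4, 20), (5, 80)]
print(first_batch(rows, cap=150))  # -> [(1, 40, 40), (2, 65, 105), (3, 30, 135)]
```

Because every payment amount is positive, "running total <= cap" picks out exactly this leading prefix of rows, which is the shape the SQL below exploits.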
I'll even provide the test data...
--============================================================================
-- Create a test table
--============================================================================
--===== If the test table exists, drop it so we can rerun if we need to
     IF OBJECT_ID('dbo.jbmRunTotTest') IS NOT NULL
        DROP TABLE dbo.jbmRunTotTest

--===== Suppress auto-display of rowcounts for appearance and speed
    SET NOCOUNT ON

--===== Create and populate the jbmRunTotTest table on the fly
 SELECT TOP 200000
        IDENTITY(INT,1,5) AS PK,
        CAST(RAND(CAST(NEWID() AS VARBINARY))*99+1 AS DECIMAL(7,2)) AS ColA
   INTO dbo.jbmRunTotTest
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.jbmRunTotTest
    ADD CONSTRAINT PK_jbmRunTotTest_PK PRIMARY KEY CLUSTERED (PK)

--===== Allow the general public to use it
  GRANT ALL ON dbo.jbmRunTotTest TO PUBLIC
So, how to solve this one? A self-joined correlated sub-query would take about a month (it took 29 seconds on only 9,999 records and gets quadratically worse as the row count grows). Hmmmm.... maybe we'll have to resort to a RBAR WHILE loop or even a Cursor... nah...
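To see why the correlated sub-query dies, note that re-summing all earlier rows for every row does about n*(n+1)/2 additions, while a single ordered pass carrying the total forward does n. A quick Python sketch of the two shapes (toy data, illustrative only):

```python
# Two ways to compute a running total over the same amounts.
# The first mirrors the self-joined correlated sub-query: for every
# row, re-sum all rows up to and including it -- O(n^2) work overall.
# The second mirrors what the UPDATE trick below does: one ordered
# pass carrying the total forward -- O(n).

def running_total_quadratic(amounts):
    return [sum(amounts[: i + 1]) for i in range(len(amounts))]

def running_total_linear(amounts):
    totals, run = [], 0
    for a in amounts:
        run += a
        totals.append(run)
    return totals

amounts = [10, 20, 30]
print(running_total_quadratic(amounts))  # -> [10, 30, 60]
print(running_total_linear(amounts))     # -> [10, 30, 60]
```

At 200,000 rows, the triangular approach is on the order of 20 billion additions versus 200,000 for the single pass, which is why 9,999 rows already took 29 seconds.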
Again, we just want to return one batch of ID's that contain no more than 1,000,000 dollars worth of payments (we'll worry about the multiple batches some other time)... here's a nasty fast way to do it with a temp table and a little help from SQL Server's proprietary UPDATE statement...
--============================================================================
-- Return all PK's (other cols returned for bank proof) that have a running
-- total less than some predetermined number (1 batch)
--============================================================================
--===== If the working temp table already exists, drop it for reruns
     IF OBJECT_ID('TempDB..#Temp') IS NOT NULL
        DROP TABLE #Temp

--===== Clear any cached data so we can get a true measure of how fast this is
   DBCC DROPCLEANBUFFERS
   DBCC FREEPROCCACHE

--===== Declare local variables
DECLARE @StartTime DATETIME --< Just to measure run duration
    SET @StartTime = GETDATE()

DECLARE @MaxVal INT
    SET @MaxVal = 1000000 --< This is the predetermined max running total

DECLARE @RunTot DECIMAL(38,2) --< This lets us calc the running total
    SET @RunTot = 0

--===== Create and populate the temporary working table on the fly
 SELECT IDENTITY(INT,1,1) AS RowNum,
        CAST(PK AS INT) AS PK,               --< The Primary Key col from your table
        ColA,                                --< The column that contains the value from your table
        CAST(0 AS DECIMAL(38,2)) AS RunTotal --< To be calculated
   INTO #Temp                                --< The new temporary working table
   FROM dbo.jbmRunTotTest t1                 --< Our test table (sub your table name here)

--===== Add a new clustered Primary Key because every table should have one
     -- and because UPDATE doesn't like an ORDER BY
  ALTER TABLE #Temp
    ADD PRIMARY KEY CLUSTERED (RowNum)

--===== Update the temporary working table with the running total in RowNum order
 UPDATE t1
    SET @RunTot = RunTotal = @RunTot + t1.ColA
   FROM #Temp t1

--===== Select the batch of rows where the running total is less than the predetermined max
 SELECT *
   FROM #Temp
  WHERE RunTotal <= @MaxVal

--===== All done, display the duration
  PRINT CONVERT(VARCHAR(12),DATEADD(ms,DATEDIFF(ms,@StartTime,GETDATE()),0),114)
      + ' Run Duration (hh:mm:ss:mil)'
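For what it's worth, the same running-total-then-filter shape can be sanity-checked end-to-end on any engine that has window functions. Here's the idea in SQLite via Python (needs SQLite 3.25+ for SUM() OVER; the table and column names are stand-ins for the example above, and the tiny data set and cap are made up). This obviously isn't an option under the problem's constraints; it's only a check of what the UPDATE trick computes:

```python
import sqlite3

# Sanity-check the running-total-then-filter logic on a tiny table
# using a windowed SUM (requires SQLite 3.25+). Table/column names,
# data, and the cap are stand-ins for the jbmRunTotTest example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (pk INTEGER PRIMARY KEY, amount NUMERIC)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 40), (2, 65), (3, 30), (4, 20), (5, 80)],
)

# Compute the running total in PK order, then keep rows under the cap.
rows = conn.execute(
    """
    SELECT pk, amount, run_total
      FROM (SELECT pk, amount,
                   SUM(amount) OVER (ORDER BY pk
                                     ROWS UNBOUNDED PRECEDING) AS run_total
              FROM payments)
     WHERE run_total <= 150
    """
).fetchall()
print(rows)  # -> [(1, 40, 40), (2, 65, 105), (3, 30, 135)]
conn.close()
```

The windowed SUM plays the role of the @RunTot carry, and the outer WHERE plays the role of the final SELECT against #Temp.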
Is it bad design where I had to do a workaround? I don't think so... it's just everyday normal stuff. The use of the temp table allows this process to find the first batch of ID's and the other required info in only 3.36 seconds on a single 1.8GHz CPU with 1 IDE hard drive and only 2 gig of memory. Now, I don't know about you, but I'm thinking that, because of the restrictions, a temp table was the only way to go. And I'll go a little further and say that I think it's going to be tough for anyone to beat the performance the temp table allowed the code to run with.
But that could be just a temporary thing
--Jeff Moden
Change is inevitable... Change for the better is not.