Just following up... here's a cool problem that I just ran across that I was able to answer because I'd done something like it before...
Here's the problem...
You have a payment table that accumulates payment records for a week at a time... the agreement you have with the bank clearing house is that you can send as many files as you want with as many records as you want, but you can only send the files once a week and no file shall have more than 1,000,000 dollars worth of payments in it. Because business has been good, this week's payment table has 200,000 records in it ranging from 1 dollar to 100 dollars each. Determine which rows you will send as the first file by PK (an INT IDENTITY column for this example). Additionally, you must provide the PK, the payment amount, and a running total in sorted order, because the bank wants it that way in order for your company to actually get paid.
There are only two restrictions... you cannot use an external app, because there isn't one and the company doesn't want to spend the resources to build/test one (YOU will get fired if you do make one, even on your own time, because your boss is a jerk). The other is that you cannot mod the payment table, because the 3rd party vendor that wrote it says it will break their app if you add a column to it; they used some type of schema binding in their compiled code to prevent tampering with the tables they made.
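Before touching any SQL, it helps to pin down exactly what "first batch" means: walk the rows in PK order, keep a running total, and the batch is every leading row whose running total stays at or under the cap. Here's a minimal Python sketch of just that rule (it's NOT a solution to the problem, since an external app is off the table; the toy data and the small cap are made up purely for illustration):

```python
# Sketch of the batching rule: walk payments in PK order, carry a
# running total, and keep every row whose running total stays at or
# under the cap. Toy data and cap value are illustrative only; the
# real problem uses 200,000 rows and a $1,000,000 cap.

def first_batch(payments, cap):
    """payments: list of (pk, amount) tuples, already sorted by pk."""
    batch, run_total = [], 0
    for pk, amount in payments:
        run_total += amount
        if run_total > cap:
            break
        batch.append((pk, amount, run_total))  # PK, amount, running total
    return batch

rows = [(1, 40), (2, 65), (3, 30), (4, 20), (5, 80)]
print(first_batch(rows, cap=150))  # -> [(1, 40, 40), (2, 65, 105), (3, 30, 135)]
```

Because every payment amount is positive, "running total <= cap" picks out exactly this leading prefix of rows, which is the shape the SQL below exploits.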
I'll even provide the test data...
--============================================================================
-- Create a test table
--============================================================================
--===== If the test table exists, drop it so we can rerun if we need to
     IF OBJECT_ID('dbo.jbmRunTotTest') IS NOT NULL
        DROP TABLE dbo.jbmRunTotTest

--===== Suppress auto-display of rowcounts for appearance and speed
    SET NOCOUNT ON

--===== Create and populate the jbmRunTotTest table on the fly
 SELECT TOP 200000
        IDENTITY(INT,1,5) AS PK,
        CAST(RAND(CAST(NEWID() AS VARBINARY))*99+1 AS DECIMAL(7,2)) AS ColA
   INTO dbo.jbmRunTotTest
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.jbmRunTotTest
    ADD CONSTRAINT PK_jbmRunTotTest_PK PRIMARY KEY CLUSTERED (PK)

--===== Allow the general public to use it
  GRANT ALL ON dbo.jbmRunTotTest TO PUBLIC
So, how to solve this one? A self-joined correlated sub-query would take about a month (it took 29 seconds on only 9,999 records and gets quadratically worse as the row count grows). Hmmmm.... maybe we'll have to resort to a RBAR WHILE loop or even a Cursor... nah...
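To see why the correlated sub-query dies, note that re-summing all earlier rows for every row does about n*(n+1)/2 additions, while a single ordered pass carrying the total forward does n. A quick Python sketch of the two shapes (toy data, illustrative only):

```python
# Two ways to compute a running total over the same amounts.
# The first mirrors the self-joined correlated sub-query: for every
# row, re-sum all rows up to and including it -- O(n^2) work overall.
# The second mirrors what the UPDATE trick below does: one ordered
# pass carrying the total forward -- O(n).

def running_total_quadratic(amounts):
    return [sum(amounts[: i + 1]) for i in range(len(amounts))]

def running_total_linear(amounts):
    totals, run = [], 0
    for a in amounts:
        run += a
        totals.append(run)
    return totals

amounts = [10, 20, 30]
print(running_total_quadratic(amounts))  # -> [10, 30, 60]
print(running_total_linear(amounts))     # -> [10, 30, 60]
```

At 200,000 rows, the triangular approach is on the order of 20 billion additions versus 200,000 for the single pass, which is why 9,999 rows already took 29 seconds.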
Again, we just want to return one batch of ID's that contain no more than 1,000,000 dollars worth of payments (we'll worry about the multiple batches some other time)... here's a nasty fast way to do it with a temp table and a little help from SQL Server's proprietary UPDATE statement...
--============================================================================
-- Return all PK's (other cols returned for bank proof) that have a running
-- total less than some predetermined number (1 batch)
--============================================================================
--===== If the working temp table already exists, drop it for reruns
     IF OBJECT_ID('TempDB..#Temp') IS NOT NULL
        DROP TABLE #Temp

--===== Clear any cached data so we can get a true measure of how fast this is
   DBCC DROPCLEANBUFFERS
   DBCC FREEPROCCACHE

--===== Declare local variables
DECLARE @StartTime DATETIME --< Just to measure run duration
    SET @StartTime = GETDATE()

DECLARE @MaxVal INT
    SET @MaxVal = 1000000 --< This is the predetermined max running total

DECLARE @RunTot DECIMAL(38,2) --< This lets us calc the running total
    SET @RunTot = 0

--===== Create and populate the temporary working table on the fly
 SELECT IDENTITY(INT,1,1) AS RowNum,
        CAST(PK AS INT) AS PK,               --< The Primary Key col from your table
        ColA,                                --< The column that contains the value from your table
        CAST(0 AS DECIMAL(38,2)) AS RunTotal --< To be calculated
   INTO #Temp                                --< The new temporary working table
   FROM dbo.jbmRunTotTest t1                 --< Our test table (sub your table name here)

--===== Add a new clustered Primary Key because every table should have one
     -- and because UPDATE doesn't like an ORDER BY
  ALTER TABLE #Temp
    ADD PRIMARY KEY CLUSTERED (RowNum)

--===== Update the temporary working table with the running total in RowNum order
 UPDATE t1
    SET @RunTot = RunTotal = @RunTot + t1.ColA
   FROM #Temp t1

--===== Select the batch of rows where the running total is less than the predetermined max
 SELECT *
   FROM #Temp
  WHERE RunTotal <= @MaxVal

--===== All done, display the duration
  PRINT CONVERT(VARCHAR(12),DATEADD(ms,DATEDIFF(ms,@StartTime,GETDATE()),0),114)
      + ' Run Duration (hh:mm:ss:mil)'
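For what it's worth, the same running-total-then-filter shape can be sanity-checked end-to-end on any engine that has window functions. Here's the idea in SQLite via Python (needs SQLite 3.25+ for SUM() OVER; the table and column names are stand-ins for the example above, and the tiny data set and cap are made up). This obviously isn't an option under the problem's constraints; it's only a check of what the UPDATE trick computes:

```python
import sqlite3

# Sanity-check the running-total-then-filter logic on a tiny table
# using a windowed SUM (requires SQLite 3.25+). Table/column names,
# data, and the cap are stand-ins for the jbmRunTotTest example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (pk INTEGER PRIMARY KEY, amount NUMERIC)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 40), (2, 65), (3, 30), (4, 20), (5, 80)],
)

# Compute the running total in PK order, then keep rows under the cap.
rows = conn.execute(
    """
    SELECT pk, amount, run_total
      FROM (SELECT pk, amount,
                   SUM(amount) OVER (ORDER BY pk
                                     ROWS UNBOUNDED PRECEDING) AS run_total
              FROM payments)
     WHERE run_total <= 150
    """
).fetchall()
print(rows)  # -> [(1, 40, 40), (2, 65, 105), (3, 30, 135)]
conn.close()
```

The windowed SUM plays the role of the @RunTot carry, and the outer WHERE plays the role of the final SELECT against #Temp.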
Is it bad design where I had to do a workaround? I don't think so... it's just everyday normal stuff. The use of the temp table allows this process to find the first batch of ID's and the other required info in only 3.36 seconds on a single 1.8GHz CPU with 1 IDE hard drive and only 2 gig of memory. Now, I don't know about you, but I'm thinking that, because of the restrictions, a temp table was the only way to go. And I'll go a little further and say that I think it's going to be tough for anyone to beat the performance the temp table allowed the code to run with.
But that could be just a temporary thing
--Jeff Moden
Change is inevitable... Change for the better is not.