Is selecting top(sum(int)) possible?

Question

Is selecting top(sum(int)) possible?

shannon-567218

Newbie

Points: 3
More actions
May 7, 2010 at 9:37 pm

#221307

Hello -- thanks for taking the time to read this. Last I tried to broach this issue in 2000, it wasn't possible. But now I'm again wondering if its possible to do:
given:
declare @table table (id int identity(1,1), filename varchar(100), bytes bigint)
insert into @table (filename, bytes) values ('File 1',392297)
...
insert into @table (filename,bytes) values ('File 10273',1882082)
how to SELECT TOP 100000000 bytes from @table ODER BY id
Kinda telling MSSQL to keep adding rows to the selection until the total exceeds 100 meg. Now, I do it by iterating through rows one at a time and manually adding to a running total, but it's WAY less efficient than a built-in function would be (I think)...
Thank in advance for any information or help you can provide!

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

shannon-567218 Newbie Points: 3 More actions · Answer 1

Think I answered this myself in the thread "Return the records that equal the sum" about five pages back. It's about 80% of what I was thinking anyway... If you have something super eloquent to do this, I'd still ove to hear about it =) TIA!

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

Assuming you have a solid ordering method in mind - I'd recommend using your running total concept. Jeff Moden has a VERY high speed way to do this in this article:

http://www.sqlservercentral.com/articles/T-SQL/68467/

It should give you a quick way to pick out the ones you need.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004681 More actions · Answer 3

shannon-567218 (5/7/2010)
Hello -- thanks for taking the time to read this. Last I tried to broach this issue in 2000, it wasn't possible. But now I'm again wondering if its possible to do:
given:
declare @table table (id int identity(1,1), filename varchar(100), bytes bigint)
insert into @table (filename, bytes) values ('File 1',392297)
...
insert into @table (filename,bytes) values ('File 10273',1882082)
how to SELECT TOP 100000000 bytes from @table ODER BY id
Kinda telling MSSQL to keep adding rows to the selection until the total exceeds 100 meg. Now, I do it by iterating through rows one at a time and manually adding to a running total, but it's WAY less efficient than a built-in function would be (I think)...
Thank in advance for any information or help you can provide!

First, you need to read the article that Lutz pointed you to above. Yeah, it's long but if you don't understand what's going on in this code and understand the very simple but very strict rules for usage, you're going to screw up some data some where some day. 😛

So, is this what you had in mind? As usual with my code, the details of what is being done are in the comments...

--=====================================================================================================================

-- This section creates a million row test table and most of it is NOT a part of the solution.

-- However, the creation of the proper clustered index is absolutely essential. Read the comments in the code.

--=====================================================================================================================

--===== Conditionally drop the test table.

-- This is NOT a part of the solution.

IF OBJECT_ID('TempDB..#TestTables','U') IS NOT NULL

DROP TABLE #TestTable

;

--===== Create and populate a test table on the fly.

-- This is NOT a part of the solution.

-- This test table code only works in 2k5+.

-- Simulated file sizes are from 200K to 2M.

-- Not to worry... this takes less than 5 seconds.

SELECT TOP 1000000

IDENTITY(INT,1,1) AS ID,

CAST('File ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(100)) AS VARCHAR(100)) AS FileName,

CAST(ABS(CHECKSUM(NEWID())) % (2000000-200000+1) + 200000 AS BIGINT) AS Bytes,

CAST(0 AS BIGINT) AS RunningTotal,

CAST(0 AS INT) AS Group100Meg

INTO #TestTable

FROM Master.sys.All_Columns ac1

CROSS JOIN Master.sys.All_Columns ac2

;

--===== Add the quintessential Clustered index as a PK (assuming ID is the PK)

-- Note: Never add "named" constraints to a TempTable because named

-- constraints must be unique throughout a database.

-- THIS IS ABSOLUTELY REQUIRED FOR THIS METHOD TO WORK CORRECTLY!!!!

ALTER TABLE #TestTable

ADD PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

;

--=====================================================================================================================

-- This section IS the solution. It assumes that you don't actually want to go over the 100 Meg mark.

-- And, YES, this method works in all versions of SQL Server to date.

-- On my poor ol' 8 year old desktop, this takes about 5 seconds... to group 1 MILLION rows. ;-)

--=====================================================================================================================

--===== Ok... test data is ready. Declare a couple of essential variables

-- and preset a couple of them.

DECLARE @PrevID INT, --Acts as an "anchor" for the calculations

@RunningTotal BIGINT,

@Group100Meg INT,

@MaxBytes BIGINT

SELECT @RunningTotal = 0,

@Group100Meg = 1,

@MaxBytes = 100000000

--===== Ready to rock. Assign group numbers so as to NOT exceed 100 Meg total.

UPDATE #TestTable

SET @Group100Meg = Group100Meg = CASE

WHEN Bytes + @RunningTotal <= @MaxBytes

THEN @Group100Meg

ELSE @Group100Meg + 1

END,

@RunningTotal = RunningTotal = CASE

WHEN Bytes + @RunningTotal <= @MaxBytes

THEN Bytes + @RunningTotal

ELSE Bytes

END,

@PrevID = ID

FROM #TestTable WITH(TABLOCKX) --One of the rules to success for this code

OPTION (MAXDOP 1) --One of the rules to success for this code

;

--===== Let's see a sample what we've got

SELECT TOP 20000 * FROM #TestTable ORDER BY ID

;

... and sorry about the poor naming convention for "Group100Meg". You should certainly change that but I wanted what the column was for to be painfully obvious for this post.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)