May 7, 2010 at 9:37 pm
Hello -- thanks for taking the time to read this. The last time I tried to broach this issue, in 2000, it wasn't possible. But now I'm again wondering if it's possible to do:
given:
declare @table table (id int identity(1,1), filename varchar(100), bytes bigint)
insert into @table (filename, bytes) values ('File 1',392297)
...
insert into @table (filename,bytes) values ('File 10273',1882082)
how to SELECT TOP 100000000 bytes FROM @table ORDER BY id
Kinda telling MSSQL to keep adding rows to the selection until the total exceeds 100 meg. Now, I do it by iterating through rows one at a time and manually adding to a running total, but it's WAY less efficient than a built-in function would be (I think)...
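In Python-ish pseudocode, the row-at-a-time loop I'm doing now looks roughly like this (names are made up; this just shows the running-total cutoff, stopping before the total would go over the cap):

```python
def take_until(rows, max_bytes=100_000_000):
    """Take (filename, bytes) rows in order until adding the next row
    would push the running total past max_bytes."""
    total, picked = 0, []
    for filename, size in rows:
        if total + size > max_bytes:
            break  # stop before exceeding the cap
        total += size
        picked.append(filename)
    return picked, total

rows = [("File 1", 392297), ("File 2", 1882082)]  # (filename, bytes), ordered by id
picked, total = take_until(rows)
```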
Thanks in advance for any information or help you can provide!
May 7, 2010 at 9:47 pm
Think I answered this myself in the thread "Return the records that equal the sum" about five pages back. It's about 80% of what I was thinking anyway... If you have something super elegant to do this, I'd still love to hear about it =) TIA!
May 8, 2010 at 1:00 pm
Assuming you have a solid ordering method in mind, I'd recommend using your running-total concept. Jeff Moden has a VERY high-speed way to do this in this article:
http://www.sqlservercentral.com/articles/T-SQL/68467/
It should give you a quick way to pick out the ones you need.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 8, 2010 at 4:46 pm
shannon-567218 (5/7/2010)
Hello -- thanks for taking the time to read this. The last time I tried to broach this issue, in 2000, it wasn't possible. But now I'm again wondering if it's possible to do:
given:
declare @table table (id int identity(1,1), filename varchar(100), bytes bigint)
insert into @table (filename, bytes) values ('File 1',392297)
...
insert into @table (filename,bytes) values ('File 10273',1882082)
how to SELECT TOP 100000000 bytes FROM @table ORDER BY id
Kinda telling MSSQL to keep adding rows to the selection until the total exceeds 100 meg. Now, I do it by iterating through rows one at a time and manually adding to a running total, but it's WAY less efficient than a built-in function would be (I think)...
Thanks in advance for any information or help you can provide!
First, you need to read the article that Lutz pointed you to above. Yeah, it's long, but if you don't understand what's going on in this code and understand the very simple but very strict rules for usage, you're going to screw up some data somewhere someday. 😛
So, is this what you had in mind? As usual with my code, the details of what is being done are in the comments...
--=====================================================================================================================
-- This section creates a million row test table and most of it is NOT a part of the solution.
-- However, the creation of the proper clustered index is absolutely essential. Read the comments in the code.
--=====================================================================================================================
--===== Conditionally drop the test table.
-- This is NOT a part of the solution.
IF OBJECT_ID('TempDB..#TestTable','U') IS NOT NULL
DROP TABLE #TestTable
;
--===== Create and populate a test table on the fly.
-- This is NOT a part of the solution.
-- This test table code only works in 2k5+.
-- Simulated file sizes are from 200K to 2M.
-- Not to worry... this takes less than 5 seconds.
SELECT TOP 1000000
IDENTITY(INT,1,1) AS ID,
CAST('File ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(100)) AS VARCHAR(100)) AS FileName,
CAST(ABS(CHECKSUM(NEWID())) % (2000000-200000+1) + 200000 AS BIGINT) AS Bytes,
CAST(0 AS BIGINT) AS RunningTotal,
CAST(0 AS INT) AS Group100Meg
INTO #TestTable
FROM Master.sys.All_Columns ac1
CROSS JOIN Master.sys.All_Columns ac2
;
--===== Add the quintessential Clustered index as a PK (assuming ID is the PK)
-- Note: Never add "named" constraints to a TempTable because named
-- constraints must be unique throughout a database.
-- THIS IS ABSOLUTELY REQUIRED FOR THIS METHOD TO WORK CORRECTLY!!!!
ALTER TABLE #TestTable
ADD PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100
;
--=====================================================================================================================
-- This section IS the solution. It assumes that you don't actually want to go over the 100 Meg mark.
-- And, YES, this method works in all versions of SQL Server to date.
-- On my poor ol' 8 year old desktop, this takes about 5 seconds... to group 1 MILLION rows. ;-)
--=====================================================================================================================
--===== Ok... test data is ready. Declare a couple of essential variables
-- and preset a couple of them.
DECLARE @PrevID INT, --Acts as an "anchor" for the calculations
@RunningTotal BIGINT,
@Group100Meg INT,
@MaxBytes BIGINT
SELECT @RunningTotal = 0,
@Group100Meg = 1,
@MaxBytes = 100000000
--===== Ready to rock. Assign group numbers so as to NOT exceed 100 Meg total.
UPDATE #TestTable
SET @Group100Meg = Group100Meg = CASE
WHEN Bytes + @RunningTotal <= @MaxBytes
THEN @Group100Meg
ELSE @Group100Meg + 1
END,
@RunningTotal = RunningTotal = CASE
WHEN Bytes + @RunningTotal <= @MaxBytes
THEN Bytes + @RunningTotal
ELSE Bytes
END,
@PrevID = ID
FROM #TestTable WITH(TABLOCKX) --One of the rules to success for this code
OPTION (MAXDOP 1) --One of the rules to success for this code
;
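If it helps to see the grouping logic outside of T-SQL, here's a rough Python sketch of what the update above computes per row (illustration only, with made-up names and tiny numbers; the real method depends on the clustered index, TABLOCKX, and MAXDOP 1 rules covered in the article):

```python
def assign_groups(sizes, max_bytes=100_000_000):
    """Mirror the running-total CASE logic: stay in the current group while
    the total fits under max_bytes; otherwise start a new group and reset
    the running total to the current row's size."""
    group, running = 1, 0
    out = []
    for b in sizes:
        if running + b <= max_bytes:
            running += b          # row fits: same group, extend the total
        else:
            group += 1            # row would overflow: start a new group
            running = b           # new group's total starts at this row
        out.append((b, running, group))
    return out
```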
--===== Let's see a sample of what we've got
SELECT TOP 20000 * FROM #TestTable ORDER BY ID
;
... and sorry about the poor naming convention for "Group100Meg". You should certainly change that, but I wanted the column's purpose to be painfully obvious for this post.
--Jeff Moden
Change is inevitable... Change for the better is not.