Is selecting top(sum(int)) possible?

  • Hello -- thanks for taking the time to read this. Last I tried to broach this issue in 2000, it wasn't possible. But now I'm again wondering if its possible to do:

    given:

    declare @table table (id int identity(1,1), filename varchar(100), bytes bigint)

    insert into @table (filename, bytes) values ('File 1',392297)

    ...

    insert into @table (filename,bytes) values ('File 10273',1882082)

    how to SELECT TOP 100000000 bytes from @table ODER BY id

    Kinda telling MSSQL to keep adding rows to the selection until the total exceeds 100 meg. Now, I do it by iterating through rows one at a time and manually adding to a running total, but it's WAY less efficient than a built-in function would be (I think)...

    Thank in advance for any information or help you can provide!

  • Think I answered this myself in the thread "Return the records that equal the sum" about five pages back. It's about 80% of what I was thinking anyway... If you have something super eloquent to do this, I'd still ove to hear about it =) TIA!

  • Assuming you have a solid ordering method in mind - I'd recommend using your running total concept. Jeff Moden has a VERY high speed way to do this in this article:

    http://www.sqlservercentral.com/articles/T-SQL/68467/

    It should give you a quick way to pick out the ones you need.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • shannon-567218 (5/7/2010)


    Hello -- thanks for taking the time to read this. Last I tried to broach this issue in 2000, it wasn't possible. But now I'm again wondering if its possible to do:

    given:

    declare @table table (id int identity(1,1), filename varchar(100), bytes bigint)

    insert into @table (filename, bytes) values ('File 1',392297)

    ...

    insert into @table (filename,bytes) values ('File 10273',1882082)

    how to SELECT TOP 100000000 bytes from @table ODER BY id

    Kinda telling MSSQL to keep adding rows to the selection until the total exceeds 100 meg. Now, I do it by iterating through rows one at a time and manually adding to a running total, but it's WAY less efficient than a built-in function would be (I think)...

    Thank in advance for any information or help you can provide!

    First, you need to read the article that Lutz pointed you to above. Yeah, it's long but if you don't understand what's going on in this code and understand the very simple but very strict rules for usage, you're going to screw up some data some where some day. 😛

    So, is this what you had in mind? As usual with my code, the details of what is being done are in the comments...

    --=====================================================================================================================

    -- This section creates a million row test table and most of it is NOT a part of the solution.

    -- However, the creation of the proper clustered index is absolutely essential. Read the comments in the code.

    --=====================================================================================================================

    --===== Conditionally drop the test table.

    -- This is NOT a part of the solution.

    IF OBJECT_ID('TempDB..#TestTables','U') IS NOT NULL

    DROP TABLE #TestTable

    ;

    --===== Create and populate a test table on the fly.

    -- This is NOT a part of the solution.

    -- This test table code only works in 2k5+.

    -- Simulated file sizes are from 200K to 2M.

    -- Not to worry... this takes less than 5 seconds.

    SELECT TOP 1000000

    IDENTITY(INT,1,1) AS ID,

    CAST('File ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(100)) AS VARCHAR(100)) AS FileName,

    CAST(ABS(CHECKSUM(NEWID())) % (2000000-200000+1) + 200000 AS BIGINT) AS Bytes,

    CAST(0 AS BIGINT) AS RunningTotal,

    CAST(0 AS INT) AS Group100Meg

    INTO #TestTable

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    ;

    --===== Add the quintessential Clustered index as a PK (assuming ID is the PK)

    -- Note: Never add "named" constraints to a TempTable because named

    -- constraints must be unique throughout a database.

    -- THIS IS ABSOLUTELY REQUIRED FOR THIS METHOD TO WORK CORRECTLY!!!!

    ALTER TABLE #TestTable

    ADD PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

    ;

    --=====================================================================================================================

    -- This section IS the solution. It assumes that you don't actually want to go over the 100 Meg mark.

    -- And, YES, this method works in all versions of SQL Server to date.

    -- On my poor ol' 8 year old desktop, this takes about 5 seconds... to group 1 MILLION rows. ;-)

    --=====================================================================================================================

    --===== Ok... test data is ready. Declare a couple of essential variables

    -- and preset a couple of them.

    DECLARE @PrevID INT, --Acts as an "anchor" for the calculations

    @RunningTotal BIGINT,

    @Group100Meg INT,

    @MaxBytes BIGINT

    SELECT @RunningTotal = 0,

    @Group100Meg = 1,

    @MaxBytes = 100000000

    --===== Ready to rock. Assign group numbers so as to NOT exceed 100 Meg total.

    UPDATE #TestTable

    SET @Group100Meg = Group100Meg = CASE

    WHEN Bytes + @RunningTotal <= @MaxBytes

    THEN @Group100Meg

    ELSE @Group100Meg + 1

    END,

    @RunningTotal = RunningTotal = CASE

    WHEN Bytes + @RunningTotal <= @MaxBytes

    THEN Bytes + @RunningTotal

    ELSE Bytes

    END,

    @PrevID = ID

    FROM #TestTable WITH(TABLOCKX) --One of the rules to success for this code

    OPTION (MAXDOP 1) --One of the rules to success for this code

    ;

    --===== Let's see a sample what we've got

    SELECT TOP 20000 * FROM #TestTable ORDER BY ID

    ;

    ... and sorry about the poor naming convention for "Group100Meg". You should certainly change that but I wanted what the column was for to be painfully obvious for this post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply