Limiting processing to 1m rows at a time

• I have a table (SS2000) containing 3m rows that takes hours to process with a stored proc. Yet if I limit the input to 1m rows, the sp takes only 20 minutes. I'm therefore thinking of doing all the processing in a loop, 1m rows at a time. The unique key is an INT called ID; however, there can be numerical gaps between consecutive IDs. Does anyone have a neat way of populating a temporary table like this:

    Identity  IDFrom     IDTo
    1         3          1,000,987
    2         1,000,988  2,122,338
    3         2,122,340  3,277,647

    which my loop could then work down, letting me use a clause like

    AND ID BETWEEN IDFrom AND IDTo

    to get my 1m-row chunks?

    Thanks!

• Are you looking to make sure you run this process on EXACTLY 1M rows, or just looking to limit the recordset? I mean, creating a new ID isn't hard, but if your processing relies on things like indexes on the main table, you're going to have to rebuild all of that on the temp table.

    If you're just looking to throttle back the process, then consider just walking your permanent ID:

    declare @chunksize int
    declare @ptr int
    declare @maxptr int

    select @chunksize = 1000000, @ptr = 0;
    select @maxptr = max(id) from mytable; --find the highest ID in the table I am trying to walk

    WHILE (@ptr < @maxptr)
    BEGIN
        --do your processing
        SELECT *
        FROM mytable
        WHERE mytable.id BETWEEN @ptr AND @ptr + @chunksize;

        --increment and move on
        SELECT @ptr = @ptr + @chunksize + 1;
    END

    If it IS important to have exactly 1M, then populating the temp table is easy too:

    select Identity(int,1,1) as newTempID, *
    INTO #MyTempTable
    from myTable

    If the source table already has an identity column, you may need to cast it to INT to "hide" the identity property, since SELECT ... INTO will complain about two identity columns in a single table.
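    To make that concrete, here's a minimal sketch of the cast trick. The table and column names (myTable, ID, Col2, Col3) are placeholders, not from the original post:

    ```sql
    -- CAST strips the identity property from the existing ID column,
    -- so SELECT ... INTO can add a fresh IDENTITY without conflict.
    SELECT IDENTITY(INT, 1, 1) AS newTempID,
           CAST(ID AS INT)     AS ID,
           Col2,
           Col3
    INTO   #MyTempTable
    FROM   myTable;
    ```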


• You can create a temp table with an identity column and have it auto-generate row numbers all the way down.

    CREATE TABLE #tmp (
        Row_ID BIGINT IDENTITY(1,1),
        ID INT,
        Col2 INT,
        Col3 INT
    )

    INSERT INTO #tmp (ID, Col2, Col3)
    SELECT ID, Col2, Col3
    FROM MyTable

    SELECT *
    FROM #tmp
    WHERE Row_ID BETWEEN 1 AND 1000000

    What type of processing are you doing in the stored procedure?

• create table #Temp (
        ID int identity primary key,
        FromID int,
        ToID int)

    insert into #temp (fromid, toid)
    select *
    from
        (select 1, 1000000 union all
         select 1000001, 2000000 union all
         select 2000001, 3000000) as Ranges (FromID, ToID) -- however many you need

    If you have a Numbers table, replace the select in the last bit with:

    select 1000000 * (number - 1) + 1, 1000000 * number
    from dbo.Numbers
    where number > 0
      and number <=
          (select max(id)/1000000 +
                  case
                      when max(id)%1000000 = 0 then 0
                      else 1
                  end
           from dbo.MainTable)

    That may not give you exactly 1-million rows per run, but unless the EXACT number matters (I can't see how it could), it should work just fine.

    I'd imagine that the reason going over 1-million rows slows way down is that the work spills to disk instead of staying in RAM. Maybe not, but that's my first guess.

    - Gus "GSquared"

  • If the process has the correct criteria to keep rows from being updated more than once, then you could SET ROWCOUNT 1000000 and loop until @@ROWCOUNT returns zero...
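    A minimal sketch of that loop, with placeholder names (myTable, Processed) standing in for your real processing and for whatever criterion keeps rows from being hit twice. One caveat not in the original post: SET ROWCOUNT is deprecated for INSERT/UPDATE/DELETE in newer SQL Server versions, where TOP (n) is the recommended replacement:

    ```sql
    SET ROWCOUNT 1000000;            -- limit each pass to 1M rows

    WHILE 1 = 1
    BEGIN
        UPDATE myTable               -- placeholder for your real processing
        SET    Processed = 1
        WHERE  Processed = 0;        -- criterion that keeps rows from repeating

        IF @@ROWCOUNT = 0 BREAK;     -- nothing left to update, so stop
    END

    SET ROWCOUNT 0;                  -- always reset the limit afterwards
    ```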

    --Jeff Moden
