Dividing a count of rows from a table into equal batch sizes using NTILE

  • Good afternoon all.

    I am trying to figure a way of getting the total row count of a table and dividing that into equal sizes, but varying number, of batches for an insert statement to process.

    i.e. if a table has 100,000 rows, and I specify 5 batches I currently get the following from my CTE (below):
          RowStart       RowEnd
    1    1                   20000
    2    20001           40000
    3    40001           60000
    4    60001           80000
    5    80001           100000

    so I can insert into a new table like this:
    INSERT INTO <SOMETABLE> WHERE <PRIMARY KEY> BETWEEN @ROWSTART AND @ROWEND
    .. then loop to next row

    This means I won't fill-up the LOG file by trying to insert all in 1 go.  The actual row counts of the tables I need to do this for number in the 10's of millions.
    I was hoping to divide whatever the row count of the table was into <row_count>/100000 # of batches of 100,000 - so for a 1M row table, we have 10 batches and so on.

    I have purloined some code from the internet, apologies as I cannot remember where and so cannot attribute it, but it is as follows:


    ;
      WITH Pass0
      AS (SELECT
      1 AS c
      UNION ALL
      SELECT
      1),
      Pass1
      AS (SELECT
      1 AS c
      FROM Pass0 AS a,
       Pass0 AS b),
      Pass2
      AS (SELECT
      1 AS c
      FROM Pass1 AS a,
       Pass1 AS b),
      Pass3
      AS (SELECT
      1 AS c
      FROM Pass2 AS a,
       Pass2 AS b),
      Pass4
      AS (SELECT
      1 AS c
      FROM Pass3 AS a,
       Pass3 AS b),
      Pass5
      AS (SELECT
      1 AS c
      FROM Pass4 AS a,
       Pass4 AS b),
      vals (val)
      AS (SELECT
      ROW_NUMBER() OVER (ORDER BY c) AS number
      FROM Pass5)
      SELECT
      [TheTile] AS [TheRange],
      MIN(val) AS RangeStart,
      MAX(val) AS [RangeEnd]
      FROM (SELECT
      NTILE(5) OVER (ORDER BY val) AS [TheTile],
      val
      FROM vals
      WHERE val BETWEEN 1 AND 100000) a
      GROUP BY [TheTile]
      ORDER BY [TheTile]; 

    where the NTILE(#) is the number of batches, and the "WHERE val BETWEEN <START> AND <END>" is 1 and the total row count of the table. 

    My problem is, that this works well until I have a row count > 1M, I have a table with 92M rows where I want to have <start> and <end> values for the insert statement, but it takes forever to run and I need 1,000,000/100,000 = 10 batches for that one, for 92M I need 344 batches and the whole thing grinds to a halt.

    Does anyone have an idea of a way to achieve what I am trying to do please, the code I posted does not scale very well at all - there must be a better way?

    Thank you all for your input.

    Kindest
    Duncan.

  • EDIT:
    I thought it was quicker, but it's still really slow on 92M rows.. I would welcome any ideas please?

    thank you

    it's ok, I've sorted it using the tried and tested GetNums() function

      SELECT
      [TheTile] AS [TheRange],
      MIN(a.n) AS RangeStart,
      MAX(a.n) AS [RangeEnd]
      FROM (SELECT
      NTILE(344) OVER (ORDER BY gn.n) AS [TheTile],
      gn.n
      FROM dbo.GetNums(1, 10000000) gn
      WHERE gn.n BETWEEN 1 AND 10000000) a
      GROUP BY [TheTile]
      ORDER BY [TheTile];

    thanks all!

  • You're thinking in the wrong direction. Instead of generating the rows and grouping them, you need to generate the groups and calculate which rows would you need to include. So, instead of creating 10,000 rows, you just create 5.


    DECLARE @Rows int = 10000000,
       @Batches int = 5;

    WITH
    E(n) AS(
      SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
    ),
    E2(n) AS(
      SELECT a.n FROM E a, E b
    ),
    E4(n) AS(
      SELECT a.n FROM E2 a, E2 b
    ),
    cteTally(n) AS(
      SELECT TOP (@Batches) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
      FROM E4
    )
    SELECT n AS [TheRange],
        1 + CEILING(@Rows / @Batches) * (n -1) AS [RangeStart],
        CEILING(@Rows / @Batches) * n AS [RangeEnd]
    FROM cteTally

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thank you Luis, I was indeed looking at the issue from the wrong direction.  Your code is excellent and very fast, I will learn what it is doing and take this as a learning example for the future.  Much obliged!  D.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply