• Jeff Moden (3/14/2010)


    If we precalculate all possible combinations of numbers that have a value of at least "3" and save only the ones that add up to precisely "25" in a table, then a random selection on that table will give a very nice, very even distibution of the 153 different combinations that add up to "25". It's no different than having any other "helper" table (like a Tally or Calendar table) where you don't have to keep recalcalculating the same thing over and over...

    I like this idea a lot. Keeping it all in one query (though it is a shame that NEWID is illegal in a function):

    SET STATISTICS TIME ON;

    WITH SelectionCounts

    AS (

    -- Number of rows to select from each group

    -- Minimum 3, total for all groups = 25

    -- Choose one of the possible combinations at random

    SELECT TOP (1)

    nc = C.n,

    nt = T.n,

    nr = R.n

    FROM dbo.Number C,

    dbo.Number T,

    dbo.Number R

    WHERE C.n BETWEEN 3 AND 19

    AND T.n BETWEEN 3 AND 19

    AND R.n BETWEEN 3 AND 19

    AND C.n + T.n + R.n = 25

    ORDER BY

    NEWID()

    ),

    GroupCounts

    AS (

    --Find the total number of vehicles of each type

    SELECT vehicle_type,

    group_size = COUNT_BIG(*)

    FROM dbo.Source

    WHERE vehicle_type = 'C'

    GROUP BY

    vehicle_type

    UNION ALL

    SELECT vehicle_type,

    group_size = COUNT_BIG(*)

    FROM dbo.Source

    WHERE vehicle_type = 'T'

    GROUP BY

    vehicle_type

    UNION ALL

    SELECT vehicle_type,

    group_size = COUNT_BIG(*)

    FROM dbo.Source

    WHERE vehicle_type = 'R'

    GROUP BY

    vehicle_type

    ),

    Parameters

    AS (

    SELECT GC.group_size,

    GC.vehicle_type,

    sample_size =

    CASE GC.vehicle_type

    WHEN 'C' THEN SC.nc

    WHEN 'T' THEN SC.nt

    WHEN 'R' THEN SC.nr

    ELSE 0

    END

    FROM GroupCounts GC,

    SelectionCounts SC

    ),

    RandomRows

    AS (

    -- Generate a random row number for each row required from

    -- each group. Maximum rows returned = 25 * group_count

    --

    -- The TOP (9223372036854775807 (= BIGINT.Max)) is just to

    -- force the query plan to calculate the random row number

    -- before the final join

    SELECT TOP (9223372036854775807)

    P.vehicle_type,

    Random.rn

    FROM Parameters P

    CROSS

    APPLY (

    SELECT TOP (P.sample_size)

    rn = CONVERT(BIGINT, RAND(CHECKSUM(NEWID(), N.n)) * P.group_size + 1)

    FROM dbo.Number N

    ORDER BY

    N.n ASC

    ) Random

    ORDER BY

    P.vehicle_type,

    Random.rn

    ),

    NumberedSourceRows

    AS (

    -- Number the source rows, restarting the numbering for each group

    SELECT S.row_id,

    S.vehicle_type,

    S.vehicle_id,

    rn = ROW_NUMBER() OVER (PARTITION BY S.vehicle_type ORDER BY S.row_id ASC)

    FROM dbo.Source S

    )

    -- Fetch the numbered rows that match the random row number, per group

    SELECT NSR.row_id,

    NSR.vehicle_type,

    NSR.vehicle_id

    FROM RandomRows RR

    JOIN NumberedSourceRows NSR

    ON NSR.vehicle_type = RR.vehicle_type

    AND NSR.rn = RR.rn;

    SET STATISTICS TIME OFF;

    The above code returns the expected even distribution, and runs in an average of 94ms - exactly the same as the previous method.

    Using a persisted table for one of the 153 possible combinations results in a slightly cleaner plan, but the CPU time remains at 94ms.

    Paul