• Jeff Moden (3/13/2010)


    Nicely done but...

    without exceeding 25 records for all.

    Well that is just embarrassing. :blush:

    Your solution has some nice features, particularly in the calculation of counts.

    It does raise a question, though: in which order should we allocate?

    You have chosen 'cars', then 'trucks', then 'RVs' - but that seems a bit arbitrary.

    My second point concerns performance.

    The code performs three partial scans of the index, and three sorts.

    CPU time on my machine: 188ms - averaged over ten runs.

    Combining the two approaches reduces the average time to 94ms:

    SET STATISTICS TIME ON;

    --===== Declare some variables to hold the counts for each vehicle type.

    DECLARE @CountC INT,

    @CountR INT,

    @CountT INT;

    --===== Randomly assign the counts to be returned with the constraint that

    -- no count must be less than 3 and the total count must be 25.

    SELECT @CountC = ABS(CHECKSUM(NEWID()))%16+3,

    @CountR = ABS(CHECKSUM(NEWID()))%(25-@CountC-6)+3,

    @CountT = 25-@CountC-@CountR;

    WITH SampleSize

    AS (

    --Find the total number of vehicles of each type

    --and the sample size to use for each group

    --(3 rows, 25 rows, or 10%)

    SELECT vehicle_type,

    group_size = COUNT_BIG(*),

    sample_size = @CountC

    FROM #Source

    WHERE vehicle_type = 'C'

    GROUP BY

    vehicle_type

    UNION ALL

    SELECT vehicle_type,

    group_size = COUNT_BIG(*),

    sample_size = @CountT

    FROM #Source

    WHERE vehicle_type = 'T'

    GROUP BY

    vehicle_type

    UNION ALL

    SELECT vehicle_type,

    group_size = COUNT_BIG(*),

    sample_size = @CountR

    FROM #Source

    WHERE vehicle_type = 'R'

    GROUP BY

    vehicle_type

    ),

    RandomRows

    AS (

    -- Generate a random row number for each row required from

    -- each group. Maximum rows returned = 25 * group_count

    --

    -- The TOP (9223372036854775807 (= BIGINT.Max)) is just to

    -- force the query plan to calculate the random row number

    -- before the final join

    SELECT TOP (9223372036854775807)

    SS.vehicle_type,

    Random.rn

    FROM SampleSize SS

    CROSS

    APPLY (

    SELECT TOP (SS.sample_size)

    rn = CONVERT(BIGINT, RAND(CHECKSUM(NEWID(), N.n)) * SS.group_size + 1)

    FROM dbo.Number N

    ORDER BY

    N.n ASC

    ) Random

    ORDER BY

    SS.vehicle_type,

    Random.rn

    ),

    NumberedSourceRows

    AS (

    -- Number the source rows, restarting the numbering for each group

    SELECT S.row_id,

    S.vehicle_type,

    S.vehicle_id,

    rn = ROW_NUMBER() OVER (PARTITION BY S.vehicle_type ORDER BY S.row_id ASC)

    FROM #Source S

    )

    -- Fetch the numbered rows that match the random row number, per group

    SELECT NSR.row_id,

    NSR.vehicle_type,

    NSR.vehicle_id

    FROM RandomRows RR

    JOIN NumberedSourceRows NSR

    ON NSR.vehicle_type = RR.vehicle_type

    AND NSR.rn = RR.rn;

    SET STATISTICS TIME OFF;

    For what it's worth, the minimum-3 maximum-25 from each group problem was much more fun!

    Paul