Jeff Moden (3/13/2010)
Nicely done but...without exceeding 25 records for all.
Well that is just embarrassing. :blush:
Your solution has some nice features, particularly in the calculation of counts.
It does raise a question, though: in which order should we allocate?
You have chosen 'cars', then 'trucks', then 'RVs' - but that seems a bit arbitrary.
My second point concerns performance.
The code performs three partial scans of the index, and three sorts.
CPU time on my machine: 188ms - averaged over ten runs.
Combining the two approaches reduces the average time to 94ms:
SET STATISTICS TIME ON;
--===== Declare some variables to hold the counts for each vehicle type.
DECLARE @CountC INT,
@CountR INT,
@CountT INT;
--===== Randomly assign the counts to be returned with the constraint that
-- no count must be less than 3 and the total count must be 25.
SELECT @CountC = ABS(CHECKSUM(NEWID()))%16+3,
@CountR = ABS(CHECKSUM(NEWID()))%(25-@CountC-6)+3,
@CountT = 25-@CountC-@CountR;
WITH SampleSize
AS (
--Find the total number of vehicles of each type
--and the sample size to use for each group
--(3 rows, 25 rows, or 10%)
SELECT vehicle_type,
group_size = COUNT_BIG(*),
sample_size = @CountC
FROM #Source
WHERE vehicle_type = 'C'
GROUP BY
vehicle_type
UNION ALL
SELECT vehicle_type,
group_size = COUNT_BIG(*),
sample_size = @CountT
FROM #Source
WHERE vehicle_type = 'T'
GROUP BY
vehicle_type
UNION ALL
SELECT vehicle_type,
group_size = COUNT_BIG(*),
sample_size = @CountR
FROM #Source
WHERE vehicle_type = 'R'
GROUP BY
vehicle_type
),
RandomRows
AS (
-- Generate a random row number for each row required from
-- each group. Maximum rows returned = 25 * group_count
--
-- The TOP (9223372036854775807 (= BIGINT.Max)) is just to
-- force the query plan to calculate the random row number
-- before the final join
SELECT TOP (9223372036854775807)
SS.vehicle_type,
Random.rn
FROM SampleSize SS
CROSS
APPLY (
SELECT TOP (SS.sample_size)
rn = CONVERT(BIGINT, RAND(CHECKSUM(NEWID(), N.n)) * SS.group_size + 1)
FROM dbo.Number N
ORDER BY
N.n ASC
) Random
ORDER BY
SS.vehicle_type,
Random.rn
),
NumberedSourceRows
AS (
-- Number the source rows, restarting the numbering for each group
SELECT S.row_id,
S.vehicle_type,
S.vehicle_id,
rn = ROW_NUMBER() OVER (PARTITION BY S.vehicle_type ORDER BY S.row_id ASC)
FROM #Source S
)
-- Fetch the numbered rows that match the random row number, per group
SELECT NSR.row_id,
NSR.vehicle_type,
NSR.vehicle_id
FROM RandomRows RR
JOIN NumberedSourceRows NSR
ON NSR.vehicle_type = RR.vehicle_type
AND NSR.rn = RR.rn;
SET STATISTICS TIME OFF;
For what it's worth, the minimum-3 maximum-25 from each group problem was much more fun!
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi