Here is how to generate a sample set of multinomially distributed random numbers. First, you need to create a TYPE and a FUNCTION by running this script:
CREATE TYPE Distribution AS TABLE (EventID INT, EventProb FLOAT, CumProb FLOAT)
GO
CREATE FUNCTION dbo.RN_MULTINOMIAL
(@Multinomial Distribution READONLY, @URN FLOAT)
RETURNS INT --Cannot use WITH SCHEMABINDING
AS
BEGIN
RETURN
ISNULL(
( SELECT TOP 1 EventID
FROM @Multinomial
WHERE @URN < CumProb
ORDER BY CumProb)
-- Handle unlikely case where URN = exactly 1.0
,( SELECT MAX(EventID)
FROM @Multinomial))
END
Next, you need to set up your multinomial probability distribution table as follows:
DECLARE @MultinomialProbabilities Distribution
;WITH Tally (n) AS (
SELECT TOP 28 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns)
INSERT INTO @MultinomialProbabilities
SELECT n
,CASE n WHEN 1 THEN .6/26. WHEN 2 THEN .3 WHEN 3 THEN .1 ELSE .6/26. END
,CASE n WHEN 1 THEN .6*1./26. WHEN 2 THEN .3+.6*1./26. WHEN 3 THEN .4+.6*1./26. ELSE .4+.6*(n-2)/26. END
FROM Tally
SELECT * FROM @MultinomialProbabilities
Note how the EventProb column shows .3 for event 2 and .1 for event 3. The rest are all the remaining probability (.6) divided by the number of events (26). The last column is the cumulative probability for all previous events (last row should show 1).
The hard part is now behind us.
Now, within the same SQL batch as the above, this test harness tests the generated random numbers so you can compare to the distribution's expected frequency.
DECLARE @TestNums INT = 1000
;WITH Tally (n) AS (
SELECT TOP (@TestNums) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
SELECT MNRN, CountOfMNRNs=COUNT(MNRN), ActualProbability=COUNT(MNRN)/(1.*@TestNums)
FROM (
SELECT MNRN=dbo.RN_MULTINOMIAL(@MultinomialProbabilities, URN)
FROM Tally
CROSS APPLY (SELECT URN=RAND(CHECKSUM(NEWID()))) a
) a
INNER JOIN @MultinomialProbabilities ON EventID=MNRN
GROUP BY MNRN
The key to generating a group of random numbers is the part I highlighted in bold/ This generates a sample set based on the value of @TestNums. The rest of it just groups by EventID and calculates the actual probability. This should center around 0.23 for all events except 2 and 3, which should be close to .3 and .1. The more numbers you generate, the closer they should be to the actual distribution.
Hope this helps.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St