Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Issue with crypt_gen_random Expand / Collapse
Author
Message
Posted Friday, February 1, 2013 3:58 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 12:55 PM
Points: 156, Visits: 662
This is my scalar function, which returns numbers between @min and @max (both included):
create function GetRandom(@min int, @max int)
returns int
as
begin
declare @diff int = @max - @min + 1
return @min + crypt_gen_random(1) % @diff
end

This is how I test it:
-- This runs for about 50s on my computer.
declare @t table (Number int)
declare @winners table (Number int)
declare @i int = 0
declare @iCount int = 10
declare @j int = 0
declare @jCount int = 100000

while (@i < @iCount)
begin
delete from @t
set @j = 0
while (@j < @jCount)
begin
insert into @t (Number)
select dbo.GetRandom(1, 3)

set @j += 1
end

insert into @winners (Number)
select top 1 Number
from (
select Number, count(*) C
from @t
group by Number
) x
order by C desc

set @i += 1
end

select Number, count(*)
from @winners
group by Number
order by count(*) desc

Output (number 1 wins in ~80% cases):
Number      
----------- -----------
1 9
2 1

Now almost every time I run this query, the Number 1 appears slightly more often... Is there something wrong with my GetRandom function?
Post #1414556
Posted Friday, February 1, 2013 9:04 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:53 PM
Points: 35,366, Visits: 31,905
_simon_ (2/1/2013)
This is my scalar function, which returns numbers between @min and @max (both included):
create function GetRandom(@min int, @max int)
returns int
as
begin
declare @diff int = @max - @min + 1
return @min + crypt_gen_random(1) % @diff
end

This is how I test it:
-- This runs for about 50s on my computer.
declare @t table (Number int)
declare @winners table (Number int)
declare @i int = 0
declare @iCount int = 10
declare @j int = 0
declare @jCount int = 100000

while (@i < @iCount)
begin
delete from @t
set @j = 0
while (@j < @jCount)
begin
insert into @t (Number)
select dbo.GetRandom(1, 3)

set @j += 1
end

insert into @winners (Number)
select top 1 Number
from (
select Number, count(*) C
from @t
group by Number
) x
order by C desc

set @i += 1
end

select Number, count(*)
from @winners
group by Number
order by count(*) desc

Output (number 1 wins in ~80% cases):
Number      
----------- -----------
1 9
2 1

Now almost every time I run this query, the Number 1 appears slightly more often... Is there something wrong with my GetRandom function?


Which edition and version of SQL Server are you actually using. I'm using 2k8 and I get the following error when I try to create your function...

Msg 443, Level 16, State 1, Procedure GetRandom, Line 6
Invalid use of a side-effecting operator 'Crypt_Gen_Random' within a function.


You're implitly converting a VARBINARY(1) to a 4 byte int and at the machine language level, that makes a mess. If you don't think so, change the crypt_gen_random(1) to crypt_gen_random(4) and see the "impossible" happen with the output of your test.

You can fix the problem by changing the crypt_gen_random(1) to ABS(CHECKSUM(CRYPT_GEN_RANDOM(4))).


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1414901
Posted Tuesday, February 5, 2013 5:22 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 12:55 PM
Points: 156, Visits: 662
Jeff, your solution works better, thanks.

I am using Microsoft SQL Server Management Studio 10.50.2500.0, @@version:
Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) 
Jun 17 2011 00:54:03
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

I have no problem creating the function (maybe I changed something somewhere sometime, but don't know anymore ).
Post #1415761
Posted Tuesday, February 5, 2013 7:48 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:53 PM
Points: 35,366, Visits: 31,905
That would be 2k8 R2, IIRC. I don't have that particular version. I've got 2k5 and 2k8 fired up because that's what I have at work. Hopefully, we'll go straight to 2012 this year. That'll be fun!

I believe I'd rather use NEWID() as a random seed source because it works even with earlier versions of SQL Server. I still work with people that use versions as far back as 6.5 so I have to keep my hand practiced in the black arts of earlier versions as well as more current versions. Of course, someone would have to pry ROW_NUMBER() out of my cold dead hands should anyone try to take that away.

As a bit of a sidebar, if you do the trick of creating a VIEW with SELECT NEWID() AS RandomSeed, you can even use it in an iTVF (inline Table Valued Function).


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1415848
Posted Thursday, February 7, 2013 2:19 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Wednesday, October 15, 2014 12:57 PM
Points: 880, Visits: 2,433
Interesting!

First, Jeff is right - trying to get a 4 byte value out of 1 byte of randomness is... flawed.

Second, I'd expand that - I would really suggest adding an argument and/or logic and/or building a set of modules for each different output type. Perhaps a tinyint is required - that takes only 1 byte of random numbers.

Third, the entire technique you're using has bias built in. If we had a range of 1..4 in our 4 byte randomness, and we want an output range of 1..3, then you'll see the following:
Random Range limited
1 1
2 2
3 3
4 1

You get a bias towards numbers at the low end because of the mod function. Now, going from a 4 byte random to a range of 1..10, you'll be hard pressed to notice the issue (though dieharder might very well find it). Ideally, you'll discard numbers that are in the end range that only partially fills the area of the desired range (in the above example, if you get a 4, just try again).


I'd also recommend generating a lot of values, and then testing them with a suite like dieharder (http://www.phy.duke.edu/~rgb/General/dieharder/dieharder.html), or a government test suite if you need to go to that level. Random data is _hard_.
Post #1417352
Posted Thursday, February 7, 2013 10:31 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, October 7, 2014 10:53 PM
Points: 3,421, Visits: 5,359
Generating random numbers is way too important to leave to chance!

Hence I refer you to the second link in my signature articles.

In that article, you will find a FUNCTION called RN_MULTINOMIAL(@Binomial, URN), which along with the TYPE declaration for Distribution, you should first run and then try this:

-- Data definition and setup
DECLARE @NumberOfRNs INT

SELECT @NumberOfRNs = 100

--CREATE TYPE Distribution AS TABLE (EventID INT, EventProb FLOAT, CumProb FLOAT)
DECLARE @Binomial AS Distribution

-- Simulate a coin toss with a Binomial Distribution
INSERT INTO @Binomial
SELECT 0, 0.5, 0.5 UNION ALL SELECT 1, 0.5, 1.0

-- Create random numbers for the selected distributions
SELECT TOP (@NumberOfRNs)
RandomBinomial = dbo.RN_MULTINOMIAL(@Binomial, URN)
INTO #MyRandomNumbers
FROM sys.all_columns a1 CROSS APPLY sys.all_columns a2
CROSS APPLY (SELECT RAND(CHECKSUM(NEWID()))) URN(URN)

SELECT RandomBinomial, Number=COUNT(RandomBinomial)
FROM #MyRandomNumbers
GROUP BY RandomBinomial

DROP TABLE #MyRandomNumbers


Unless I've totally missed what you want, you're trying to generate random events (like a coin toss) according to a binomial distribution.

Is that correct?

Edit: Ooops! I did misinterpret slightly. But the same FUNCTION can be used to generate a multinomial distribution. Your events then are all the integers between MIN and MAX. So simply populate these into the rows of the @Binomial distribution table (call it @Multinomial if you'd rather be precise) with equivalent probabilities for each row. Then let'er rip!

To make that a bit more clear, populate the @Binomial table something like this (the 3rd column is the cumulative probability):

--CREATE TYPE Distribution AS TABLE (EventID INT, EventProb FLOAT, CumProb FLOAT) 

DECLARE @Binomial AS Distribution
DECLARE @MinNumber INT = 50, @MaxNumber INT = 500

;WITH Tally AS (
SELECT n=number
FROM [master].dbo.spt_values Tally
WHERE [Type] = 'P' AND Number BETWEEN @MinNumber AND @MaxNumber)
INSERT INTO @Binomial
SELECT n, 1./(1+@MaxNumber-@MinNumber), (1.+n-@MinNumber)/(1.+@MaxNumber-@MinNumber)
FROM Tally

SELECT *
FROM @Binomial


I've also included the CREATE TYPE statement you'll need to run before you CREATE the FUNCTION because a quick review of the article makes it seem kind of hidden.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1417465
Posted Saturday, February 9, 2013 6:25 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Friday, October 17, 2014 8:13 AM
Points: 9,926, Visits: 11,188
_simon_ (2/5/2013)
I am using Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64)

I have no problem creating the function

It's possible that using CRYPT_GEN_RANDOM in a function was (incorrectly) possible in older builds before this bug was fixed.

I notice you are still on R2 SP1 - it might be an idea to apply Service Pack 2 (available since July 2012).




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1418071
Posted Monday, February 18, 2013 4:59 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 12:55 PM
Points: 156, Visits: 662
Thanks everybody for reply!

I'm pretty sure my c# method is ok (code taken from here: http://msdn.microsoft.com/en-us/magazine/cc163367.aspx), so I just deployed it to Sql:
[SqlFunction]
public static int RndBetween(int minValue, int maxValue)
{
if (minValue > maxValue) throw new ArgumentOutOfRangeException("minValue");
if (minValue == maxValue) return minValue;

// Make maxValue inclusive.
maxValue++;

var rng = new RNGCryptoServiceProvider();
var uint32Buffer = new byte[4];
long diff = maxValue - minValue;

while (true)
{
rng.GetBytes(uint32Buffer);
uint rand = BitConverter.ToUInt32(uint32Buffer, 0);
const long max = (1 + (long)int.MaxValue);
long remainder = max % diff;
if (rand < max - remainder)
{
return (int)(minValue + (rand % diff));
}
}
}

Post #1421139
Posted Monday, February 18, 2013 12:43 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:53 PM
Points: 35,366, Visits: 31,905
_simon_ (2/18/2013)
Thanks everybody for reply!

I'm pretty sure my c# method is ok (code taken from here: http://msdn.microsoft.com/en-us/magazine/cc163367.aspx), so I just deployed it to Sql:
[SqlFunction]
public static int RndBetween(int minValue, int maxValue)
{
if (minValue > maxValue) throw new ArgumentOutOfRangeException("minValue");
if (minValue == maxValue) return minValue;

// Make maxValue inclusive.
maxValue++;

var rng = new RNGCryptoServiceProvider();
var uint32Buffer = new byte[4];
long diff = maxValue - minValue;

while (true)
{
rng.GetBytes(uint32Buffer);
uint rand = BitConverter.ToUInt32(uint32Buffer, 0);
const long max = (1 + (long)int.MaxValue);
long remainder = max % diff;
if (rand < max - remainder)
{
return (int)(minValue + (rand % diff));
}
}
}



Seems to me like going to a CLR for a simple, relatively flat random number is overkill. Not sure what the performance of the CLR will be, in this case, either. Please see the following articles...
http://www.sqlservercentral.com/articles/Data+Generation/87901/
http://www.sqlservercentral.com/articles/Test+Data/88964/


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1421328
Posted Monday, February 18, 2013 1:16 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Wednesday, October 15, 2014 12:57 PM
Points: 880, Visits: 2,433
Jeff Moden (2/18/2013)


Seems to me like going to a CLR for a simple, relatively flat random number is overkill. Not sure what the performance of the CLR will be, in this case, either. Please see the following articles...
http://www.sqlservercentral.com/articles/Data+Generation/87901/
http://www.sqlservercentral.com/articles/Test+Data/88964/


I agree that CLR for a simple, relatively flat pseudorandom number is overkill.

However, the original poster is specifically using a function designed to generate crytographically secure random numbers, which is an entirely different ballgame. crypt_gen_random uses the older (Pre-Cryptography Next Generation) API, though I can't tell as easily what API the .NET RNGCryptoServiceProvider uses, though I suspect it's also pre-CNG.

For any cryptographic use (encryption keys, random passwords, etc.), the quality of the random number source is very important to the final quality of the cryptography.
Post #1421337
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse