Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»»

Random problems Expand / Collapse
Author
Message
Posted Tuesday, June 09, 2009 12:07 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 10:21 PM
Points: 35,951, Visits: 30,235
Mike C (6/9/2009)
Someguy (6/9/2009)
OK. I'll be the one to ask the dumb question:

Most languages have a random function that does not require you to add seeds and/or do other things to make them work. In C# for example, you can choose to add a seed, but the default doesn't require one.

So, why does T-SQL default to requiring a seed, seeing as doing so creates 'scratch head' situations like this? It sure seems like we're doing some big back flips here to achieve something that's pretty common in programming tasks. We're using functions like 'NewID' to give us the random number that Random can't do without.

And technically, the guid generated by NewID is not totally random. For practical matters like this excercise, sure. But Random and Unique aren't necesarily the same things. But I digress...



In T-SQL you can use RAND() without a seed. If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html. Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.

BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language. NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.



RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #731764
Posted Tuesday, June 09, 2009 12:07 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, August 23, 2010 9:31 AM
Points: 6, Visits: 57
I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.

Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.

CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)
Post #731765
Posted Tuesday, June 09, 2009 12:18 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 10:21 PM
Points: 35,951, Visits: 30,235
Heh... in that case, no. NEWID() is random enough as a seed.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #731773
Posted Tuesday, June 09, 2009 1:20 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Saturday, September 15, 2012 1:55 PM
Points: 31, Visits: 34
Unlike your other readers, I tried to find the ugliest solution.

Here it is:

select
InvoiceID,
RandNumber = cast(rand(cast(right(replace(convert(varchar(24), dateadd(minute, Row_Number() OVER (Order by InvoiceID) , getdate()), 113), ':', ''), 9) as int)) * 100 + 1 as int)
from
[Invoice]

The idea is: use time as the seed, but add "rownumber" of minutes to the current time for each successive row. The string manipulation functions are just a way to extract the hours + minutes + milliseconds portion of the time.
Post #731810
Posted Tuesday, June 09, 2009 1:32 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, August 28, 2013 1:49 PM
Points: 35, Visits: 315
I cheated.

select cast( newid() as binary(4) )

Post #731820
Posted Tuesday, June 09, 2009 2:16 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, January 29, 2014 7:19 AM
Points: 153, Visits: 569
Mike C.

Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'

Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.

Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.

Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.


___________________________________________________
“Politicians are like diapers. They both need changing regularly and for the same reason.”
Post #731852
Posted Tuesday, June 09, 2009 4:11 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Jeff Moden (6/9/2009)
Mike C (6/9/2009)

In T-SQL you can use RAND() without a seed. If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html. Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.

BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language. NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.



RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.


One row at a time, Jeff?
Post #731928
Posted Tuesday, June 09, 2009 4:13 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
sqlservercentral (6/9/2009)
I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.

Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.

CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)


Cryptographically secure random number generators are notoriously slow. CRYPT_GEN_RANDOM is available only on SQL 2008.
Post #731929
Posted Tuesday, June 09, 2009 4:19 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Someguy (6/9/2009)
Mike C.

Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'

Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.

Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.

Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.


RAND() probably won't be changed in it's behavior for the exact reason you mention -- backwards-compatibility. A new keyword/function could be a possibility, but they probably won't anytime soon. Unless you have a compelling reason--such as with the introduction of CRYPT_GEN_RANDOM for cryptographic random numbers--there's not much call for randomness in the database. Most uses I've seen for random number generation in SQL boil down to two scenarios: (1) retrieving random rows from a result set, and (2) generating random data to store in the database for testing. Both can be done from the client side or the server-side, with the current functionality, and both tend to be one-off development, testing and troubleshooting requirements as opposed to daily production requirements.

I believe it's Schneier's book where he recommends using subatomic particle decay to generate random numbers :)
Post #731931
Posted Tuesday, June 09, 2009 5:34 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, August 01, 2013 4:24 PM
Points: 2, Visits: 25
Jeff Moden (6/9/2009)


RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.


No, it doesn't work at all like most languages. Most people expect RAND() in the result set to generate multiple values when the result set contains multiple rows. It doesn't. In my experience, this functionality is pretty unique.

A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per query (just like GETDATE()). Adding the NEWID() call is a hack that forces the query optimizer to call RAND() once per row.

Great article!

David

Post #731951
« Prev Topic | Next Topic »

Add to briefcase «««23456»»»

Permissions Expand / Collapse