Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Random problems


Random problems

Author
Message
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44994 Visits: 39880
Mike C (6/9/2009)
Someguy (6/9/2009)
OK. I'll be the one to ask the dumb question:

Most languages have a random function that does not require you to add seeds and/or do other things to make them work. In C# for example, you can choose to add a seed, but the default doesn't require one.

So, why does T-SQL default to requiring a seed, seeing as doing so creates 'scratch head' situations like this? It sure seems like we're doing some big back flips here to achieve something that's pretty common in programming tasks. We're using functions like 'NewID' to give us the random number that Random can't do without.

And technically, the guid generated by NewID is not totally random. For practical matters like this excercise, sure. But Random and Unique aren't necesarily the same things. But I digress...



In T-SQL you can use RAND() without a seed. If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html. Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.

BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language. NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.



RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
DannyS
DannyS
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 57
I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.

Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.

CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44994 Visits: 39880
Heh... in that case, no. NEWID() is random enough as a seed.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
voutmaster
voutmaster
SSC Rookie
SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)

Group: General Forum Members
Points: 31 Visits: 34
Unlike your other readers, I tried to find the ugliest solution.

Here it is:

select
   InvoiceID,   
   RandNumber = cast(rand(cast(right(replace(convert(varchar(24), dateadd(minute, Row_Number() OVER (Order by InvoiceID) , getdate()), 113), ':', ''), 9) as int)) * 100 + 1 as int)
from
   [Invoice]

The idea is: use time as the seed, but add "rownumber" of minutes to the current time for each successive row. The string manipulation functions are just a way to extract the hours + minutes + milliseconds portion of the time.
Sean Terry
Sean Terry
SSC Rookie
SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)

Group: General Forum Members
Points: 37 Visits: 347
I cheated.


select cast( newid() as binary(4) )



:-P
Someguy
Someguy
SSC-Enthusiastic
SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)SSC-Enthusiastic (157 reputation)

Group: General Forum Members
Points: 157 Visits: 579
Mike C.

Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'

Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.

Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.

Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.

___________________________________________________
“Politicians are like diapers. They both need changing regularly and for the same reason.”
Mike C
Mike C
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1349 Visits: 1168
Jeff Moden (6/9/2009)
Mike C (6/9/2009)

In T-SQL you can use RAND() without a seed. If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html. Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.

BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language. NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.



RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.


One row at a time, Jeff?
Mike C
Mike C
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1349 Visits: 1168
sqlservercentral (6/9/2009)
I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.

Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.

CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)


Cryptographically secure random number generators are notoriously slow. CRYPT_GEN_RANDOM is available only on SQL 2008.
Mike C
Mike C
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1349 Visits: 1168
Someguy (6/9/2009)
Mike C.

Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'

Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.

Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.

Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.


RAND() probably won't be changed in it's behavior for the exact reason you mention -- backwards-compatibility. A new keyword/function could be a possibility, but they probably won't anytime soon. Unless you have a compelling reason--such as with the introduction of CRYPT_GEN_RANDOM for cryptographic random numbers--there's not much call for randomness in the database. Most uses I've seen for random number generation in SQL boil down to two scenarios: (1) retrieving random rows from a result set, and (2) generating random data to store in the database for testing. Both can be done from the client side or the server-side, with the current functionality, and both tend to be one-off development, testing and troubleshooting requirements as opposed to daily production requirements.

I believe it's Schneier's book where he recommends using subatomic particle decay to generate random numbers Smile
anyoneis
anyoneis
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 39
Jeff Moden (6/9/2009)


RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.


No, it doesn't work at all like most languages. Most people expect RAND() in the result set to generate multiple values when the result set contains multiple rows. It doesn't. In my experience, this functionality is pretty unique.

A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per query (just like GETDATE()). Adding the NEWID() call is a hack that forces the query optimizer to call RAND() once per row.

Great article!

David
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search