SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Random problems


Random problems

Author
Message
DannyS
DannyS
Grasshopper
Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)

Group: General Forum Members
Points: 12 Visits: 57
I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.

Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.

CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)

Group: General Forum Members
Points: 88488 Visits: 41130
Heh... in that case, no. NEWID() is random enough as a seed.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
voutmaster
voutmaster
Valued Member
Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)

Group: General Forum Members
Points: 67 Visits: 34
Unlike your other readers, I tried to find the ugliest solution.

Here it is:

select
InvoiceID,
RandNumber = cast(rand(cast(right(replace(convert(varchar(24), dateadd(minute, Row_Number() OVER (Order by InvoiceID) , getdate()), 113), ':', ''), 9) as int)) * 100 + 1 as int)
from
[Invoice]

The idea is: use time as the seed, but add "rownumber" of minutes to the current time for each successive row. The string manipulation functions are just a way to extract the hours + minutes + milliseconds portion of the time.
Sean Terry
Sean Terry
Valued Member
Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)

Group: General Forum Members
Points: 59 Visits: 347
I cheated.


select cast( newid() as binary(4) )



:-P
Someguy
Someguy
SSC Veteran
SSC Veteran (253 reputation)SSC Veteran (253 reputation)SSC Veteran (253 reputation)SSC Veteran (253 reputation)SSC Veteran (253 reputation)SSC Veteran (253 reputation)SSC Veteran (253 reputation)SSC Veteran (253 reputation)

Group: General Forum Members
Points: 253 Visits: 579
Mike C.

Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'

Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.

Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.

Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.

___________________________________________________
“Politicians are like diapers. They both need changing regularly and for the same reason.”
Mike C
Mike C
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2725 Visits: 1168
Jeff Moden (6/9/2009)
Mike C (6/9/2009)

In T-SQL you can use RAND() without a seed. If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html. Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.

BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language. NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.



RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.


One row at a time, Jeff?
Mike C
Mike C
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2725 Visits: 1168
sqlservercentral (6/9/2009)
I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.

Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.

CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)


Cryptographically secure random number generators are notoriously slow. CRYPT_GEN_RANDOM is available only on SQL 2008.
Mike C
Mike C
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2725 Visits: 1168
Someguy (6/9/2009)
Mike C.

Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'

Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.

Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.

Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.


RAND() probably won't be changed in it's behavior for the exact reason you mention -- backwards-compatibility. A new keyword/function could be a possibility, but they probably won't anytime soon. Unless you have a compelling reason--such as with the introduction of CRYPT_GEN_RANDOM for cryptographic random numbers--there's not much call for randomness in the database. Most uses I've seen for random number generation in SQL boil down to two scenarios: (1) retrieving random rows from a result set, and (2) generating random data to store in the database for testing. Both can be done from the client side or the server-side, with the current functionality, and both tend to be one-off development, testing and troubleshooting requirements as opposed to daily production requirements.

I believe it's Schneier's book where he recommends using subatomic particle decay to generate random numbers Smile
anyoneis
anyoneis
Grasshopper
Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)Grasshopper (11 reputation)

Group: General Forum Members
Points: 11 Visits: 39
Jeff Moden (6/9/2009)


RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.


No, it doesn't work at all like most languages. Most people expect RAND() in the result set to generate multiple values when the result set contains multiple rows. It doesn't. In my experience, this functionality is pretty unique.

A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per query (just like GETDATE()). Adding the NEWID() call is a hack that forces the query optimizer to call RAND() once per row.

Great article!

David
Mike C
Mike C
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2725 Visits: 1168
drogers (6/9/2009)

A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per query (just like GETDATE()).



Another minor quibble - each instance of RAND() is invoked once per query.

SELECT RAND(), RAND()

Generates two different random numbers because RAND() is actually called twice. Each RAND() result is repeated for every row of the result set (in this case only one row).
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search