﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Chris Nowicki  / Random problems / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Thu, 24 May 2012 11:43:15 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]benoit.rocco (6/13/2009)[/b][hr]I was using SQL Server Management Studio, but connected to a SQL2000 server,  it's for that reason the function ROW_NUMBER() didn't exist ^^[/quote]Ah... thank you Benoit.  That would certainly do it.</description><pubDate>Sat, 13 Jun 2009 11:08:22 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>I was using SQL Server Management Studio, but connected to a SQL2000 server,  it's for that reason the function ROW_NUMBER() didn't exist ^^</description><pubDate>Sat, 13 Jun 2009 05:06:02 GMT</pubDate><dc:creator>benoit.rocco</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]benoit.rocco (6/11/2009)[/b][hr]Thank you very much for your answer but I already found the response. I deleted my message but apparently it didn't work.[/quote]Cool.  Tell us what it was.  Two way street here. ;-)</description><pubDate>Thu, 11 Jun 2009 21:19:35 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Thank you very much for your answer but I already found the response. I deleted my message but apparently it didn't work.</description><pubDate>Thu, 11 Jun 2009 15:08:35 GMT</pubDate><dc:creator>benoit.rocco</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>ROW_NUMBER() was introduced with SQL Server 2005 and need the compatibiity level set to 90 or higher.</description><pubDate>Thu, 11 Jun 2009 05:44:39 GMT</pubDate><dc:creator>SwePeso</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Hello, I've found this article very interesting but I can't find the Row_Number() function. Is it part of SQL Server 2k8 (I've only 2k5) ?</description><pubDate>Thu, 11 Jun 2009 05:30:52 GMT</pubDate><dc:creator>benoit.rocco</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]christian_nowicki (6/10/2009)[/b][hr][quote][b]Peso (6/9/2009)[/b][hr]And it's also well established thatABS(CHECKSUM(NEWID())) % 100gives random numbers between 0 and 99 with very high, good and even distribution.[/quote]I like this solution - it is simpler than the one I proposed.[/quote]Well done, Chris!  Takes a hell of a good person to say such a thing in the discussion of their own article.  There should be more like you.</description><pubDate>Wed, 10 Jun 2009 21:35:14 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Peso (6/9/2009)[/b][hr]And it's also well established thatABS(CHECKSUM(NEWID())) % 100gives random numbers between 0 and 99 with very high, good and even distribution.[/quote]I like this solution - it is simpler than the one I proposed.</description><pubDate>Wed, 10 Jun 2009 20:58:50 GMT</pubDate><dc:creator>christian_nowicki</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>I like this solution - it is simpler than mine.</description><pubDate>Wed, 10 Jun 2009 20:51:35 GMT</pubDate><dc:creator>christian_nowicki</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]drogers (6/10/2009)[/b][hr]Uh, why would you "not agree with the idea that RAND() doesn't work like NEWID() in T-SQL." Do you find the discrepancy "unexpected?"  :-D[/quote]Heh... Only the first time I ran across it. :-P</description><pubDate>Wed, 10 Jun 2009 11:09:23 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Jeff Moden (6/9/2009)[/b][hr][quote][b]drogers (6/9/2009)[/b][hr][quote][b]Jeff Moden (6/9/2009)[/b][hr]RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.[/quote]No, it doesn't work at all like most languages. Most people expect RAND() in the result set to generate multiple values when the result set contains multiple rows. It doesn't. In my experience, this functionality is pretty unique.[/quote]Stop and think about it... in every other language, you only have control over one row at a time.  In declarative languages like T-SQL, it's a set.  If you use RAND() in a cursor like you would in C# or any other language, you are affecting only one row at a time.That's not to say that I agree with the idea that RAND() doesn't work like NEWID() in T-SQL.[/quote]Uh, why would you "not agree with the idea that RAND() doesn't work like NEWID() in T-SQL." Do you find the discrepancy "unexpected?"  :-D</description><pubDate>Wed, 10 Jun 2009 10:41:48 GMT</pubDate><dc:creator>anyoneis</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Jeff Moden (6/9/2009)[/b][hr][quote][b]Mike C (6/9/2009)[/b][hr][quote][b]Jeff Moden (6/9/2009)[/b][hr][quote][b]Mike C (6/9/2009)[/b][hr]Another minor quibble - each instance of RAND() is invoked once per query.SELECT RAND(), RAND()Generates two different random numbers because RAND() is actually called twice.  Each RAND() result is repeated for every row of the result set (in this case only one row).[/quote]Heh... correct identification of the problem in 90% of the solution.  We know the behavior and the work around.  Use the work around.[/quote]LOL Jeff.  I wrote a chapter about 4 months ago that uses the CHECKSUM(NEWID()) method to generate "random" test data.[/quote]Dang... sorry Mike.  Based on what you wrote, I thought you were going to be one of those folks that didn't get it.  After having read a book of yours, I was really surprised at you response here.  I took it the wrong way.[/quote]Nope, just a minor quibble with the description of RAND() function behavior.</description><pubDate>Wed, 10 Jun 2009 08:05:29 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>It really is this simple:SELECT InvoiceID FROM Invoice ORDER BY NEWID()</description><pubDate>Wed, 10 Jun 2009 08:03:27 GMT</pubDate><dc:creator>JohnSQLServerCentral</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Mike C (6/9/2009)[/b][hr][quote][b]Jeff Moden (6/9/2009)[/b][hr][quote][b]Mike C (6/9/2009)[/b][hr]Another minor quibble - each instance of RAND() is invoked once per query.SELECT RAND(), RAND()Generates two different random numbers because RAND() is actually called twice.  Each RAND() result is repeated for every row of the result set (in this case only one row).[/quote]Heh... correct identification of the problem in 90% of the solution.  We know the behavior and the work around.  Use the work around.[/quote]LOL Jeff.  I wrote a chapter about 4 months ago that uses the CHECKSUM(NEWID()) method to generate "random" test data.[/quote]Dang... sorry Mike.  Based on what you wrote, I thought you were going to be one of those folks that didn't get it.  After having read a book of yours, I was really surprised at you response here.  I took it the wrong way.</description><pubDate>Tue, 09 Jun 2009 22:45:32 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Jeff Moden (6/9/2009)[/b][hr][quote][b]Mike C (6/9/2009)[/b][hr]Another minor quibble - each instance of RAND() is invoked once per query.SELECT RAND(), RAND()Generates two different random numbers because RAND() is actually called twice.  Each RAND() result is repeated for every row of the result set (in this case only one row).[/quote]Heh... correct identification of the problem in 90% of the solution.  We know the behavior and the work around.  Use the work around.[/quote]LOL Jeff.  I wrote a chapter about 4 months ago that uses the CHECKSUM(NEWID()) method to generate "random" test data.</description><pubDate>Tue, 09 Jun 2009 22:17:53 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Mike C (6/9/2009)[/b][hr][quote][b]drogers (6/9/2009)[/b]A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per [b]query[/b] (just like GETDATE()).[/quote]Another minor quibble - each instance of RAND() is invoked once per query.SELECT RAND(), RAND()Generates two different random numbers because RAND() is actually called twice.  Each RAND() result is repeated for every row of the result set (in this case only one row).[/quote]Heh... correct identification of the problem in 90% of the solution.  We know the behavior and the work around.  Use the work around.</description><pubDate>Tue, 09 Jun 2009 19:02:21 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]drogers (6/9/2009)[/b][hr][quote][b]Jeff Moden (6/9/2009)[/b][hr]RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.[/quote]No, it doesn't work at all like most languages. Most people expect RAND() in the result set to generate multiple values when the result set contains multiple rows. It doesn't. In my experience, this functionality is pretty unique.A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per [b]query[/b] (just like GETDATE()). Adding the NEWID() call is a [b]hack [/b]that forces the query optimizer to call RAND() once per row.Great article!David[/quote]Stop and think about it... in every other language, you only have control over one row at a time.  In declarative languages like T-SQL, it's a set.  If you use RAND() in a cursor like you would in C# or any other language, you are affecting only one row at a time.That's not to say that I agree with the idea that RAND() doesn't work like NEWID() in T-SQL.</description><pubDate>Tue, 09 Jun 2009 18:59:19 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]drogers (6/9/2009)[/b]A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per [b]query[/b] (just like GETDATE()).[/quote]Another minor quibble - each instance of RAND() is invoked once per query.SELECT RAND(), RAND()Generates two different random numbers because RAND() is actually called twice.  Each RAND() result is repeated for every row of the result set (in this case only one row).</description><pubDate>Tue, 09 Jun 2009 17:47:34 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Jeff Moden (6/9/2009)[/b][hr]RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.[/quote]No, it doesn't work at all like most languages. Most people expect RAND() in the result set to generate multiple values when the result set contains multiple rows. It doesn't. In my experience, this functionality is pretty unique.A minor quibble with the article: the problem is not that RAND() is called multiple times with the same seed, once per row. The problem is that RAND() is only called once per [b]query[/b] (just like GETDATE()). Adding the NEWID() call is a [b]hack [/b]that forces the query optimizer to call RAND() once per row.Great article!David</description><pubDate>Tue, 09 Jun 2009 17:34:48 GMT</pubDate><dc:creator>anyoneis</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Someguy (6/9/2009)[/b][hr]Mike C.Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.[/quote]RAND() probably won't be changed in it's behavior for the exact reason you mention -- backwards-compatibility.  A new keyword/function could be a possibility, but they probably won't anytime soon.  Unless you have a compelling reason--such as with the introduction of CRYPT_GEN_RANDOM for cryptographic random numbers--there's not much call for randomness in the database.  Most uses I've seen for random number generation in SQL boil down to two scenarios: (1) retrieving random rows from a result set, and (2) generating random data to store in the database for testing.  Both can be done from the client side or the server-side, with the current functionality, and both tend to be one-off development, testing and troubleshooting requirements as opposed to daily production requirements.I believe it's Schneier's book where he recommends using subatomic particle decay to generate random numbers :)</description><pubDate>Tue, 09 Jun 2009 16:19:34 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]sqlservercentral (6/9/2009)[/b][hr]I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)[/quote]Cryptographically secure random number generators are notoriously slow.  CRYPT_GEN_RANDOM is available only on SQL 2008.</description><pubDate>Tue, 09 Jun 2009 16:13:59 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Jeff Moden (6/9/2009)[/b][hr][quote][b]Mike C (6/9/2009)[/b][hr]In T-SQL you can use RAND() without a seed.  If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html.  Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language.  NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.[/quote]RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.[/quote]One row at a time, Jeff?</description><pubDate>Tue, 09 Jun 2009 16:11:52 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Mike C.Thank you for your answer to a slightly lazy question - Yes, I should have simply run the query SELECT Rand() and pushed the start button several times and I would have seen that for single runs, Rand defaults to 'no seed required.'Although in terms of this article, it seems like the generation of multiple rows, each with its own random number is not an outrageous requirement in the programming world. This is especially true considering the fact that SQL is meant to call sets as well as individual results. And yes, I understand that most systems don't produce truly random numbers but hard-coded sequences, so 'random' keywords in programming do are not really random, but practically useful for most instances.Maybe a better way to word the question would be, if Microsoft can create a keyword like NewID that produces a new result for each new row, why can't they either do the same for Rand() or gives us another keyword (considering that at this point there are mountains of code with back-flips that will get messed up if Rand() were changed,or assuming that Rand() may fit some current standard for SQL)? That way, we wouldn't have to use the currently existing random generator functionality from NewID to make Rand() work the way you would otherwise expect it to work.Incidentally, another interesting site is Random.org, which generates random numbers based on atmospheric noise.</description><pubDate>Tue, 09 Jun 2009 14:16:40 GMT</pubDate><dc:creator>Someguy</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>I cheated.[code]select cast( newid() as binary(4) )[/code]:-P</description><pubDate>Tue, 09 Jun 2009 13:32:39 GMT</pubDate><dc:creator>Sean Terry</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Unlike your other readers, I tried to find the ugliest solution.Here it is:select 	InvoiceID,		RandNumber = cast(rand(cast(right(replace(convert(varchar(24), dateadd(minute, Row_Number() OVER (Order by InvoiceID) , getdate()), 113), ':', ''), 9) as int)) * 100 + 1 as int)from	[Invoice]The idea is: use time as the seed, but add "rownumber" of minutes to the current time for each successive row. The string manipulation functions are just a way to extract the hours + minutes + milliseconds portion of the time.</description><pubDate>Tue, 09 Jun 2009 13:20:41 GMT</pubDate><dc:creator>pvoutov</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Heh... in that case, no.  NEWID() is random enough as a seed.</description><pubDate>Tue, 09 Jun 2009 12:18:35 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>I don't know the properties of newid, and any function on an unknown source of entropy can cause problems if you'll looking for true randomnous rather than just a simple way subset records.Consider using (convert(int ,CRYPT_GEN_RANDOM(4))/21474836467) as the rng.CRYPT_GEN_RANDOM(Number of bytes) yields a hex number using the CAPI cryptologically secure random number generator--not true random, but having excellent properties in distribution and in not being able to reverse the algorithm (unlike rand's linear congruent generator)</description><pubDate>Tue, 09 Jun 2009 12:07:53 GMT</pubDate><dc:creator>DannyS</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Mike C (6/9/2009)[/b][hr][quote][b]Someguy (6/9/2009)[/b][hr]OK. I'll be the one to ask the dumb question:Most languages have a random function that does not require you to add seeds and/or do other things to make them work. In C# for example, you can choose to add a seed, but the default doesn't require one.So, why does T-SQL default to requiring a seed, seeing as doing so creates 'scratch head' situations like this? It sure seems like we're doing some big back flips here to achieve something that's pretty common in programming tasks. We're using functions like 'NewID' to give us the random number that Random can't do without. And technically, the guid generated by NewID is not totally random. For practical matters like this excercise, sure. But Random and Unique aren't necesarily the same things. But I digress...[/quote]In T-SQL you can use RAND() without a seed.  If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html.  Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language.  NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.[/quote]RAND() in T-SQL works exactly the same way as RAND() in any computer language.... one row or value at a time.</description><pubDate>Tue, 09 Jun 2009 12:07:27 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Someguy (6/9/2009)[/b][hr]OK. I'll be the one to ask the dumb question:Most languages have a random function that does not require you to add seeds and/or do other things to make them work. In C# for example, you can choose to add a seed, but the default doesn't require one.So, why does T-SQL default to requiring a seed, seeing as doing so creates 'scratch head' situations like this? It sure seems like we're doing some big back flips here to achieve something that's pretty common in programming tasks. We're using functions like 'NewID' to give us the random number that Random can't do without. And technically, the guid generated by NewID is not totally random. For practical matters like this excercise, sure. But Random and Unique aren't necesarily the same things. But I digress...[/quote]In T-SQL you can use RAND() without a seed.  If you want better random numbers check out one of my favorite random number generators with very nice statistical properties: Mersenne Twister at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html.  Most of the time in SQL though, the NEWID() function is "good enough" since we are usually just talking about quick and practical methods of sampling data in a non-orderly fashion.BTW, the "backflips" are a result of two factors: (1) the definition of the RAND() function, which generates one value per invocation, and (2) the fact that SQL is a declarative language.  NEWID() guarantees a different value for every row of a result set, RAND() does not provide that; hence the issue the author is trying to get around.</description><pubDate>Tue, 09 Jun 2009 11:30:23 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]timothyawiseman (6/9/2009)[/b][hr]Good article, Chris.If you truly need it to be random then the method you have is probably the way to go, but if you just want 20% of the table for sample data then the SampleTable command in the from clause can be useful.  It certainly has limitations and is absolutely not applicable in a situation like you described, but it can be useful and faster way of extracting somewhat randomized sample data for things like testing.[/quote]Just for everyone else who doesn't know of the limitations of the TABLESAMPLE clause you mentioned - TABLESAMPLE samples an approximate specified percentage of physical data pages, not individual rows.  For small tables (small number of pages) this often results in an all-or-nothing result.  TABLESAMPLE is best used to return samples for large tables (lots of pages).  If you want to "randomly" sample a specific number or % of rows, the TOP(...) ... ORDER BY NEWID() format works well.  If you want to actually generate random numbers, CHECKSUM(NEWID()) is handy.  In fact, it's the only thing I would recommend using CHECKSUM for...</description><pubDate>Tue, 09 Jun 2009 11:22:44 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>OK. I'll be the one to ask the dumb question:Most languages have a random function that does not require you to add seeds and/or do other things to make them work. In C# for example, you can choose to add a seed, but the default doesn't require one.So, why does T-SQL default to requiring a seed, seeing as doing so creates 'scratch head' situations like this? It sure seems like we're doing some big back flips here to achieve something that's pretty common in programming tasks. Because of this situation, we're reduced to using functions like 'NewID' to give us the random number that Random can't do without. And technically, the guid generated by NewID is not totally random. For practical matters like this excercise, sure. But Random and Unique aren't necesarily the same things. But I digress...</description><pubDate>Tue, 09 Jun 2009 10:54:57 GMT</pubDate><dc:creator>Someguy</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>out of curiosity I checked distribution of random numbers (1-5) on a recordset of 1'000'000 rows:1 - 20.161%2 - 20.108%3 - 20.137%4 - 19.991%5 - 19.603%</description><pubDate>Tue, 09 Jun 2009 10:48:07 GMT</pubDate><dc:creator>achistov</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Good article, Chris.If you truly need it to be random then the method you have is probably the way to go, but if you just want 20% of the table for sample data then the SampleTable command in the from clause can be useful.  It certainly has limitations and is absolutely not applicable in a situation like you described, but it can be useful and faster way of extracting somewhat randomized sample data for things like testing.</description><pubDate>Tue, 09 Jun 2009 10:42:31 GMT</pubDate><dc:creator>timothyawiseman</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]vzurkowski (6/9/2009)[/b][hr]Exactly!NEWID() creates a unique identifier.Is it randomly distributed over as set of charater strings of certain length, or something like that? Otherwise, it is not clear RAND(checksum(newid())) is unformly distributed (although your earlier post gave some reassurance) .SELECT d.N, COUNT(*), perc=cast(count(*) as float)/10000   FROM (         SELECT TOP 1000000               floor(5*( RAND(CHECKSUM(NEWID()))+RAND(CHECKSUM(NEWID())))) AS N           FROM Master.dbo.SysColumns sc1          CROSS JOIN Master.dbo.SysColumns sc2        )d  GROUP BY d.N  ORDER BY d.Nproduces "random" numbers, but not uniformly distributed.[/quote]No.... that code is like rolling dice.  It will produce a bell curve near the center of the combined range just like dice do at 6 and 7</description><pubDate>Tue, 09 Jun 2009 09:55:23 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Exactly!NEWID() creates a unique identifier.Is it randomly distributed over as set of charater strings of certain length, or something like that? Otherwise, it is not clear RAND(checksum(newid())) is unformly distributed (although your earlier post gave some reassurance) .SELECT d.N, COUNT(*), perc=cast(count(*) as float)/10000   FROM (         SELECT TOP 1000000               floor(5*( RAND(CHECKSUM(NEWID()))+RAND(CHECKSUM(NEWID())))) AS N           FROM Master.dbo.SysColumns sc1          CROSS JOIN Master.dbo.SysColumns sc2        )d  GROUP BY d.N  ORDER BY d.Nproduces "random" numbers, but not uniformly distributed.</description><pubDate>Tue, 09 Jun 2009 09:49:00 GMT</pubDate><dc:creator>vzurkowski</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>Heh... nope... formatting not fixed.  The code window removed blank lines and when you copy from the code window and paste into SSMS using IE, all the leading spaces go away and most lines end up being double spaced.  I've included that result below the code &amp;#119;indow.[code]DROP TABLE JBMTestGO--===== Create and populate a 1,000,000 row test table.     -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers     -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers     -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings     -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers     -- Column "SomeDate" has a range of  &gt;=01/01/2000 and &lt;01/01/2010 non-unique date/times     -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'     --        for all rows.     -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)     --Jeff Moden SELECT TOP 1000000        RowNum       = IDENTITY(INT,1,1),        SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),        SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),        SomeHex12    = RIGHT(NEWID(),12)   INTO dbo.JBMTest   FROM Master.dbo.SysColumns t1,        Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN--===== A table is not properly formed unless a Primary Key has been assigned     -- Takes about 1 second to execute.  ALTER TABLE dbo.JBMTest        ADD PRIMARY KEY CLUSTERED (RowNum)[/code][font="Courier New"]DROP TABLE JBMTestGO--===== Create and populate a 1,000,000 row test table.     -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers     -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers     -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings     -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers     -- Column "SomeDate" has a range of  &gt;=01/01/2000 and &lt;01/01/2010 non-unique date/times     -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'     --        for all rows.     -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)     --Jeff Moden SELECT TOP 1000000        RowNum       = IDENTITY(INT,1,1),        SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),        SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),        SomeHex12    = RIGHT(NEWID(),12)   INTO dbo.JBMTest   FROM Master.dbo.SysColumns t1,        Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN--===== A table is not properly formed unless a Primary Key has been assigned     -- Takes about 1 second to execute.  ALTER TABLE dbo.JBMTest        ADD PRIMARY KEY CLUSTERED (RowNum)[/font]</description><pubDate>Tue, 09 Jun 2009 09:47:46 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]Steve Jones - Editor (6/9/2009)[/b][hr]The formatting is fixed, or should be. Please let me know if it is still messed up.[/quote]That's GREAT!  Heh... you know I'll be testing the heck out of that for you.  Tell the Developers thanks either way.</description><pubDate>Tue, 09 Jun 2009 09:40:51 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>[quote][b]woolfson (6/9/2009)[/b][hr]This is brilliant; I have always struggled with providing a unique identifier to a set of data after the data has already been written to the table, and it's always been a challenge, especially when doing it within SQL.  Well, good job, very good job.  Thanks.[/quote]Ummmm... no.  Randomness &lt;&gt; Uniqueness.  Unless you're talking about just using NEWID(), that is.</description><pubDate>Tue, 09 Jun 2009 09:39:40 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>This is brilliant; I have always struggled with providing a unique identifier to a set of data after the data has already been written to the table, and it's always been a challenge, especially when doing it within SQL.  Well, good job, very good job.  Thanks.</description><pubDate>Tue, 09 Jun 2009 09:29:56 GMT</pubDate><dc:creator>woolfson</dc:creator></item><item><title>RE: Random problems</title><link>http://www.sqlservercentral.com/Forums/Topic731102-1562-1.aspx</link><description>The formatting is fixed, or should be. Please let me know if it is still messed up.I have seen some strangeness with randomness in SQL Server, running the same query on restart, etc. However NewID() seems to work well.</description><pubDate>Tue, 09 Jun 2009 09:20:24 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item></channel></rss>
