﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Discuss Content Posted by Jeff Moden / Article Discussions / Article Discussions by Author  / Generating Test Data: Part 1 - Generating Random Integers and Floats / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Wed, 19 Jun 2013 23:26:56 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>My pleasure and thank you for the kind feedback, Dwaine.</description><pubDate>Fri, 08 Jun 2012 12:20:18 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Jeff Moden (6/7/2012)[/b][hr]And, just to be sure... RAND does NOT return the closed interval of {0,1}.  It's a closed-open where 0 materializes but 1 never does.  As Dwain has shown there are a couple of ways to handle it depending on what you want to do.  The key in the article was to make every keenly aware that while constraining such numbers by range and domain is quite similar to the integer method, you have to account for the whole number-line between the integers instead of just the integers.[/quote]Guess I need to spend more time with my nose firmly planted in BOL.Thanks Jeff, yet again, for teaching me something!</description><pubDate>Thu, 07 Jun 2012 18:39:20 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>And, just to be sure... RAND does NOT return the closed interval of {0,1}.  It's a closed-open where 0 materializes but 1 never does.  As Dwain has shown there are a couple of ways to handle it depending on what you want to do.  The key in the article was to make every keenly aware that while constraining such numbers by range and domain is quite similar to the integer method, you have to account for the whole number-line between the integers instead of just the integers.</description><pubDate>Thu, 07 Jun 2012 06:47:31 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]paul.knibbs (6/7/2012)[/b][hr]dwain, Jeff said this was the expected behaviour in the article:The following code generates a million rows of random integers (the SomeRandomInteger column) with possible values from 400 to 500 and float values (the SomeRandomFloat column) from 400 to 500.999999999999 in just over 4 seconds on my 10 year old, single CPU computer.[/quote]Ooops.  Must'a missed it.  Anyway it is clear now.  Thanks.</description><pubDate>Thu, 07 Jun 2012 03:57:34 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>dwain, Jeff said this was the expected behaviour in the article:The following code generates a million rows of random integers (the SomeRandomInteger column) with possible values from 400 to 500 and float values (the SomeRandomFloat column) from 400 to 500.999999999999 in just over 4 seconds on my 10 year old, single CPU computer.</description><pubDate>Thu, 07 Jun 2012 01:12:55 GMT</pubDate><dc:creator>paul.knibbs</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Hey Jeff!Using your test harness yesterday, I ran into some unexpected (for me) results.  Let me explain:[code="sql"]--===== Declare some obviously named variablesDECLARE @NumberOfRows INT,        @StartValue   INT,        @EndValue     INT,        @Range        INT;--===== Preset the variables to known values SELECT @NumberOfRows = 1000000,        @StartValue   = 400,        @EndValue     = 500,        @Range        = @EndValue - @StartValue + 1;--===== Conditionally drop the test table to make reruns easier in SSMS     IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL        DROP TABLE #SomeTestTable;--===== Create the test table with "random constrained" integers and floats     -- within the parameters identified in the variables above. SELECT TOP (@NumberOfRows)        SomeRandomInteger =  ABS(CHECKSUM(NEWID())) % @Range + @StartValue,        SomeRandomFloat   = RAND(CHECKSUM(NEWID())) * @Range + @StartValue   INTO #SomeTestTable   FROM sys.all_columns ac1  CROSS JOIN sys.all_columns ac2SELECT MinFloat=MIN(SomeRandomFloat), MaxFloat=MAX(SomeRandomFloat)FROM #SomeTestTableDROP TABLE #SomeTestTable[/code]This returns the following:[code="plain"]MinFloat           MaxFloat400.000012322329   500.999933381006[/code]My surprise was due to the fact that RAND() returns a random floating point number on the closed interval {0,1}, whereas this approach yielded some random floats outside the interval {400,500}.  Clearly this approach is suitable when you are interested in applying the FLOAT to a date range, such as the following:[code="sql"]--===== Declare some obviously named variablesDECLARE @NumberOfRows INT,        @StartValue   INT,        @EndValue     INT,        @Range        INT,        @FStartValue  INT,        @FEndValue    INT,        @FRange       INT;--===== Preset the variables to known values SELECT @NumberOfRows = 1000000,        @StartValue   = 400,        @EndValue     = 500,        @Range        = @EndValue - @StartValue + 1,        @FStartValue  = 400,        @FEndValue    = 500,        @FRange       = @EndValue - @StartValue;--===== Conditionally drop the test table to make reruns easier in SSMS     IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL        DROP TABLE #SomeTestTable;--===== Create the test table with "random constrained" integers and floats     -- within the parameters identified in the variables above. SELECT TOP (@NumberOfRows)        SomeRandomInteger =  ABS(CHECKSUM(NEWID())) % @Range  + @StartValue,        SomeRandomFloat   = RAND(CHECKSUM(NEWID())) * @FRange + @FStartValue,        SomeRandomDate    = RAND(CHECKSUM(NEWID())) * @Range  +             DATEADD(day, -@StartValue, DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0))   INTO #SomeTestTable   FROM sys.all_columns ac1  CROSS JOIN sys.all_columns ac2SELECT MinFloat=MIN(SomeRandomFloat), MaxFloat=MAX(SomeRandomFloat)    ,MinDate=MIN(SomeRandomDate), MaxDate=MAX(SomeRandomDate)FROM #SomeTestTableDROP TABLE #SomeTestTable[/code]Which delivers these results:[code="plain"]MinFloat           MaxFloat                    MinDate                    MaxDate400.000069523232   499.999990152787            2011-05-04 00:00:16.187    2011-08-12 23:59:58.753[/code]I have taken the liberty to change the formula for SomeRandomFloat to deliver results on the closed interval {400,500}, which is what I was expecting.  Date results end up being in the range of GETDATE() - 400 days and spans 100 days.Mind you, I'm not reporting a bug.  I'd prefer to think of it as an unexplained feature. :-)Let me know if I've misinterpreted something.</description><pubDate>Wed, 06 Jun 2012 18:49:04 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Well done and thanks for the feedback, Dev.Yeah... this article will give me a place to point to if someone comes up with the question on one of my posts.  Hopefully, that's not all it's good for.  Hopefully more people will do their own testing before they say such things as "In my experience, the best method is..." or "I've been at this for 25 years and the fastest method is..." or things like "Recursion rocks for everything!" because I've found that they're frequently wrong.  Help destroy SQL Myths!  Performance test some code today!!! ;-)</description><pubDate>Sun, 20 May 2012 15:41:11 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>I was almost disconnected with SSC for couple of months and got a chance to read the article today. Nice work!!!I remember when I read one of your posts (1 million rows test) in a discussion; I found it difficult to understand how these 4 lines of code actually work. I followed classic divide &amp; conquer technique to understand it (abs | checksum | newid | top | cross join). I am sure many of the readers might have lost their interest there as long as it served their purpose.  This explanation will keep them connected to your articles. Thanks for the article!!!</description><pubDate>Sun, 20 May 2012 04:01:05 GMT</pubDate><dc:creator>Dev</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]mishaluba (4/21/2012)[/b][hr]Another brilliant article from Jeff.  Easy and fun read (not very often you can say this about a technical text).  Can't wait for Part 2.[/quote]Thanks, Mishaluba.  I had gotten into a bit of a rut on this one and the folks I asked to review it for me (I knew I was in a rut) gave me some excellent suggestions to keep it from being monotonous.  It's a tough thing to write something so technical and not be totally boring.Thank you for the nice feedback on the article.</description><pubDate>Sun, 22 Apr 2012 08:52:32 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Another brilliant article from Jeff.  Easy and fun read (not very often you can say this about a technical text).  Can't wait for Part 2.</description><pubDate>Sat, 21 Apr 2012 07:10:43 GMT</pubDate><dc:creator>mishaluba</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Jeff Moden (3/30/2012)[/b][hr]Cool... 5 birds with 1 stone.  SPAM reported.  ;-)[/quote]Well, strictly speaking it's the same bird 5 times...and what is it with this thread and spam, anyway? You must be particularly attractive to them, Jeff. :-D</description><pubDate>Sun, 01 Apr 2012 14:12:59 GMT</pubDate><dc:creator>paul.knibbs</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Cool... 5 birds with 1 stone.  SPAM reported.  ;-)</description><pubDate>Fri, 30 Mar 2012 21:55:16 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Playing catchup on being underwater in a project for the last few weeks.  Article looks great Jeff, and I'm sure will be very valuable.  Nice Work!</description><pubDate>Fri, 30 Mar 2012 15:43:25 GMT</pubDate><dc:creator>Evil Kraig F</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Spam reported.  And, no... $33 for a pair of jeans is no way to thank anyone.</description><pubDate>Thu, 29 Mar 2012 21:23:52 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote]dwain.c (3/26/2012)--------------------------------------------------------------------------------Jeff,The following thought occurred to me last night but I didn't get a chance to test it until this morning.Isn't the modulo function designed to always return a positive integer?Hence, in this part of your data generator, I don't believe you need to use ABS:SomeRandomInteger =  CHECKSUM(NEWID()) % @Range + @StartValueI didn't try it for a million rows but the first 200 came up all as positives.Apologies, Dwain. I'm not sure how I missed your question on this. [/quote]I almost wish you had missed it because obviously my thinking cap wasn't on when I posted it.There still might be a way to use the negative return value to avoid the ABS by adjusting the range.  I'm not sure the complication is worth investigating though.</description><pubDate>Tue, 27 Mar 2012 19:29:19 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]ALZDBA (3/26/2012)[/b][hr]Great extrapolation of the [url=http://en.wikipedia.org/wiki/KISS_principle]KISS principle[/url], Jeff.Need it to be sead ... [b][font="Times New Roman"]I LOVE IT[/font][/b]  :w00t:[/quote]Thanks for the feedback, Johan.  I agree... the simpler, the better.</description><pubDate>Tue, 27 Mar 2012 19:29:03 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Scott Abrants (3/26/2012)[/b][hr]Excellent post!  Great examples, great code, and easy to follow!  Nice job Jeff![/quote]Thanks for the feedback, Scott.  I appreciate you stopping by.</description><pubDate>Tue, 27 Mar 2012 19:27:16 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]dwain.c (3/26/2012)[/b][hr]Jeff,The following thought occurred to me last night but I didn't get a chance to test it until this morning.Isn't the modulo function designed to always return a positive integer?Hence, in this part of your data generator, I don't believe you need to use ABS:[code="sql"]SomeRandomInteger =  CHECKSUM(NEWID()) % @Range + @StartValue[/code]I didn't try it for a million rows but the first 200 came up all as positives.[/quote]Apologies, Dwain.  I'm not sure how I missed your question on this.As Michael demonstrated, Modulo will return a negative number if the "Dividend" of the division is negative.[code="sql"]SELECT -1111 % 100,        1111 %-100,       -1111 %-100[/code][code="plain"]----------- ----------- ------------11         11          -11(1 row(s) affected)[/code]</description><pubDate>Tue, 27 Mar 2012 19:11:50 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Michael Valentine Jones (3/27/2012)[/b][hr][quote][b]dwain.c (3/26/2012)[/b][hr]Jeff,The following thought occurred to me last night but I didn't get a chance to test it until this morning.Isn't the modulo function designed to always return a positive integer?Hence, in this part of your data generator, I don't believe you need to use ABS:[code="sql"]SomeRandomInteger =  CHECKSUM(NEWID()) % @Range + @StartValue[/code]I didn't try it for a million rows but the first 200 came up all as positives.[/quote]It's pretty easy to show that the modulus operator will return a negative number:[code="sql"]select x = -1111%100[/code]Results:[code="plain"]x           -----------         -11 [/code][/quote]Gosh.  I'm not sure how I missed Dwain's question.  Thanks for the cover, Michael.</description><pubDate>Tue, 27 Mar 2012 19:04:54 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]SQLRNNR (3/27/2012)[/b][hr]Thanks for the article Jeff.[/quote]Thanks for the feedback, Jason.  Always good to hear from you.</description><pubDate>Tue, 27 Mar 2012 19:03:38 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Kangana Beri (3/27/2012)[/b][hr]Great article! Thanks.[/quote]Thanks for the read and the feedback, Kangana.  I appreciate it.</description><pubDate>Tue, 27 Mar 2012 19:02:31 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Thanks for the article Jeff.</description><pubDate>Tue, 27 Mar 2012 16:27:53 GMT</pubDate><dc:creator>SQLRNNR</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Great article! Thanks.</description><pubDate>Tue, 27 Mar 2012 16:15:41 GMT</pubDate><dc:creator>Kangana Beri</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Sorry.  Thought it was you.  Must have been Peter Larsson. Yes, we tested it and the conversion does make quite a bit of differrence.</description><pubDate>Tue, 27 Mar 2012 15:19:19 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Jeff Moden (3/26/2012)[/b][hr][quote][b]Michael Valentine Jones (3/26/2012)[/b][hr]Here is an alternate method that I use to generate the pseudo random numbers.  The basic method is to take the right 7 bytes from the NEWID function and convert that to a BIGINT before applying the MODULUS operator.  No need for the ABS function, since 7 bytes can only produce a positive BIGINT number.[code="sql"]if object_id('tempdb..#t','U') is not null begin drop table #t end-- Generate 20,000,000 rowsselect top 20000000	NUMBER = identity(int,1,1)into	#tfrom	(select top 4473 * from master.dbo.syscolumns) a	cross join	(select top 4473 * from master.dbo.syscolumns) b-- Show distribution of rowcount around average of 40000select	a.RandomNo,	Rows = count(*)from	(	select		RandomNo =		(convert(bigint,convert(varbinary(7),newid()))%500)+1	from		#t aa	) agroup by	a.RandomNoorder by	count(*),	a.RandomNo[/code][code="plain"]RandomNo             Rows        -------------------- -----------                  335       39455                    3       39457                   76       39481                  426       39489                  494       39535                  242       39539                  278       39539                  490       39548                  445       39553                  244       39566 .........                 124       40400                  228       40402                  425       40410                  286       40434                   45       40458                  463       40463                  373       40531                  152       40586 (500 row(s) affected)[/code][/quote]Like I said in the article, the conversion to VARBINARY will slow things down and to no good end if you don't really need BIGINT for the random integer.  If you really want BIGINT capability (and I realize that wasn't one of your goals in your example), I believe you'd also have to convert the whole NEWID() to VARBINARY.I also thought you were involved in some testing that showed the use of the square root of the final number of desired rows as a TOP for the self joined table in the Cross Join really wasn't worth it.The main point that I'm trying to make is that if it's too complicated, folks won't use it.[/quote]I never was involved in any testing with cross joins using the square root of number of desired rows.  I just grabbed a piece of code I had laying to be able to generate a test table, and I wasn't trying to say that was the way you should do it.Have you actually tested to confirm that the conversion to VARBINARY and then to BIGINT is slower than CHECKSUM and ABS?  I haven't but I would be surprised if there is much difference.I was just showing an alternate method to get a random number.  I think it is useful in cases where the range of random numbers is greater than an INT.Of course, if you want a random full BIGINT, you would do something like this, instead of stopping at 7 bytes; I just stopped at 7 bytes to eliminate the ABS function.[code="sql"]select convert(bigint,convert(varbinary(8),newid()))[/code]</description><pubDate>Tue, 27 Mar 2012 14:02:59 GMT</pubDate><dc:creator>Michael Valentine Jones</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]dwain.c (3/26/2012)[/b][hr]Jeff,The following thought occurred to me last night but I didn't get a chance to test it until this morning.Isn't the modulo function designed to always return a positive integer?Hence, in this part of your data generator, I don't believe you need to use ABS:[code="sql"]SomeRandomInteger =  CHECKSUM(NEWID()) % @Range + @StartValue[/code]I didn't try it for a million rows but the first 200 came up all as positives.[/quote]It's pretty easy to show that the modulus operator will return a negative number:[code="sql"]select x = -1111%100[/code]Results:[code="plain"]x           -----------         -11 [/code]</description><pubDate>Tue, 27 Mar 2012 13:46:01 GMT</pubDate><dc:creator>Michael Valentine Jones</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]derekr 43208 (3/27/2012)[/b][hr]Hi GuysBy declaring the variables as below - Will that guarantee that every number generated is Unique?[code="sql"]SELECT @NumberOfRows = 10000000,        @StartValue   = 1,        @EndValue     = 10000000,        @Range        = @EndValue - @StartValue + 1[/code]Thanks[/quote]Yes for the rownumber generator, No for the random number generator.If you want unique but randomly sorted numbers, it will involve a sort (which will be time consuming for as many rows as you've identified) using NEWID() to sort on.[code="sql"]WITH cteRowNumberGenerator AS(--==== Prevents sorting all the rows of the cross join. SELECT TOP (1000) --Put your desired number here        UniqueNumber = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))   FROM sys.all_columns ac1  CROSS JOIN sys.all_columns ac2)--==== Randomly sorts just the numbers generated from above. SELECT UniqueNumber   FROM cteRowNumberGenerator  ORDER BY NEWID();[/code]</description><pubDate>Tue, 27 Mar 2012 06:41:28 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Hi GuysBy declaring the variables as below - Will that guarantee that every number generated is Unique?[code="sql"]SELECT @NumberOfRows = 10000000,        @StartValue   = 1,        @EndValue     = 10000000,        @Range        = @EndValue - @StartValue + 1[/code]Thanks</description><pubDate>Tue, 27 Mar 2012 03:51:43 GMT</pubDate><dc:creator>SQLSACT</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Jeff,The following thought occurred to me last night but I didn't get a chance to test it until this morning.Isn't the modulo function designed to always return a positive integer?Hence, in this part of your data generator, I don't believe you need to use ABS:[code="sql"]SomeRandomInteger =  CHECKSUM(NEWID()) % @Range + @StartValue[/code]I didn't try it for a million rows but the first 200 came up all as positives.</description><pubDate>Mon, 26 Mar 2012 19:39:43 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>GPO said:[quote]It would be great to have a method of generating random data that approximated a distribution, whether it be Gaussian or exponential decay, or an F distribution or whatever.[/quote]The approach requires multiplying the numbers in the uniform distribution by the inverse of the new distribution's probability function.  This is not for the faint of heart.  I've done it before (not in SQL) for a Weibull distribution.This article shows how it can be done for a Gaussian distribution:[url]http://murison.alpheratz.net/Maple/GaussianDistribution/GaussianDistribution.pdf[/url]</description><pubDate>Mon, 26 Mar 2012 18:43:21 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]WayneS (3/26/2012)[/b][hr]Excellent article Jeff.Nice coincedence today... I went to the site to find how you did this, and here's the article explaining it all.Thanks for taking the time for this really great article that explains the how and why.[/quote]I know I said it before but thank you for the time you spent helping with the review.</description><pubDate>Mon, 26 Mar 2012 16:54:55 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]GPO (3/26/2012)[/b][hr][quote]That's a good point to bring up. A random distribution will create a uniform distribution across a range of data[/quote]I heartily agree. It's been a long time since I studied statistical distributions but a basic understanding of them is hugely useful. It would be great to have a method of generating random data that approximated a distribution, whether it be Gaussian or exponential decay, or an F distribution or whatever. A common phenomenon is where a column might validly accept one of say 30 integers. The vast majority of the time people will record one of five values and the frequency of recording of the others tapers off, with a few being used exceeding rarely. If you were testing things like index cardinality and column statistics generation, I wonder whether you'd get more representative testing results if your test data could mimic the distribution of what you expected to occur in production.[/quote]Hmmmm... maybe there needs to be a Part 4 to this series.</description><pubDate>Mon, 26 Mar 2012 16:51:00 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Matt Miller (#4) (3/26/2012)[/b][hr][quote][b]Jeff Moden (3/26/2012)[/b][hr][quote][b]Matt Miller (#4) (3/26/2012)[/b][hr][quote][b]sknox (3/26/2012)[/b][hr]For testing purposes (both scientific and software) pseudo-random numbers are preferable to truly random numbers*, because you want to see how the system responds to the entire range of possible inputs. A truly random number source cannot be trusted to give you a representative sample.* This is, of course, assuming that the pseudo-random number generator produces uniformly-distributed data. More on that in a bit.[/quote]That's a good point to bring up.  A random distribution will create a [u]uniform[/u] distribution across a range of data, but cannot on its own replicate any non-uniform data patterns.  So if you're looking to find out if there's a normal distribution in your data (or any number of other patterns across the set), using random data may not be a good option.This would be one of those big caveats in the "why would you need random data".  The random set will allow you to test for behavior of a varity of inputs at the detail level, but won't help with test the set as a whole.[/quote]Hmmmm... the constraints on range and domain aren't enough to satisfy this problem?  Such constraints could actually form a "bell curve" (or whatever) using a CASE statement to "weight" the outcome of the constrained random generator.[/quote]That's kind of what I meant by the "on its own" comment.   You can use the random data generator to pull in representative data in all allowed ranges, but you would need to play with the frequency or weight based on how far away from the mean you happen to be.  Assuming you have some knowledge of your data, you can shape your test data to match, using the random set as a base.[/quote]Ah... understood.  Thanks, Matt.</description><pubDate>Mon, 26 Mar 2012 16:48:16 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Michael Valentine Jones (3/26/2012)[/b][hr]Here is an alternate method that I use to generate the pseudo random numbers.  The basic method is to take the right 7 bytes from the NEWID function and convert that to a BIGINT before applying the MODULUS operator.  No need for the ABS function, since 7 bytes can only produce a positive BIGINT number.[code="sql"]if object_id('tempdb..#t','U') is not null begin drop table #t end-- Generate 20,000,000 rowsselect top 20000000	NUMBER = identity(int,1,1)into	#tfrom	(select top 4473 * from master.dbo.syscolumns) a	cross join	(select top 4473 * from master.dbo.syscolumns) b-- Show distribution of rowcount around average of 40000select	a.RandomNo,	Rows = count(*)from	(	select		RandomNo =		(convert(bigint,convert(varbinary(7),newid()))%500)+1	from		#t aa	) agroup by	a.RandomNoorder by	count(*),	a.RandomNo[/code][code="plain"]RandomNo             Rows        -------------------- -----------                  335       39455                    3       39457                   76       39481                  426       39489                  494       39535                  242       39539                  278       39539                  490       39548                  445       39553                  244       39566 .........                 124       40400                  228       40402                  425       40410                  286       40434                   45       40458                  463       40463                  373       40531                  152       40586 (500 row(s) affected)[/code][/quote]Like I said in the article, the conversion to VARBINARY will slow things down and to no good end if you don't really need BIGINT for the random integer.  If you really want BIGINT capability (and I realize that wasn't one of your goals in your example), I believe you'd also have to convert the whole NEWID() to VARBINARY.I also thought you were involved in some testing that showed the use of the square root of the final number of desired rows as a TOP for the self joined table in the Cross Join really wasn't worth it.The main point that I'm trying to make is that if it's too complicated, folks won't use it.</description><pubDate>Mon, 26 Mar 2012 16:46:04 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote]That's a good point to bring up. A random distribution will create a uniform distribution across a range of data[/quote]I heartily agree. It's been a long time since I studied statistical distributions but a basic understanding of them is hugely useful. It would be great to have a method of generating random data that approximated a distribution, whether it be Gaussian or exponential decay, or an F distribution or whatever. A common phenomenon is where a column might validly accept one of say 30 integers. The vast majority of the time people will record one of five values and the frequency of recording of the others tapers off, with a few being used exceeding rarely. If you were testing things like index cardinality and column statistics generation, I wonder whether you'd get more representative testing results if your test data could mimic the distribution of what you expected to occur in production.</description><pubDate>Mon, 26 Mar 2012 15:29:31 GMT</pubDate><dc:creator>GPO</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Excellent article Jeff.Nice coincedence today... I went to the site to find how you did this, and here's the article explaining it all.Thanks for taking the time for this really great article that explains the how and why.</description><pubDate>Mon, 26 Mar 2012 15:07:20 GMT</pubDate><dc:creator>WayneS</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Jeff Moden (3/26/2012)[/b][hr][quote][b]Matt Miller (#4) (3/26/2012)[/b][hr][quote][b]sknox (3/26/2012)[/b][hr]For testing purposes (both scientific and software) pseudo-random numbers are preferable to truly random numbers*, because you want to see how the system responds to the entire range of possible inputs. A truly random number source cannot be trusted to give you a representative sample.* This is, of course, assuming that the pseudo-random number generator produces uniformly-distributed data. More on that in a bit.[/quote]That's a good point to bring up.  A random distribution will create a [u]uniform[/u] distribution across a range of data, but cannot on its own replicate any non-uniform data patterns.  So if you're looking to find out if there's a normal distribution in your data (or any number of other patterns across the set), using random data may not be a good option.This would be one of those big caveats in the "why would you need random data".  The random set will allow you to test for behavior of a varity of inputs at the detail level, but won't help with test the set as a whole.[/quote]Hmmmm... the constraints on range and domain aren't enough to satisfy this problem?  Such constraints could actually form a "bell curve" (or whatever) using a CASE statement to "weight" the outcome of the constrained random generator.[/quote]That's kind of what I meant by the "on its own" comment.   You can use the random data generator to pull in representative data in all allowed ranges, but you would need to play with the frequency or weight based on how far away from the mean you happen to be.  Assuming you have some knowledge of your data, you can shape your test data to match, using the random set as a base.</description><pubDate>Mon, 26 Mar 2012 15:05:54 GMT</pubDate><dc:creator>Matt Miller (#4)</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>Here is an alternate method that I use to generate the pseudo random numbers.  The basic method is to take the right 7 bytes from the NEWID function and convert that to a BIGINT before applying the MODULUS operator.  No need for the ABS function, since 7 bytes can only produce a positive BIGINT number.[code="sql"]if object_id('tempdb..#t','U') is not null begin drop table #t end-- Generate 20,000,000 rowsselect top 20000000	NUMBER = identity(int,1,1)into	#tfrom	(select top 4473 * from master.dbo.syscolumns) a	cross join	(select top 4473 * from master.dbo.syscolumns) b-- Show distribution of rowcount around average of 40000select	a.RandomNo,	Rows = count(*)from	(	select		RandomNo =		(convert(bigint,convert(varbinary(7),newid()))%500)+1	from		#t aa	) agroup by	a.RandomNoorder by	count(*),	a.RandomNo[/code][code="plain"]RandomNo             Rows        -------------------- -----------                  335       39455                    3       39457                   76       39481                  426       39489                  494       39535                  242       39539                  278       39539                  490       39548                  445       39553                  244       39566 .........                 124       40400                  228       40402                  425       40410                  286       40434                   45       40458                  463       40463                  373       40531                  152       40586 (500 row(s) affected)[/code]</description><pubDate>Mon, 26 Mar 2012 14:22:44 GMT</pubDate><dc:creator>Michael Valentine Jones</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]Matt Miller (#4) (3/26/2012)[/b][hr][quote][b]sknox (3/26/2012)[/b][hr]For testing purposes (both scientific and software) pseudo-random numbers are preferable to truly random numbers*, because you want to see how the system responds to the entire range of possible inputs. A truly random number source cannot be trusted to give you a representative sample.* This is, of course, assuming that the pseudo-random number generator produces uniformly-distributed data. More on that in a bit.[/quote]That's a good point to bring up.  A random distribution will create a [u]uniform[/u] distribution across a range of data, but cannot on its own replicate any non-uniform data patterns.  So if you're looking to find out if there's a normal distribution in your data (or any number of other patterns across the set), using random data may not be a good option.This would be one of those big caveats in the "why would you need random data".  The random set will allow you to test for behavior of a varity of inputs at the detail level, but won't help with test the set as a whole.[/quote]Hmmmm... the constraints on range and domain aren't enough to satisfy this problem?  Such constraints could actually form a "bell curve" (or whatever) using a CASE statement to "weight" the outcome of the constrained random generator.</description><pubDate>Mon, 26 Mar 2012 13:57:11 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Generating Test Data: Part 1 - Generating Random Integers and Floats</title><link>http://www.sqlservercentral.com/Forums/Topic1272409-203-1.aspx</link><description>[quote][b]sknox (3/26/2012)[/b][hr]For testing purposes (both scientific and software) pseudo-random numbers are preferable to truly random numbers*, because you want to see how the system responds to the entire range of possible inputs. A truly random number source cannot be trusted to give you a representative sample.* This is, of course, assuming that the pseudo-random number generator produces uniformly-distributed data. More on that in a bit.[/quote]That's a good point to bring up.  A random distribution will create a [u]uniform[/u] distribution across a range of data, but cannot on its own replicate any non-uniform data patterns.  So if you're looking to find out if there's a normal distribution in your data (or any number of other patterns across the set), using random data may not be a good option.This would be one of those big caveats in the "why would you need random data".  The random set will allow you to test for behavior of a varity of inputs at the detail level, but won't help with test the set as a whole.</description><pubDate>Mon, 26 Mar 2012 13:51:09 GMT</pubDate><dc:creator>Matt Miller (#4)</dc:creator></item></channel></rss>