Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Generating Test Data: Part 1 - Generating Random Integers and Floats


Generating Test Data: Part 1 - Generating Random Integers and Floats

Author
Message
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44996 Visits: 39880
mishaluba (4/21/2012)
Another brilliant article from Jeff. Easy and fun read (not very often you can say this about a technical text). Can't wait for Part 2.


Thanks, Mishaluba. I had gotten into a bit of a rut on this one and the folks I asked to review it for me (I knew I was in a rut) gave me some excellent suggestions to keep it from being monotonous. It's a tough thing to write something so technical and not be totally boring.

Thank you for the nice feedback on the article.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Dev
Dev
SSCommitted
SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)

Group: General Forum Members
Points: 1944 Visits: 1598
I was almost disconnected with SSC for couple of months and got a chance to read the article today. Nice work!!!

I remember when I read one of your posts (1 million rows test) in a discussion; I found it difficult to understand how these 4 lines of code actually work. I followed classic divide & conquer technique to understand it (abs | checksum | newid | top | cross join). I am sure many of the readers might have lost their interest there as long as it served their purpose. This explanation will keep them connected to your articles.

Thanks for the article!!!

Dev

*** Open Network for Database Professionals ***
http://www.linkedin.com/in/devendrashirbad

Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44996 Visits: 39880
Well done and thanks for the feedback, Dev.

Yeah... this article will give me a place to point to if someone comes up with the question on one of my posts. Hopefully, that's not all it's good for. Hopefully more people will do their own testing before they say such things as "In my experience, the best method is..." or "I've been at this for 25 years and the fastest method is..." or things like "Recursion rocks for everything!" because I've found that they're frequently wrong. Help destroy SQL Myths! Performance test some code today!!! ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
dwain.c
dwain.c
SSCarpal Tunnel
SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)

Group: General Forum Members
Points: 4249 Visits: 6431
Hey Jeff!

Using your test harness yesterday, I ran into some unexpected (for me) results. Let me explain:


--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
@StartValue INT,
@EndValue INT,
@Range INT
;
--===== Preset the variables to known values
SELECT @NumberOfRows = 1000000,
@StartValue = 400,
@EndValue = 500,
@Range = @EndValue - @StartValue + 1
;
--===== Conditionally drop the test table to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable
;
--===== Create the test table with "random constrained" integers and floats
-- within the parameters identified in the variables above.
SELECT TOP (@NumberOfRows)
SomeRandomInteger = ABS(CHECKSUM(NEWID())) % @Range + @StartValue,
SomeRandomFloat = RAND(CHECKSUM(NEWID())) * @Range + @StartValue
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2

SELECT MinFloat=MIN(SomeRandomFloat), MaxFloat=MAX(SomeRandomFloat)
FROM #SomeTestTable

DROP TABLE #SomeTestTable




This returns the following:


MinFloat MaxFloat
400.000012322329 500.999933381006


My surprise was due to the fact that RAND() returns a random floating point number on the closed interval {0,1}, whereas this approach yielded some random floats outside the interval {400,500}. Clearly this approach is suitable when you are interested in applying the FLOAT to a date range, such as the following:


--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
@StartValue INT,
@EndValue INT,
@Range INT,
@FStartValue INT,
@FEndValue INT,
@FRange INT
;
--===== Preset the variables to known values
SELECT @NumberOfRows = 1000000,
@StartValue = 400,
@EndValue = 500,
@Range = @EndValue - @StartValue + 1,
@FStartValue = 400,
@FEndValue = 500,
@FRange = @EndValue - @StartValue
;
--===== Conditionally drop the test table to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable
;
--===== Create the test table with "random constrained" integers and floats
-- within the parameters identified in the variables above.
SELECT TOP (@NumberOfRows)
SomeRandomInteger = ABS(CHECKSUM(NEWID())) % @Range + @StartValue,
SomeRandomFloat = RAND(CHECKSUM(NEWID())) * @FRange + @FStartValue,
SomeRandomDate = RAND(CHECKSUM(NEWID())) * @Range +
DATEADD(day, -@StartValue, DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0))
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2

SELECT MinFloat=MIN(SomeRandomFloat), MaxFloat=MAX(SomeRandomFloat)
,MinDate=MIN(SomeRandomDate), MaxDate=MAX(SomeRandomDate)
FROM #SomeTestTable

DROP TABLE #SomeTestTable




Which delivers these results:


MinFloat MaxFloat MinDate MaxDate
400.000069523232 499.999990152787 2011-05-04 00:00:16.187 2011-08-12 23:59:58.753



I have taken the liberty to change the formula for SomeRandomFloat to deliver results on the closed interval {400,500}, which is what I was expecting. Date results end up being in the range of GETDATE() - 400 days and spans 100 days.

Mind you, I'm not reporting a bug. I'd prefer to think of it as an unexplained feature. :-)

Let me know if I've misinterpreted something.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
paul.knibbs
paul.knibbs
SSCommitted
SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)

Group: General Forum Members
Points: 1972 Visits: 6213
dwain, Jeff said this was the expected behaviour in the article:

The following code generates a million rows of random integers (the SomeRandomInteger column) with possible values from 400 to 500 and float values (the SomeRandomFloat column) from 400 to 500.999999999999 in just over 4 seconds on my 10 year old, single CPU computer.
dwain.c
dwain.c
SSCarpal Tunnel
SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)

Group: General Forum Members
Points: 4249 Visits: 6431
paul.knibbs (6/7/2012)
dwain, Jeff said this was the expected behaviour in the article:

The following code generates a million rows of random integers (the SomeRandomInteger column) with possible values from 400 to 500 and float values (the SomeRandomFloat column) from 400 to 500.999999999999 in just over 4 seconds on my 10 year old, single CPU computer.


Ooops. Must'a missed it. Anyway it is clear now. Thanks.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44996 Visits: 39880
And, just to be sure... RAND does NOT return the closed interval of {0,1}. It's a closed-open where 0 materializes but 1 never does. As Dwain has shown there are a couple of ways to handle it depending on what you want to do. The key in the article was to make every keenly aware that while constraining such numbers by range and domain is quite similar to the integer method, you have to account for the whole number-line between the integers instead of just the integers.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
dwain.c
dwain.c
SSCarpal Tunnel
SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)

Group: General Forum Members
Points: 4249 Visits: 6431
Jeff Moden (6/7/2012)
And, just to be sure... RAND does NOT return the closed interval of {0,1}. It's a closed-open where 0 materializes but 1 never does. As Dwain has shown there are a couple of ways to handle it depending on what you want to do. The key in the article was to make every keenly aware that while constraining such numbers by range and domain is quite similar to the integer method, you have to account for the whole number-line between the integers instead of just the integers.


Guess I need to spend more time with my nose firmly planted in BOL.

Thanks Jeff, yet again, for teaching me something!


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44996 Visits: 39880
My pleasure and thank you for the kind feedback, Dwaine.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Steen Dybboe
Steen Dybboe
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 191
Meant to rate it 5 Stars. Pushed one by accident - sorry

Great article Jeff

/Steen
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search