Generating Test Data: Part 1 - Generating Random Integers and Floats


Andre Guerreiro
Another great article! And it explains the reasons to use that technique going back to basics.
Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.

Best regards,

Andre Guerreiro Neto

Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA
Jeff Moden
dwain.c (3/26/2012)
Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.

I can't wait for the purists to berate you for using "pseudo" random numbers though.

And let me guess:
DECLARE @Range INT
       ,@StartValue DATETIME
       ,@EndValue DATETIME;

SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31';

-- Number of whole days in the window
SELECT @Range = DATEDIFF(day, @StartValue, @EndValue);

SELECT TOP 20 -- Random dates (whole days)
       DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomDate
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;

-- Number of seconds in the same window
SELECT @Range = DATEDIFF(second, @StartValue, @EndValue);

SELECT TOP 20 -- Random times (to the second)
       DATEADD(second, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) AS SomeRandomTime
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;


:-)


I might be safe for the next 10 minutes or so. Although the "next" random value is certainly predictable, you'd have to know a fair bit about how NEWID() is generated to predict the next value :-D

You're just about spot on in your code. Just substitute @Range for the 20 in TOP 20 and Bob's your non-hardcoded Uncle. :-) In Part 2, I'll explain how to easily include random times as a part of generating random dates. I'll also cover making period bins.

Thanks for the feedback, Dwain!

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we don't actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesn't mean you should.

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
codebyo (3/26/2012)
Another great article! And it explains the reasons to use that technique going back to basics.
Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.



You've hit the nail on the head, Andre. Random data generation is just like the Tally Table used to be. A lot of people were using it thanks to some posted code examples but they may not have known the "Why" of how it all worked. I wanted to make sure that people knew "Why" things worked the way they did so they can really think outside the box when the time comes.

Of course, the other reason I'm writing this is so that people who might not otherwise have been able to do so can easily generate some test data to see if that "works for 10 rows" code example they found on the internet is worth its salt.

Thanks for the feedback and for "getting it".

--Jeff Moden

Sean Lange
Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Jeff Moden
Sean Lange (3/26/2012)
Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!


Heh... "lessers". I've seen your posts and I wouldn't associate the word "lesser" with someone like you. As you said, "Keep them coming!!!"

Thanks for the feedback, Sean.

--Jeff Moden

ALZDBA
Great extrapolation of the KISS principle, Jeff.

It needs to be said ... I LOVE IT

Johan


Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere

- How to post Performance Problems
- How to post data/code to get the best help


- How to prevent a sore throat after hours of presenting ppt ?


"press F1 for solution", "press shift+F1 for urgent solution" :-D


Need a bit of Powershell? How about this

Scott Abrants
Excellent post! Great examples, great code, and easy to follow!
Nice job Jeff!
Lynn Pettis
Krtyknm (3/26/2012)
Hi Jeff,

Nice Article, Keep going.

I have a question about the sys tables: most of the examples use them as the row source for generating random numbers. If developers don't have access to the system tables, how can they generate the random numbers?

Thanks,
Karthik


In addition to what has already been said about using other tables, or creating a tally table, it is also possible to create a dynamic tally table using CTEs in SQL Server 2005 and newer. You can find numerous examples of these in the forums and articles on SSC.
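
As a rough illustration of the dynamic tally table idea, here is a minimal sketch that builds its rows from stacked CTEs instead of reading any system table. The CTE names (E1, E2, E4, Tally) and the 1-to-100 range are assumptions chosen for the example, not something taken from the article.

```sql
-- Sketch of an on-the-fly ("dynamic") tally table that reads no system tables.
-- E1, E2, E4 and Tally are illustrative names only.
WITH E1(N) AS (   -- 10 rows
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),   -- 10 x 10 = 100 rows
E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),   -- 100 x 100 = 10,000 rows
Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)
SELECT TOP (20)
       N,
       ABS(CHECKSUM(NEWID()) % 100) + 1 AS SomeRandomInt   -- integer from 1 to 100
FROM Tally
ORDER BY N;
```

The cross joins only multiply constant rows, so the query needs no permissions on sys.all_columns or any other catalog view. Add another level (for example, E4 cross joined to itself) if more than 10,000 rows are needed.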

Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
sknox
Jeff Moden (3/26/2012)
dwain.c (3/26/2012)
Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.

I can't wait for the purists to berate you for using "pseudo" random numbers though.


I might be safe for the next 10 minutes or so. Although the "next" random value is certainly predictable, you'd have to know a fair bit about how NEWID() is generated to predict the next value :-D


For testing purposes (both scientific and software) pseudo-random numbers are preferable to truly random numbers*, because you want to see how the system responds to the entire range of possible inputs. A truly random number source cannot be trusted to give you a representative sample.

* This is, of course, assuming that the pseudo-random number generator produces uniformly-distributed data. More on that in a bit.

Edit: more on that: --
So the question becomes: does abs(checksum(newid())) produce a relatively uniform distribution of values?
To test that, I created a dataset with the following code: (NOTE -- this generated 57 million rows on my test machine -- use with caution!)

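-- One pseudo-random non-negative int per row; the row count depends on
-- how many rows sys.all_columns holds on your instance
select abs(checksum(newid())) as RandValue
into RandIDTesting
from sys.all_columns ac1
cross join sys.all_columns ac2;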



I then wrote the following code to see how the data is distributed:


declare @RangeMin int = 0;
declare @RangeMax int = 2147483647;
declare @RangeInc int = 65536;
declare @RangeCount int = @RangeMax/@RangeInc;  -- integer division: 32767
select @RangeMin, @RangeMax, @RangeInc, @RangeCount;

-- Build (@RangeCount+1) consecutive ranges, each @RangeInc values wide
with Ranges as (
    select top (@RangeCount+1)
        @RangeMin + @RangeInc * (row_number() over (order by (select null))-1) as RangeStart,
        @RangeMin + @RangeInc * (row_number() over (order by (select null)))-1 as RangeEnd
    from sys.all_columns ac1
    cross join sys.all_columns ac2
)
-- Count how many generated values fall inside each range
select RangeStart, RangeEnd,
    (select count(*) from RandIDTesting where RandValue between RangeStart and RangeEnd) as RangeSize
from Ranges
group by RangeStart, RangeEnd
order by RangeStart;



This produced a list of ranges and how many of our pseudo-random numbers fell into each range. In my testing, every range held between roughly 1,500 and roughly 1,700 numbers.
So in this case, this method did produce a relatively uniform sample set. This is not conclusive, but you can use methods similar to the above to test for yourself.
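
If scanning tens of thousands of range rows by eye is impractical, the per-range counts can be boiled down to a few summary figures. The following is only a sketch that reuses the RandIDTesting table and the 65536-wide ranges from the code above; the column aliases are made up for the example.

```sql
-- Summarize the per-range counts for the RandIDTesting table created above.
-- Integer division by 65536 puts each value into the same ranges as the
-- previous query; ranges that received no rows do not appear here.
SELECT COUNT(*)             AS BucketsWithRows,
       MIN(RangeSize)       AS SmallestBucket,
       MAX(RangeSize)       AS LargestBucket,
       AVG(1.0 * RangeSize) AS AverageBucket,
       STDEV(RangeSize)     AS StdDevBucket
FROM (
    SELECT RandValue / 65536 AS Bucket,
           COUNT(*)          AS RangeSize
    FROM RandIDTesting
    GROUP BY RandValue / 65536
) AS b;
```

For a reasonably uniform source, the smallest and largest buckets should sit fairly close to the average, and the standard deviation should be in the neighborhood of the square root of the average count. Like the test above, this is a sanity check rather than proof of uniformity.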
Matt Miller (#4)
sknox (3/26/2012)
For testing purposes (both scientific and software) pseudo-random numbers are preferable to truly random numbers*, because you want to see how the system responds to the entire range of possible inputs. A truly random number source cannot be trusted to give you a representative sample.

* This is, of course, assuming that the pseudo-random number generator produces uniformly-distributed data. More on that in a bit.


That's a good point to bring up. Random generation like this produces a uniform distribution across a range of data, but it cannot on its own replicate any non-uniform data patterns. So if you're testing code that looks for a normal distribution in the data (or any number of other patterns across the set), random data may not be a good option.

This would be one of those big caveats in the "why would you need random data" discussion. The random set will allow you to test the behavior of a variety of inputs at the detail level, but won't help with testing the set as a whole.
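
Uniform values can be reshaped when a test needs a non-uniform pattern. As a minimal sketch, and assuming the Box-Muller transform is acceptable for the purpose (it is not something the article or this thread prescribes), the query below turns two uniform values per row into one approximately normally distributed value. @Mean, @StdDev and the column alias are illustrative.

```sql
-- Sketch: shape uniform pseudo-random values into an approximately normal
-- distribution with the Box-Muller transform. @Mean and @StdDev are
-- illustrative values, not taken from the thread.
DECLARE @Mean FLOAT, @StdDev FLOAT;
SELECT @Mean = 100.0, @StdDev = 15.0;

SELECT TOP (1000)
       @Mean + @StdDev
             * SQRT(-2.0 * LOG(1.0 - RAND(CHECKSUM(NEWID()))))   -- 1.0 - RAND() guards against LOG(0)
             * COS(2.0 * PI() * RAND(CHECKSUM(NEWID())))          -- second, independent uniform value
       AS SomeNormalValue
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;
```

Each NEWID() call is evaluated separately for every row, so the two RAND() seeds differ within a row. This only covers a bell-shaped pattern; other shapes (skewed data, hot spots, seasonal patterns) would need their own transformation.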

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?