Generating Test Data: Part 1 - Generating Random Integers and Floats

Question

Post reply

Generating Test Data: Part 1 - Generating Random Integers and Floats

Jeff Moden

SSC Guru

Points: 1003865
More actions
March 25, 2012 at 5:26 pm

#255191

Comments posted to this topic are about the item Generating Test Data: Part 1 - Generating Random Integers and Floats
--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
Change is inevitable... Change for the better is not.
Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 60 total)

You must be logged in to reply to this topic. Login to reply

Dwain Camps SSC Guru Points: 86908 More actions · Answer 1

Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.

I can't wait for the purists to berate you for using "pseudo" random numbers though. :w00t:

And let me guess:

DECLARE @Range INT

,@StartValueDATETIME

,@EndValueDATETIME

SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31'

SELECT @Range = DATEDIFF(day, @StartValue, @EndValue)

SELECT TOP 20-- Random dates

DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) As SomeRandomTime

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

SELECT @Range = DATEDIFF(second, @StartValue, @EndValue)

SELECT TOP 20-- Random times (to the second)

DATEADD(second, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) As SomeRandomDate

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

🙂

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Krtyknm SSC Eights! Points: 884 More actions · Answer 2

Hi Jeff,

Nice Article, Keep going.

I have a question on sys tables, most of them using the Sys.tables for generating random numbers. Assume that developers don't have an access to System tables, then how can they get the random numbers.

Thanks,

Karthik

paul.knibbs SSCoach Points: 15320 More actions · Answer 3

Krtyknm (3/26/2012)
I have a question on sys tables, most of them using the Sys.tables for generating random numbers. Assume that developers don't have an access to System tables, then how can they get the random numbers.

He's only using a system table (or view, strictly speaking) because of the number of rows it has--any table with a decent number of rows will work, whether it's a tally table or one of your main data tables.

GPO SSCarpal Tunnel Points: 4574 More actions · Answer 4

This article was called "Generating Test Data: Part 1 - Generating Random Integers and Floats" It should have been called "Generating Test Data: Part 1 - A Way Cool Compendium of Well Explained El Neato T-SQL Techniques"

Just one minor gripe - not about the article - about us, the SQL Server community. I know times are tough but we should all be ashamed of ourselves for allowing a true luminary of the T-SQL world to slip into such penury that he is only able to afford the sort of machine that my kids wouldn't (well probably couldn't) watch YouTube on. What say we pass the hat around and get a natty 2006 machine off ebay for Jeff? Maybe we could start up a One Laptop Per MVP project.

...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

Bryant McClellan SSCarpal Tunnel Points: 4337 More actions · Answer 5

Good idea. If you count all the posters except Jeff you are up to 10 cents:-D

------------
Buy the ticket, take the ride. -- Hunter S. Thompson

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 6

Krtyknm (3/26/2012)
Hi Jeff,
Nice Article, Keep going.
I have a question on sys tables, most of them using the Sys.tables for generating random numbers. Assume that developers don't have an access to System tables, then how can they get the random numbers.
Thanks,
Karthik

They can use virtually any table. Any table with just 1000 rows will allow them to build a million row table using the Cross Join.

That, notwithstanding, in such situations and like I said in the article, I'll build a Tally table for them. My Tally tables usually have 11000 rows in them so I can use it to build 30 years of dates. That's enough for a Cross Join to build a million rows... or 121 million if they want. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 7

paul.knibbs (3/26/2012)
Krtyknm (3/26/2012)
I have a question on sys tables, most of them using the Sys.tables for generating random numbers. Assume that developers don't have an access to System tables, then how can they get the random numbers.
He's only using a system table (or view, strictly speaking) because of the number of rows it has--any table with a decent number of rows will work, whether it's a tally table or one of your main data tables.

Ah... I should have scrolled down a bit more before replying. Thanks for the cover, Paul!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 8

GPO (3/26/2012)
This article was called "Generating Test Data: Part 1 - Generating Random Integers and Floats" It should have been called "Generating Test Data: Part 1 - A Way Cool Compendium of Well Explained El Neato T-SQL Techniques"
Just one minor gripe - not about the article - about us, the SQL Server community. I know times are tough but we should all be ashamed of ourselves for allowing a true luminary of the T-SQL world to slip into such penury that he is only able to afford the sort of machine that my kids wouldn't (well probably couldn't) watch YouTube on. What say we pass the hat around and get a natty 2006 machine off ebay for Jeff? Maybe we could start up a One Laptop Per MVP project.

That's a much cooler title! Thanks for the great feedback.

I actually do have a more modern HP G71 laptop that comparatively screams and will do parallelism, etc but I actually like my ol' war horse. If I can make something run fast on it, I know you folks with "real" machines are going to love it. 🙂

That, notwithstanding, maybe we could get Al Gore to start a "No MVP left behind" project. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 9

G Bryant McClellan (3/26/2012)
Good idea. If you count all the posters except Jeff you are up to 10 cents:-D

Heh... that's more than Al Gore has given me so far. :hehe:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Andre Guerreiro SSCertifiable Points: 7319 More actions · Answer 10

Another great article! And it explains the reasons to use that technique going back to basics.

Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.

Best regards,

Andre Guerreiro Neto

Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 11

dwain.c (3/26/2012)
Outstanding article Jeff! Just what the doctor ordered for something I'm working on at this instant.
I can't wait for the purists to berate you for using "pseudo" random numbers though. :w00t:
And let me guess:
DECLARE @Range INT
,@StartValueDATETIME
,@EndValueDATETIME
SELECT @StartValue = '2012-02-15', @EndValue = '2012-12-31'
SELECT @Range = DATEDIFF(day, @StartValue, @EndValue)
SELECT TOP 20-- Random dates
DATEADD(day, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) As SomeRandomTime
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
SELECT @Range = DATEDIFF(second, @StartValue, @EndValue)
SELECT TOP 20-- Random times (to the second)
DATEADD(second, ABS(CHECKSUM(NEWID()) % @Range), @StartValue) As SomeRandomDate
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
🙂

I might be safe for the next 10 minutes or so. Although the "next" random value is certainly predictable, you'd have to know a fair bit about how NEWID() is generated to predict the next value 😀

You're just about spot on in your code. Just substitute @Range for the 20 in TOP 20 and Bob's your non-hardcoded Uncle. 🙂 In Part 2, I'll explain how to easily include random times as a part of generating random dates. I'll also cover making period bins.

Thanks for the feedback, Dwain!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 12

codebyo (3/26/2012)
Another great article! And it explains the reasons to use that technique going back to basics.
Sometimes we know that we should do things one way or another because it's recommended everywhere but sometimes we don't exactly know "why" it's better. Articles that explain "why" are needed. Thank you.

You've hit the nail on the head, Andre. Random data generation is just like the Tally Table used to be. A lot of people were using it thanks to some posted code examples but they may not have known the "Why" of how it all worked. I wanted to make sure that people knew "Why" things worked the way they did so they can really think outside the box when the time comes.

Of course, the other reason I'm writing this is so that people who might not have otherwise been able to do so, can easily write some test data to see if that "works for 10 rows" code example they find on the internet is worth its salt. :hehe:

Thanks for the feedback and for "getting it".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sean Lange SSC Guru Points: 286573 More actions · Answer 13

Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 14

Sean Lange (3/26/2012)
Excellent article Jeff. Yet another home run!!! Your explanations are always so clear that even us lessers can understand. Keep them coming!!!

Heh... "lessers". I've seen your posts and I wouldn't associate the word "lesser" with someone like you. As you said, "Keep them coming!!!"

Thanks for the feedback, Sean.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)