SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Generating Test Data: Part 1 - Generating Random Integers and Floats


Generating Test Data: Part 1 - Generating Random Integers and Floats

Author
Message
dwain.c
dwain.c
SSCoach
SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)

Group: General Forum Members
Points: 17995 Visits: 6431
Jeff,

The following thought occurred to me last night but I didn't get a chance to test it until this morning.

Isn't the modulo function designed to always return a positive integer?

Hence, in this part of your data generator, I don't believe you need to use ABS:

SomeRandomInteger =  CHECKSUM(NEWID()) % @Range + @StartValue



I didn't try it for a million rows but the first 200 came up all as positives.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
SQLSACT
SQLSACT
SSCertifiable
SSCertifiable (5K reputation)SSCertifiable (5K reputation)SSCertifiable (5K reputation)SSCertifiable (5K reputation)SSCertifiable (5K reputation)SSCertifiable (5K reputation)SSCertifiable (5K reputation)SSCertifiable (5K reputation)

Group: General Forum Members
Points: 5022 Visits: 2969
Hi Guys

By declaring the variables as below - Will that guarantee that every number generated is Unique?

SELECT @NumberOfRows = 10000000,
@StartValue = 1,
@EndValue = 10000000,
@Range = @EndValue - @StartValue + 1



Thanks
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)

Group: General Forum Members
Points: 216964 Visits: 41991
derekr 43208 (3/27/2012)
Hi Guys

By declaring the variables as below - Will that guarantee that every number generated is Unique?

SELECT @NumberOfRows = 10000000,
@StartValue = 1,
@EndValue = 10000000,
@Range = @EndValue - @StartValue + 1



Thanks


Yes for the rownumber generator, No for the random number generator.

If you want unique but randomly sorted numbers, it will involve a sort (which will be time consuming for as many rows as you've identified) using NEWID() to sort on.

WITH 
cteRowNumberGenerator AS
(--==== Prevents sorting all the rows of the cross join.
SELECT TOP (1000) --Put your desired number here
UniqueNumber = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
)--==== Randomly sorts just the numbers generated from above.
SELECT UniqueNumber
FROM cteRowNumberGenerator
ORDER BY NEWID()
;




--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Michael Valentine Jones
Michael Valentine Jones
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14643 Visits: 11848
dwain.c (3/26/2012)
Jeff,

The following thought occurred to me last night but I didn't get a chance to test it until this morning.

Isn't the modulo function designed to always return a positive integer?

Hence, in this part of your data generator, I don't believe you need to use ABS:

SomeRandomInteger =  CHECKSUM(NEWID()) % @Range + @StartValue



I didn't try it for a million rows but the first 200 came up all as positives.


It's pretty easy to show that the modulus operator will return a negative number:

select x = -1111%100


Results:
x           
-----------
-11

Michael Valentine Jones
Michael Valentine Jones
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14643 Visits: 11848
Jeff Moden (3/26/2012)
Michael Valentine Jones (3/26/2012)
Here is an alternate method that I use to generate the pseudo random numbers. The basic method is to take the right 7 bytes from the NEWID function and convert that to a BIGINT before applying the MODULUS operator. No need for the ABS function, since 7 bytes can only produce a positive BIGINT number.



if object_id('tempdb..#t','U') is not null begin drop table #t end

-- Generate 20,000,000 rows
select top 20000000
NUMBER = identity(int,1,1)
into
#t
from
(select top 4473 * from master.dbo.syscolumns) a
cross join
(select top 4473 * from master.dbo.syscolumns) b

-- Show distribution of rowcount around average of 40000
select
a.RandomNo,
Rows = count(*)
from
(
select
RandomNo =
(convert(bigint,convert(varbinary(7),newid()))%500)+1
from
#t aa
) a
group by
a.RandomNo
order by
count(*),
a.RandomNo






RandomNo Rows
-------------------- -----------
335 39455
3 39457
76 39481
426 39489
494 39535
242 39539
278 39539
490 39548
445 39553
244 39566
...
...
...
124 40400
228 40402
425 40410
286 40434
45 40458
463 40463
373 40531
152 40586

(500 row(s) affected)







Like I said in the article, the conversion to VARBINARY will slow things down and to no good end if you don't really need BIGINT for the random integer. If you really want BIGINT capability (and I realize that wasn't one of your goals in your example), I believe you'd also have to convert the whole NEWID() to VARBINARY.

I also thought you were involved in some testing that showed the use of the square root of the final number of desired rows as a TOP for the self joined table in the Cross Join really wasn't worth it.

The main point that I'm trying to make is that if it's too complicated, folks won't use it.



I never was involved in any testing with cross joins using the square root of number of desired rows. I just grabbed a piece of code I had laying to be able to generate a test table, and I wasn't trying to say that was the way you should do it.

Have you actually tested to confirm that the conversion to VARBINARY and then to BIGINT is slower than CHECKSUM and ABS? I haven't but I would be surprised if there is much difference.

I was just showing an alternate method to get a random number. I think it is useful in cases where the range of random numbers is greater than an INT.

Of course, if you want a random full BIGINT, you would do something like this, instead of stopping at 7 bytes; I just stopped at 7 bytes to eliminate the ABS function.
select convert(bigint,convert(varbinary(8),newid()))


Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)

Group: General Forum Members
Points: 216964 Visits: 41991
Sorry. Thought it was you. Must have been Peter Larsson.

Yes, we tested it and the conversion does make quite a bit of differrence.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Kangana Beri
Kangana Beri
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1106 Visits: 954
Great article! Thanks.
SQLRNNR
SQLRNNR
SSC Guru
SSC Guru (67K reputation)SSC Guru (67K reputation)SSC Guru (67K reputation)SSC Guru (67K reputation)SSC Guru (67K reputation)SSC Guru (67K reputation)SSC Guru (67K reputation)SSC Guru (67K reputation)

Group: General Forum Members
Points: 67195 Visits: 18570
Thanks for the article Jeff.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw

Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)

Group: General Forum Members
Points: 216964 Visits: 41991
Kangana Beri (3/27/2012)
Great article! Thanks.


Thanks for the read and the feedback, Kangana. I appreciate it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)

Group: General Forum Members
Points: 216964 Visits: 41991
SQLRNNR (3/27/2012)
Thanks for the article Jeff.


Thanks for the feedback, Jason. Always good to hear from you.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search