RE: Generating Test Data: Part 1 - Generating Random Integers and Floats

Dwain Camps

SSC Guru

Points: 86908

June 6, 2012 at 6:49 pm

#1497970

Hey Jeff!

Using your test harness yesterday, I ran into some unexpected (for me) results. Let me explain:

--===== Declare some obviously named variables

DECLARE @NumberOfRows INT,

@StartValue INT,

@EndValue INT,

@Range INT

;

--===== Preset the variables to known values

SELECT @NumberOfRows = 1000000,

@StartValue = 400,

@EndValue = 500,

@Range = @EndValue - @StartValue + 1

;

--===== Conditionally drop the test table to make reruns easier in SSMS

IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

DROP TABLE #SomeTestTable

;

--===== Create the test table with "random constrained" integers and floats

-- within the parameters identified in the variables above.

SELECT TOP (@NumberOfRows)

SomeRandomInteger = ABS(CHECKSUM(NEWID())) % @Range + @StartValue,

SomeRandomFloat = RAND(CHECKSUM(NEWID())) * @Range + @StartValue

INTO #SomeTestTable

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

SELECT MinFloat=MIN(SomeRandomFloat), MaxFloat=MAX(SomeRandomFloat)

FROM #SomeTestTable

DROP TABLE #SomeTestTable

This returns the following:

MinFloat MaxFloat

400.000012322329 500.999933381006

My surprise was due to the fact that RAND() returns a random floating point number on the closed interval {0,1}, whereas this approach yielded some random floats outside the interval {400,500}. Clearly this approach is suitable when you are interested in applying the FLOAT to a date range, such as the following:

--===== Declare some obviously named variables

DECLARE @NumberOfRows INT,

@StartValue INT,

@EndValue INT,

@Range INT,

@FStartValue INT,

@FEndValue INT,

@FRange INT

;

--===== Preset the variables to known values

SELECT @NumberOfRows = 1000000,

@StartValue = 400,

@EndValue = 500,

@Range = @EndValue - @StartValue + 1,

@FStartValue = 400,

@FEndValue = 500,

@FRange = @EndValue - @StartValue

;

--===== Conditionally drop the test table to make reruns easier in SSMS

IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

DROP TABLE #SomeTestTable

;

--===== Create the test table with "random constrained" integers and floats

-- within the parameters identified in the variables above.

SELECT TOP (@NumberOfRows)

SomeRandomInteger = ABS(CHECKSUM(NEWID())) % @Range + @StartValue,

SomeRandomFloat = RAND(CHECKSUM(NEWID())) * @FRange + @FStartValue,

SomeRandomDate = RAND(CHECKSUM(NEWID())) * @Range +

DATEADD(day, -@StartValue, DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0))

INTO #SomeTestTable

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

SELECT MinFloat=MIN(SomeRandomFloat), MaxFloat=MAX(SomeRandomFloat)

,MinDate=MIN(SomeRandomDate), MaxDate=MAX(SomeRandomDate)

FROM #SomeTestTable

DROP TABLE #SomeTestTable

Which delivers these results:

MinFloat MaxFloat MinDate MaxDate

400.000069523232 499.999990152787 2011-05-04 00:00:16.187 2011-08-12 23:59:58.753

I have taken the liberty to change the formula for SomeRandomFloat to deliver results on the closed interval {400,500}, which is what I was expecting. Date results end up being in the range of GETDATE() - 400 days and spans 100 days.

Mind you, I'm not reporting a bug. I'd prefer to think of it as an unexplained feature. 🙂

Let me know if I've misinterpreted something.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St