SQLServerCentral Article

Generating Test Data: Part 2 - Generating Sequential and Random Dates

,

Introduction

In Part 1 of this series, we learned how to easily create constrained random integers and floating point numbers using two simple formulas. We also learned how to use the power of "Pseudo Cursors" formed by cross-joins to very quickly generate millions of rows of random data instead of using the RBAR1 of a cursor or While loop.

In this installment of "Generating Test Data", we're going to learn how to generate constrained random dates with and without times. In the process, we'll also learn how to generate dates and times at fixed intervals.

Unlike many of my other articles, this article does require some prior knowledge. If you don't already know what the functions DATEDIFF, DATEADD, DATEPART, and DATENAME are, please look them up in Books Online before reading this article. You also need to know what a "Pseudo Cursor" is and how it works as explained in Part 1 of this series.

Date/Time Basics

Before we can begin calculating random dates and times, we have to know a little about the math behind the DATETIME data-Type. Books Online identifies it as consisting of two integers.

The first of these integers represents the "Whole" date as the number of days since the "Base Date". The "Base Date" defined as the date that is represented by the integer "0". In SQL Server, the "Base Date" is 01 Jan 1900.

The second integer is used to calculate the fractional part of a day, which is also known as "time".

It's a rare thing to actually see a date/time represented as two integers because humans have a hard time understanding it. To make life easier on us humans, dates and times are normally represented as just that, dates and times. For calculation purposes, the DATETIME data-type can also be most accurately represented by a FLOAT (No, I'm not saying it's a FLOAT but further discussion on this subject is well beyond the scope of this article) where the integer part of the FLOAT is the number of whole days since the base date and the second integer is represented as the fractional part of the FLOAT.

If you convert a DATETIME data-type to a FLOAT, the number you see is frequently referred to as the "Date Serial Number". Let's see what the "Date Serial Number" looks like starting at the "Base Date". The following code shows the base date of SQL Server and several other date/times in 6 hour (1/4 day) increments.

WITH
cteSampleDates (SampleDate, Comment) AS
(
 SELECT CAST('1900-01-01 00:00' AS DATETIME),'12:00AM Midnight or a "Whole" day' UNION ALL
 SELECT CAST('1900-01-01 06:00' AS DATETIME),'6:00AM, 1/4 through the day'       UNION ALL
 SELECT CAST('1900-01-01 12:00' AS DATETIME),'12:00PM, 1/2 through the day'      UNION ALL
 SELECT CAST('1900-01-01 18:00' AS DATETIME),'6:00PM, 3/4 through the day'       UNION ALL
 SELECT CAST('1900-01-02 00:00' AS DATETIME),'12:00AM Midnight or a "Whole" day' UNION ALL
 SELECT CAST('1900-01-02 06:00' AS DATETIME),'6:00AM, 1/4 through the day'       UNION ALL
 SELECT CAST('1900-01-02 12:00' AS DATETIME),'12:00PM, 1/2 through the day'      UNION ALL
 SELECT CAST('1900-01-02 18:00' AS DATETIME),'6:00PM, 3/4 through the day'       UNION ALL
 SELECT CAST('1900-01-03 00:00' AS DATETIME),'12:00AM Midnight a "Whole" day'
)
 SELECT SampleDate,
        DateSerial = CAST(SampleDate AS FLOAT),
        Comment
   FROM cteSampleDates
;

Here are the results from the run above. 

SampleDate              DateSerial Comment
----------------------- ---------- ---------------------------------
1900-01-01 00:00:00.000 0          12:00AM Midnight or a "Whole" day
1900-01-01 06:00:00.000 0.25       6:00AM, 1/4 through the day
1900-01-01 12:00:00.000 0.5        12:00PM, 1/2 through the day
1900-01-01 18:00:00.000 0.75       6:00PM, 3/4 through the day
1900-01-02 00:00:00.000 1          12:00AM Midnight or a "Whole" day
1900-01-02 06:00:00.000 1.25       6:00AM, 1/4 through the day
1900-01-02 12:00:00.000 1.5        12:00PM, 1/2 through the day
1900-01-02 18:00:00.000 1.75       6:00PM, 3/4 through the day
1900-01-03 00:00:00.000 2          12:00AM Midnight a "Whole" day

We know that 6 hours is ¼ of a day and we can see that the Date Serial Number is advancing by .25 or a quarter of a day.

We normally (and thankfully) don't have to work directly with the "Date Serial Number" in SQL Server thanks to some built in functions such as DATEDIFF, DATEADD, DATEPART, and DATENAME. Still, in order to work with dates and times effectively, it helps a lot to understand that dates and times are simply represented by some simple numbers behind the scenes.

To summarize this section, dates can be represented by the whole number of days since the Base Date and times can be represented by fractions.

Generating Sequential Dates and Dates with Times

Random generation of dates and times doesn't always fit the bill for what our test data may need to look like. Sometimes we need fixed periods to simulate things like timed automated meter or sensor readings or fixed and certain payments.

Beyond generating test data, you may not have a Calendar Table to create the fixed periods (usually day, weeks, or months) you need to join against to aggregate data for reporting purposes (by month, for example). Generating sequential date/time periods may be just what the doctor ordered to turn your non-SARGable2 query into a fire breathing model of efficiency.

Let's start off with generating whole dates.

Generating Sequential Whole Dates

As we've just seen, dates are made of two parts and to generate sequential whole dates,we simply need to increment the "date part" of a date to generate sequential dates. In that vein, we learned in Part 1 of this series how to quickly generate incrementing sequential numbers using a "Pseudo Cursor" formed by a Cross-Join using the following code.

 SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

There are nearly 3 millennia of days in a million numbers. Obviously, we need to constrain that a bit. Let's tackle the simple problem of generating all of the days in the year 1900 and see how it can be done.

First, how many days were there in 1900? Before you sprain your brain trying to figure out if it were a Leap Year or not, let's put SQL Server to work (Note that SQL Server does it correctly where Excel does not. Excel STILL thinks that 1900 was a Leap Year). Let's define an "inclusive" (will be first date to be included in the results) StartDate and an "exclusive" (will excluded from the results) EndDate. Representing the first date of a year as a DATETIME in SQL Server is easy. You can just use the year and SQL Server will figure out that you mean the "first of the year".

DECLARE @StartDate DATETIME, --Inclusive
        @EndDate   DATETIME, --Exclusive
        @Days      INT
;
 SELECT @StartDate = '1900', --Inclusive
        @EndDate   = '1901', --Exclusive
        @Days      = DATEDIFF(dd,@StartDate,@EndDate)
;
 SELECT Days = @Days
;

That, of course, returns the following.

Days
-----------
365

As a bit of a sidebar, note that we did NOT have to add "1" to make up for the loss of 1 day due to the subtraction because the EndDate is actually 1 day more than what we want to end on. You'll find out that this bit of knowledge will greatly simplify your life for other date calculations.

Now, all we need to do generate one row for each day in the year 1900 is to add 365 sequential numbers to the StartDate. Like this…

DECLARE @StartDate DATETIME, --Inclusive
        @EndDate   DATETIME, --Exclusive
        @Days      INT
;
 SELECT @StartDate = '1900', --Inclusive
        @EndDate   = '1901', --Exclusive
        @Days      = DATEDIFF(dd,@StartDate,@EndDate)
;
 SELECT TOP (@Days)
        TheDate = DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@StartDate)
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

If we look at the first 5 and last 5 rows of the result set from that, here's what we get, just like we wanted, all of the days of the year 1900.

TheDate
-----------------------
1900-01-01 00:00:00.000
1900-01-02 00:00:00.000
1900-01-03 00:00:00.000
1900-01-04 00:00:00.000
1900-01-05 00:00:00.000
...{snip}...
1900-12-27 00:00:00.000
1900-12-28 00:00:00.000
1900-12-29 00:00:00.000
1900-12-30 00:00:00.000
1900-12-31 00:00:00.000
(365 row(s) affected)

An important thing to notice is that we had to subtract 1 from the ROW_NUMBER to make things come out right. This is because we need to add "0" to the StartDate to return the StartDate. If we didn't subtract the "1", we would have return the dates of 02 Jan 1900 to 01 Jan 1901 instead of 01 Jan 1900 to 31 Dec 1900.

Of course, you can use any StartDate and EndDate (provided the StartDate <= EndDate in most cases) from 01 Jan 1753 through 31 Dec 9999 with the DATETIME data-type. 01 Jan 17533 was the starting date of the first whole year after Great Britain adopted the current "Western" or "Gregorian" calendar. It's true that you can use more extended date ranges in SQL Server 2008 and above but, again, that's beyond the scope of this article.

This technique isn't limited to just generating days. You can generate weeks, months, years, etc. For example, to generate the first of the month for every month in the 21st century, you simply need to change your calculations from "day" based to "month" based. The rest of the code is essentially identical.

DECLARE @StartDate DATETIME, --Inclusive
        @EndDate   DATETIME, --Exclusive
        @Months      INT
;
 SELECT @StartDate = '2000', --Inclusive
        @EndDate   = '2100', --Exclusive
        @Months    = DATEDIFF(mm,@StartDate,@EndDate)
;
 SELECT TOP (@Months)
        TheMonth = DATEADD(mm,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@StartDate)
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

Here are the first and last 5 rows from that run.

TheMonth
-----------------------
2000-01-01 00:00:00.000
2000-02-01 00:00:00.000
2000-03-01 00:00:00.000
2000-04-01 00:00:00.000
2000-05-01 00:00:00.000
...{snip}...
2099-08-01 00:00:00.000
2099-09-01 00:00:00.000
2099-10-01 00:00:00.000
2099-11-01 00:00:00.000
2099-12-01 00:00:00.000
(1200 row(s) affected)

If you wanted something a bit more esoteric, such as finding all of the Fridays in the year 2012, you simply need to change the point of reference again. This time, we'll change it to a period of 7 days and use the first Friday in 2012 (automatic calculation of this date is beyond the scope of this article) as a StartDate.

DECLARE @StartDate DATETIME, --Inclusive
        @EndDate   DATETIME, --Exclusive
        @Weeks     INT
;
 SELECT @StartDate = '2012-01-06', --Inclusive
        @EndDate   = '2013-01-06', --Exclusive
        @Weeks     = DATEDIFF(dd,@StartDate,@EndDate)/7 --Don't use "WK" here
;
 SELECT TOP (@Weeks)
        Fridays = DATEADD(wk,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@StartDate)
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

Here are the abbreviated results.

Fridays
-----------------------
2012-01-06 00:00:00.000
2012-01-13 00:00:00.000
2012-01-20 00:00:00.000
2012-01-27 00:00:00.000
2012-02-03 00:00:00.000
...{snip}...
2012-11-30 00:00:00.000
2012-12-07 00:00:00.000
2012-12-14 00:00:00.000
2012-12-21 00:00:00.000
2012-12-28 00:00:00.000
(52 row(s) affected)

Note that "weeks" is a strange creature because there aren't precisely 52 weeks in any given year. In order to include the proper EndDate for other years, you may have to add a couple of more days to the @EndDate variable depending on what your needs are.

Generating Sequential Dates with Times

Another name for this subsection might be "How to generate sequential time periods". We've already seen how to use Division to create "periods" consisting of more than one day. What if we want to generate sequential periods for, say, every 4 hours for the month of January in 2012? The use of just a little more complicated math will serve us well but do notice that the basic code hasn't changed much.

As yet another sidebar, notice that telling SQL Server the month and year will result in a date for the first of the given month in a fashion similar to you just telling it the year and it figuring out the correct date for the first of the year.

DECLARE @StartDate DATETIME, --Inclusive
        @EndDate   DATETIME, --Exclusive
        @Periods   INT
;
 SELECT @StartDate = 'Jan 2012', --Inclusive
        @EndDate   = 'Feb 2012', --Exclusive
        @Periods   = DATEDIFF(dd,@StartDate,@EndDate)*(24/4) --24 hours in a day, every 4 hours
;
 SELECT TOP (@Periods)
        Every4Hours = DATEADD(hh,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))*4-4,@StartDate)
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

In the code above, we had to calculate the fraction of days in hours to represent a "period" and count them. To do that, we did like we did before by first calculating the number of days from the StartDate to the EndDate but then we had to multiply by the inverse of the fraction of a day that 4 hours represents. We did that by multiplying the number of days by (24/4) which is 6, the number of 4 hour periods in a day.

Here's the abbreviated output.

Every4Hours
-----------------------
2012-01-01 00:00:00.000
2012-01-01 04:00:00.000
2012-01-01 08:00:00.000
2012-01-01 12:00:00.000
2012-01-01 16:00:00.000
...{snip...
2012-01-31 04:00:00.000
2012-01-31 08:00:00.000
2012-01-31 12:00:00.000
2012-01-31 16:00:00.000
2012-01-31 20:00:00.000
(186 row(s) affected)

The other change in the code that you'll notice is that we had to multiply the ROW_NUMBER (which counts the periods) by 4 (the number of hours in each period) and then subtract 1 period (-4 for 4 hours). This looks quite a bit different than the other code we've used so far until you make the realization that all of these numbers were "1" for all the other examples. Since "1" times any number is still that same number, the multiplication by 1 was simply dropped from the other formulas.

Creating dates and times every 6 seconds for the month of Jan 2012 (for example) only requires a shift in the type of period and a little difference in the math. You just need to change the math to correctly define the number of periods, change the period type in the DATEADD, and change the multiplication and subtraction to represent the number of periods. Other than those minor changes, the code continues to be essentially the same as in all the other examples for sequential dates with times we've seen so far.

DECLARE @StartDate DATETIME, --Inclusive
        @EndDate   DATETIME, --Exclusive
        @Periods   INT
;
 SELECT @StartDate = 'Jan 2012', --Inclusive
        @EndDate   = 'Feb 2012', --Exclusive
        @Periods   = DATEDIFF(dd,@StartDate,@EndDate)*(24*3600/6) --24 hours in a day, 3600 seconds per hour, every 6 seconds
;
 SELECT TOP (@Periods)
        Every6Seconds = DATEADD(ss,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))*6-6,@StartDate)
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

Here are the results.

Every6Seconds
-----------------------
2012-01-01 00:00:00.000
2012-01-01 00:00:06.000
2012-01-01 00:00:12.000
2012-01-01 00:00:18.000
2012-01-01 00:00:24.000
...{snip}...
2012-01-31 23:59:30.000
2012-01-31 23:59:36.000
2012-01-31 23:59:42.000
2012-01-31 23:59:48.000
2012-01-31 23:59:54.000
(446400 row(s) affected)

The Need for Speed

That last example generated 446,400rows and, in the grid mode, it took more than 18 seconds to generate all those rows on my 10 year old, single CPU, desktop computer using SQL Server 2005. You might be disheartened by that apparent lack of performance. The key here is that you probably wouldn't return those rows to the screen. Instead, you'd likely store the information in a table. Let's see how long that takes when we use SELECT/INTO to create the table.

SET STATISTICS TIME ON;
DECLARE @StartDate DATETIME, --Inclusive
        @EndDate   DATETIME, --Exclusive
        @Periods   INT
;
 SELECT @StartDate = 'Jan 2012', --Inclusive
        @EndDate   = 'Feb 2012', --Exclusive
        @Periods   = DATEDIFF(dd,@StartDate,@EndDate)*(24*3600/6) --24 hours in a day, 3600 seconds per hour, every 6 seconds
;
 SELECT TOP (@Periods)
        Every6Seconds = DATEADD(ss,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))*6-6,@StartDate)
   INTO #SomeTestTable
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
    SET STATISTICS TIME OFF;
   DROP TABLE #SomeTestTable --Just to cleanup after the test
SQL Server Execution Times:
   CPU time = 843 ms,  elapsed time = 924 ms.
(446400 row(s) affected)

Try THAT with a While Loop or a Cursor and let me know how it works out for you. 😉 Is there a faster way? You bet there is. But, at the cost of some complexity, it'll only cut off about 64ms from the CPU time and about 14ms from the elapsed time. It's just not worth the added complexity. Keep the generation of test data as simple as possible or people won't use it.

Generating Random Dates and Dates with Times

Rather than sequential dates and times, generating random dates and dates with times is probably what most people would be after for test data. For example, creating data for transactions in multiple bank accounts would require random dates (possibly with times) to simulate the unpredictable nature of when people wrote checks or hit up their ATM for some fast cash.

Generating Random "Whole" Dates

Generally speaking, a "whole" date is a date with no time element or, more correctly, a date with a midnight time. For example, a date/time of '2012-03-02 00:00:00.000' would be considered to be a whole date where '2012-03-02 22:11:13.107' would not because the time element has a value other than "0" or midnight.

To generate random whole dates, we need a random number generator that generates whole numbers. We learned just how to make such a random number generator back in Part 1 of this series. Here's the basic algorithm as a reminder.

--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
        @StartValue   INT,
        @EndValue     INT,
        @Range        INT
;
--===== Preset the variables to known values
 SELECT @NumberOfRows = 1000000,
        @StartValue   = 400,
        @EndValue     = 500,
        @Range        = @EndValue - @StartValue + 1
;
--===== Create "random constrained" integers within 
     -- the parameters identified in the variables above.
 SELECT TOP (@NumberOfRows)
        SomeRandomInteger =  ABS(CHECKSUM(NEWID())) % @Range + @StartValue
   INTO #SomeTestTable
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

To demonstrate how to use the random integer generator to generate random dates, let's say that we want to generate a million random dates falling in the 10 year period of 2010 through 2019. The first thing we need to do is to change the variables a bit. We need to change a couple of the variables from the INT data-type to DATETIME data-type. While we're at it, we'll also change the names of the variables to be more appropriate for the problem at hand.

--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
        @StartDate    DATETIME,
        @EndDate      DATETIME,
        @Days         INT --This is still the "range"
;

Next, we need to preset those variables as we did when we generated sequential dates to represent our problem of generating random dates for the years 2010 through 2019.

--===== Preset the variables to known values
 SELECT @NumberOfRows = 1000000,
        @StartDate    = '2010', --Inclusive
        @EndDate      = '2020', --Exclusive
        @Days         = DATEDIFF(dd,@StartDate,@EndDate)
;

Last but not least, we again need to do like we did during the generation of sequential dates. We need to add whole numbers to the StartDate except, in this case, we not going to use sequential numbers generated by the likes of ROW_NUMBER. Instead, we're going to use our random integer formula.

--===== Create "random constrained whole dates" within 
     -- the parameters identified in the variables above.
 SELECT TOP (@NumberOfRows)
        SomeRandomDate = ABS(CHECKSUM(NEWID())) % @Days + @StartDate
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

As you can clearly see, not much has changed from the code that was used to generate simple random integers. The random integer formula is the same. @Days is stage a "range". We're still adding the start value (a date in this case) to the integer. Because the start value of @StartDate is a DATETIME, the result of the whole formula is converted to a DATETIME. Because we used only integers and we started with a "whole" date, we've generated only whole dates.

The following code is all the previous snippets put together with some additions to save the dates in a temp table.

--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
        DROP TABLE #SomeTestTable
;
--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
        @StartDate    DATETIME,
        @EndDate      DATETIME,
        @Days         INT --This is still the "range"
;
--===== Preset the variables to known values
 SELECT @NumberOfRows = 1000000,
        @StartDate    = '2010', --Inclusive
        @EndDate      = '2020', --Exclusive
        @Days         = DATEDIFF(dd,@StartDate,@EndDate)
;
--===== Create "random constrained" integers within 
     -- the parameters identified in the variables above.
 SELECT TOP (@NumberOfRows)
        SomeRandomDate =  ABS(CHECKSUM(NEWID())) % @Days + @StartDate
   INTO #SomeTestTable
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

If we examine the contents of the temp table, we can see that we have, in fact, generated a million random dates across the desired 10 year time span and we've done so in about 1 second.

--===== Show the extent of the random whole dates
 SELECT MinDate = MIN(SomeRandomDate), 
        MaxDate = MAX(SomeRandomDate), 
        DistinctDates = COUNT(DISTINCT SomeRandomDate),
        Rows = COUNT(*)
   FROM #SomeTestTable
;
--===== Show ten rows of the table
 SELECT TOP 10 *
   FROM #SomeTestTable
;

Here are the results from that.

MinDate                 MaxDate                 DistinctDates Rows
----------------------- ----------------------- ------------- -----------
2010-01-01 00:00:00.000 2019-12-31 00:00:00.000 3652          1000000
(1 row(s) affected)
SomeRandomDate
-----------------------
2019-04-26 00:00:00.000
2018-11-03 00:00:00.000
2018-09-02 00:00:00.000
2011-01-20 00:00:00.000
2013-04-10 00:00:00.000
2017-08-18 00:00:00.000
2015-11-30 00:00:00.000
2015-11-21 00:00:00.000
2016-06-27 00:00:00.000
2018-09-01 00:00:00.000
(10 row(s) affected)

Microsoft introduced the DATE data-type in SQL Server 2008. It is truly a "whole" date data-type because it literally has no time element to it. There's just one little problem with it. Somewhere along the line, they either got "religious" about dates or they didn't want to spend the extra bit of development time it would have taken to allow it to be used with conventional addition and subtraction. What that means is that the previous code example won't work if the @StartDate or @EndDate variables are of the DATE data-type.

Now, I don't believe that MS will further tamper with the current mathematical abilities of the DATETIME data-type by removing the ability to add and subtract integers to and from it, but to make your code more "bullet proof" for future changes, you should probably get into the habit of using DATEADD instead of addition and subtraction. To wit, the code for the previous example should look like the following and will work with both the DATETIME and DATE data-types.

--===== Create "random constrained" integers within
     -- the parameters identified in the variables above.
     -- "Bullet proofed" by using DATEADD instead of simple addition.
 SELECT TOP (@NumberOfRows)
        SomeRandomDate =  DATEADD(dd,ABS(CHECKSUM(NEWID())) % @Days, @StartDate)
   INTO #SomeTestTable
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;

Generating Random Dates with Times

At the beginning of this article, I said that the DATETIME data-type could be accurately manipulated using FLOAT values. If you'll also recall, we have a random FLOAT generator that we discovered in Part 1 of this series. Using that random number generator, we can generate random dates with times just as easily as we did "whole" dates. Here's the complete code including the conditional table drop.

--===== Conditionally drop the test table to make reruns easier.
     IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
        DROP TABLE #SomeTestTable
;
--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
        @StartDate    DATETIME,
        @EndDate      DATETIME,
        @Days         INT --This is still the "range"
;
--===== Preset the variables to known values
 SELECT @NumberOfRows = 1000000,
        @StartDate    = '2010', --Inclusive
        @EndDate      = '2020', --Exclusive
        @Days         = DATEDIFF(dd,@StartDate,@EndDate)
;
--===== Create the test table with "random constrained" integers and floats
     -- within the parameters identified in the variables above.
 SELECT TOP (@NumberOfRows)
        SomeRandomDateTime = RAND(CHECKSUM(NEWID())) * @Days + @StartDate
   INTO #SomeTestTable
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
Here's the code that checks the extent of the random data.  
--===== Show the extent of the random whole dates
 SELECT MinDateTime   = MIN(SomeRandomDateTime), 
        MaxDateTime   = MAX(SomeRandomDateTime), 
        DistinctDates = COUNT(DISTINCT SomeRandomDateTime),
        Rows = COUNT(*)
   FROM #SomeTestTable
;
--===== Show ten rows of the table
 SELECT TOP 10 *
   FROM #SomeTestTable
;

Here's the code that checks the extent of the random data.

--===== Show the extent of the random whole dates
 SELECT MinDateTime   = MIN(SomeRandomDateTime), 
        MaxDateTime   = MAX(SomeRandomDateTime), 
        DistinctDates = COUNT(DISTINCT SomeRandomDateTime),
        Rows = COUNT(*)
   FROM #SomeTestTable
;
--===== Show ten rows of the table
 SELECT TOP 10 *
   FROM #SomeTestTable
;

As expected and as is possible with random dates and times, there can be some duplication as shown in the results. Because we have random times, we also may not have 00:00:00.000 as a time on the first date nor 23:59:59.997 for the time on the last date. If you run the generation code again, you will, of course, come up with different times.

MinDateTime             MaxDateTime             DistinctDates Rows
----------------------- ----------------------- ------------- -----------
2010-01-01 00:04:51.217 2019-12-31 23:43:08.160 999778        1000000
(1 row(s) affected)
SomeRandomDateTime
-----------------------
2010-10-07 20:25:57.030
2010-04-02 00:16:46.053
2019-04-22 14:50:10.373
2011-08-17 08:00:22.647
2017-07-31 20:39:27.357
2016-04-25 19:12:17.290
2017-03-08 19:24:34.453
2017-02-27 21:42:53.160
2014-07-08 04:00:25.543
2010-01-07 02:46:09.770
(10 row(s) affected)

Casting the result of the above to the TIME data-type available starting in SQL Server 2008 will generate random times for you. You can generate random times in SQL Server 2005 but they will all have the date of 1900-01-01.

Summary

In this, Part 2 of this series, we learned how to convert simple ROW_NUMBERs to sequential dates and sequential time periods. Using the random INTEGER and random FLOAT algorithms from Part 1, we also learned how to generate random "whole" dates and random dates with times.

To reiterate a bit from Part 1, the generation of test data doesn't need to be a time consuming or difficult chore. Using tools such as a TOP limited Cross Join and SELECT/INTO, we can build a million rows of data in scant seconds with very little coding effort. Through the use of some standard functions and some simple math, we can easily control the domain of numbers and dates that our random integer and float generation code returns.

Coming up in Part 3, random names, characters, codes, and SSNs.

Thanks for listening, folks.

--Jeff Moden


© Copyright by Jeff Moden 2012, All Rights Reserved.

1 RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

2 Definition of "SARGable": http://en.wikipedia.org/wiki/Sargable

3 Brief History of the Gregorian Calendar: http://en.wikipedia.org/wiki/Gregorian_calendar

Rate

4.79 (24)

You rated this post out of 5. Change rating

Share

Share

Rate

4.79 (24)

You rated this post out of 5. Change rating