Generate Random Dates in T-SQL

By:   |   Updated: 2022-07-12   |   Comments (3)   |   Related: > Dates


Problem

Have you ever needed to create a large dataset in Microsoft SQL Server? Developers and analysts often build datasets for demoing reports and testing Microsoft SQL Server functionality. Valid dates are critical to include. You do not just want any old date. For example, if you are creating a sales date, you probably want a specific range from 2019 to 2021. It would not make sense to include the 1970s in your results.

Additionally, imagine creating data to demo a dynamic bar chart in Power BI or Excel. You may not want all your bars the same height. You might want to show fewer sales in 2019. If your test data is not realistic, it is hard for an audience to take it seriously. You do not want someone making fun of the data; it takes away from the stunning visuals.

Solution

You can easily specify a specific date range and return a random date using a combination of T-SQL functions. In this tutorial, I will outline what each does and how you can use them moving forward to create some fantastic insights.

Random Date Generator Script

Let's start by looking at the script.

DECLARE @StartDate AS date;
DECLARE @EndDate AS date;

SELECT @StartDate = '01/01/2019', -- Date Format - DD/MM/YYY
@EndDate = '12/31/2021';

SELECT DATEADD(DAY, RAND(CHECKSUM(NEWID()))*(1+DATEDIFF(DAY, @StartDate, @EndDate)),@StartDate) AS 'SalesDate';

At first glance, the code above might seem intimidating. If you are like me, I enjoy things simplified. Like untangling a knot, we will review each function individually, starting with the innermost.

Breaking Down the Random Date Generator Algorithm

Our first one, NEWID() is straightforward; SQL Server returns a unique GUID. The data type for the id will be a uniqueidentifier. The value will always be unique when executed on the same machine. Generating this ID gives us a great place to start.

NEWID()

Next, CHECKSUM() will convert the GUID to an integer. Instead of a unique value, we get a random number. The output will be the hash value of whatever GUID SQL passes in.

CHECKSUM(NEWID())

Then with the RAND() function, SQL returns a random decimal value from the integer in the previous step. The value will be greater than zero and less than one. The decimal value comes in handy later.

RAND(CHECKSUM(NEWID()))

Continuing, the DATEDIFF() returns the number of days between the start and end date. In our example above, the value is 1,096. We could have performed this outside the statement, making it less confusing. Please feel free to experiment with this.

1+DATEDIFF(DAY, @StartDate, @EndDate);

Up next, we are combining two of the pieces so far. Here we are multiplying the random decimal value by the total number of days. With this syntax, the number will never be greater than the total.

RAND(CHECKSUM(NEWID()))*(1+DATEDIFF(DAY, @StartDate, @EndDate));

Finally, let's put everything together. All we are doing here is adding the number of days to our start date. Remember, because of the previous step, we cannot go over the end date. You could even summarize this entire statement by saying we are adding a random number of days to our start date, not exceeding the total number of days.

DATEADD(DAY, RAND(CHECKSUM(NEWID()))*(1+DATEDIFF(DAY, @StartDate, @EndDate)),@StartDate)

Modified Version of Random Date Generator Script

I have used and recommended this specific script for years. Honestly, I could not even remember where it came from. However, just because you have been doing something for a long time does not mean it is the best solution. To simplify, I might break out the value for the number of days between our start and end dates. The updated script would look like this.

DECLARE @StartDate AS date;
DECLARE @EndDate AS date;
DECLARE @DaysBetween AS int;
SELECT @StartDate = '01/01/2019',
@EndDate = '12/31/2021',
@DaysBetween = (1+DATEDIFF(DAY, @StartDate, @EndDate))
SELECT DATEADD(DAY, RAND(CHECKSUM(NEWID()))*@DaysBetween,@StartDate) AS 'SalesDate';

Build Your Dataset

Now that we can generate random dates between a range, how can we build the dataset?

Let's start by creating a numbers table. I am using a script written by Aaron Bertrand. In my example below, we are building a table called Numbers with an integer from one to a million. Feel free to make it larger if you want. You are missing out if you have not used a numbers table before. With Aaron's script, they are easy to create and come in handy.

DECLARE @UpperBound INT = 1000000;
;WITH cteN(Number) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_columns AS s1
CROSS JOIN sys.all_columns AS s2
)
SELECT [Number] INTO #Numbers
FROM cteN WHERE [Number] <= @UpperBound;

Up next, we are creating a temporary table called SalesOrder. Our table will consist of two columns—one for our date and the other for the amount.

CREATE TABLE #SalesOrder (SalesDate date, SalesAmount decimal(10,2));

Now execute the statement below to populate the SalesOrder table.

DECLARE @StartDate AS date;
DECLARE @EndDate AS date;
DECLARE @DaysBetween AS int;

SELECT @StartDate = '01/01/2019',
@EndDate = '12/31/2021',
@DaysBetween = (1+DATEDIFF(DAY, @StartDate, @EndDate));

INSERT INTO #SalesOrder (SalesDate, SalesAmount)

SELECT DATEADD(DAY, RAND(CHECKSUM(NEWID()))*@DaysBetween,@StartDate) AS 'SalesDate',
ABS(CHECKSUM(NEWID()) % 50) + 10 AS 'SalesAmount'
FROM #Numbers;

After executing the statement above, the years will have an even spread. Your mileage will vary, but here is what my results look like.

Query Results this is a sample query results showing an even spread for the years.

Similar counts may be fine for your use case. If you were building this dataset for a chart, you might want to show a decline in sales for 2020. The simplest way to accomplish this is by inserting fewer rows for a specific year and more for others. You can run the query multiple times and change the start and end dates.

For example, if you need one million rows in total with fewer for 2019, only insert 200,000 for 2019. Then insert the remaining 800,000 for 2020 and 2021. You can distribute these however you would like. Executing the query multiple times may not be the prettiest solution. I am sure you can perform the population in a cursor or while loop but for the purposes here, let's not complicate it. Please let me know how you would tackle populating the table in the comments below.

Larger Datasets

You might need a dataset larger than one million rows. Requirements may call for ten or even one hundred million rows. Let's stick with something reasonable and pick ten million.

There are a couple of simple ways to go about this. The first would be making your numbers table larger. The second would be inserting the rows in batches of one million. The code below accomplishes the task with a WHILE loop.

DECLARE @StartDate AS date;
DECLARE @EndDate AS date;
DECLARE @DaysBetween AS int;
DECLARE @Count int;

SELECT @StartDate = '01/01/2019',
@EndDate = '12/31/2021',
@DaysBetween = (1+DATEDIFF(DAY, @StartDate, @EndDate)),
@Count = 0;

WHILE (@Count < 10000000)
BEGIN
INSERT INTO #SalesOrder (SalesDate, SalesAmount)
SELECT DATEADD(DAY, RAND(CHECKSUM(NEWID()))*@DaysBetween,@StartDate) AS 'SalesDate',
ABS(CHECKSUM(NEWID()) % 50) + 10 AS 'SalesAmount'
FROM #Numbers;
SET @Count = @Count + @@ROWCOUNT;
END

Conclusion

In this tutorial, we explored why you need to generate random calendar dates. The reason typically comes down to building datasets for demos and testing. Continuing, I demonstrated how to generate random new dates between a specific range. We examined each of the functions involved. Finally, we populated a table with random dates. I also shared a solution to keep all the years from being even.

Next Steps
  • The next time you need to build a dataset, consider how you want the data distributed, i.e., more rows for a particular year. As mentioned previously, the distribution may be more critical for reports.
  • If you do not plan to build a dataset, consider creating a numbers table. Please check out Aaron Bertrand's tip for creating one.
  • If you are interested in learning more about random numbers in SQL check out this tip.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jared Westover Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2022-07-12

Comments For This Article




Tuesday, August 23, 2022 - 1:33:35 PM - Jared Westover Back To Top (90402)
Boris and Thomas, thanks for stopping by to read the tip. It's a great question, and I had to scratch my head to remember! In the last group of code where I'm inserting 100,000 rows at a time, if I used RAND(), SQL would use the same float value for each (100,000) row in that set. If I only needed to return one row or insert one row at a time, RAND would likely perform better.

If I were to rewrite the tip, I would mention RAND and why I'm not using it in my specific example.

Tuesday, August 23, 2022 - 10:47:50 AM - Boris Back To Top (90401)
Hi Jared,
thanks for the great tip, I learned a lot, although while studying the single functions, I wondered why you use the "Checksum(NewId())" and not rand() alone (Without seed), as the function provides already a pseudo random float from 0 to 1.
Do I overlook something?

Friday, August 5, 2022 - 6:47:06 AM - Thomas Franz Back To Top (90348)
You should have mentioned, that without the CHECKSUM(NEWID()) you would get the same date for every line, if you e.g. do a SELECT TOP 100 <formula> FROM #numbers

If you need only a single random date or call it in a loop (what you should not do because of non existing speed for bigger data sets), you could simply use RAND() without any parameter.














get free sql tips
agree to terms