Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Generating Test Data: Part 2 - Generating Sequential and Random Dates


Generating Test Data: Part 2 - Generating Sequential and Random Dates

Author
Message
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45382 Visits: 39940
dwain.c (4/25/2012)
You da man Jeff! Another great article!

I've seen quite a few forum posts recently where your discussion of how DATETIMEs actually work would have assisted the posters had they understood this fundamental concept.


There's quite a bit more as to how they actually work behind the scenes but few people ever need to go there... including me. ;-)

Thanks for the feedback, Dwain.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45382 Visits: 39940
GPO (4/25/2012)
Superbly balanced article. But in true Oliver Twist fashion, "Can I have some more?"

In the real-world datetime data I've dealt with, they often follow roughly cyclical patterns, not pure random patterns. For example, if you look at say ambient temperature in the town in which you live, over time it might approximate a sort-of sine wave from day to day, but also the larger sine wave of seasonal change. Similarly, if you look at something like Emergency Room attendances, there is a daily, weekly, and seasonal repeating pattern of attendance numbers. If you want to generate test ER attendance datetimes, you don't want them to be strictly random, because in the long run, you'll generate about the same number of ER attendances (for example) for 3:00AM as you will for 3:00PM. You instead, want it to be random within certain parameters. If you knew the mean and standard deviation of the number of attendances of the 24 hours of the day, by the seven days of the week, by the 4 (in my case) seasons of the year (672 rows of reference data), I'm wondering whether you could combine that knowledge with Jeff's techniques to generate test data that closely approximates the patterns seen in reality.


"More please?" is the normal question after any such introductory article and I thank you for your thoughtful feedback.

I've done such things in more of a stepped fashion but not in a nice, smooth sinusoidal fashion. I suppose that it just a matter of making smaller steps. I'll have to think about that. Thank you, again, for the feedback.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Cadavre
Cadavre
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2596 Visits: 8437
Nicely explained Jeff.

I may have to "borrow" your method of generating random DATETIME data, my method is more difficult to understand when people glance at it. :-D


Forever trying to learn

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better, quicker answers on SQL Server performance related questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



If you litter your database queries with nolock query hints, are you aware of the side effects?
Try reading a few of these links...

(*) Missing rows with nolock
(*) Allocation order scans with nolock
(*) Consistency issues with nolock
(*) Transient Corruption Errors in SQL Server error log caused by nolock
(*) Dirty reads, read errors, reading rows twice and missing rows with nolock


Craig Wilkinson - Software Engineer
LinkedIn
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45382 Visits: 39940
TheSQLGuru (4/25/2012)
Hey Jeff, can you put a downloadable file with the relevant operational code parts of the post? Thanks in advance, and wonderful stuff as always!


Heh... stealing an idea from Lotus isn't my idea of a laudable defense. ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45382 Visits: 39940
Cadavre (4/26/2012)
Nicely explained Jeff.

I may have to "borrow" your method of generating random DATETIME data, my method is more difficult to understand when people glance at it. :-D


Borrow away. These aren't my methods. They pretty well standard for folks that have been using NEWID() for such things over the years because they fall into the classic random number mathematical formulas.

If you don't mind, could you post your method? It's always interesting to see how others do things.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45382 Visits: 39940
GPO (4/25/2012)
The spam people clearly target the articles that everyone is going to read...


It's ironic that I have the SPAM people to thank for such a nice compliment. Thanks, GPO.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Cadavre
Cadavre
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2596 Visits: 8437
Jeff Moden (4/26/2012)
Cadavre (4/26/2012)
Nicely explained Jeff.

I may have to "borrow" your method of generating random DATETIME data, my method is more difficult to understand when people glance at it. :-D


Borrow away. These aren't my methods. They pretty well standard for folks that have been using NEWID() for such things over the years because they fall into the classic random number mathematical formulas.

If you don't mind, could you post your method? It's always interesting to see how others do things.


Certainly. Generally, when I need "random" datetime data I go with this: -
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment
END

--1,000,000 "Random" rows of data
SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
RAND(CHECKSUM(NEWID())) * 366 /*(Number of days)*/ + CAST('2000' AS DATETIME) /*(Start date, e.g. '2000-01-01 00:00:00'*/ AS randomDate
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;



If instead I want "random" date, I normally go with this: -
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment
END

--1,000,000 "Random" rows of data
SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
DATEADD(DAY,((ABS(CHECKSUM(NEWID())) % 366 /*(Number of days)*/) + 1),CAST('2000' AS DATE) /*(Start date, e.g. '2000-01-01*/) AS randomDate
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;



On an internal wiki-page at work, I long since added a page with a brief explanation of how to create pseudo-random data. The script looks like this: -
--Standard TestEnvironment of 1,000,000 rows of random-ish data
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment;
END;

--1,000,000 Random rows of data
SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
RAND(CHECKSUM(NEWID())) * 30000 + CAST('1945' AS DATETIME) AS randomDateTime,
--SQL SERVER 2008 ONLY!! ONLY FOR USE ON 9.0 AND ABOVE
DATEADD(DAY,((ABS(CHECKSUM(NEWID())) % 366) + 1),CAST('2000' AS DATE)) AS randomDate,
ABS(CHECKSUM(NEWID())) AS randomBigInt,
(ABS(CHECKSUM(NEWID())) % 100) + 1 AS randomSmallInt,
RAND(CHECKSUM(NEWID())) * 100 AS randomSmallDec,
RAND(CHECKSUM(NEWID())) AS randomTinyDec,
RAND(CHECKSUM(NEWID())) * 100000 AS randomBigDec,
CONVERT(VARCHAR(6),CONVERT(MONEY,RAND(CHECKSUM(NEWID())) * 100),0) AS randomMoney
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;




Forever trying to learn

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better, quicker answers on SQL Server performance related questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



If you litter your database queries with nolock query hints, are you aware of the side effects?
Try reading a few of these links...

(*) Missing rows with nolock
(*) Allocation order scans with nolock
(*) Consistency issues with nolock
(*) Transient Corruption Errors in SQL Server error log caused by nolock
(*) Dirty reads, read errors, reading rows twice and missing rows with nolock


Craig Wilkinson - Software Engineer
LinkedIn
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45382 Visits: 39940
I see what you mean. Same principle... just harder for some folks to see. Thanks, Craig.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45382 Visits: 39940
TheSQLGuru (4/25/2012)
Hey Jeff, can you put a downloadable file with the relevant operational code parts of the post? Thanks in advance, and wonderful stuff as always!


If I understand correctly, those are pretty well summarized in the last two sections of the article. Is that what you want as a downloadable file?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Samrat Bhatnagar
Samrat Bhatnagar
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 353
These two part series were really useful. Thanks.

Any suggestions on how to generate test data for following scenarios:
1. Two tables linked using PK-FK relationship e.g. Product Category and Product Subcategory
2. Self Referential Tables like the Employee table with EmployeeId, ManagerId, <Other employee details>
3. Using the master tables in 1, 2 generate a table that has ProductFK, EmployeeFK, <Some data> as in a Data warehouse.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search