RE: Generating Test Data: Part 2 - Generating Sequential and Random Dates

SSCarpal Tunnel

Points: 4574

April 25, 2012 at 6:01 am

Superbly balanced article. But in true Oliver Twist fashion, "Can I have some more?"

In the real-world datetime data I've dealt with, they often follow roughly cyclical patterns, not pure random patterns. For example, if you look at say ambient temperature in the town in which you live, over time it might approximate a sort-of sine wave from day to day, but also the larger sine wave of seasonal change. Similarly, if you look at something like Emergency Room attendances, there is a daily, weekly, and seasonal repeating pattern of attendance numbers. If you want to generate test ER attendance datetimes, you don't want them to be strictly random, because in the long run, you'll generate about the same number of ER attendances (for example) for 3:00AM as you will for 3:00PM. You instead, want it to be random within certain parameters. If you knew the mean and standard deviation of the number of attendances of the 24 hours of the day, by the seven days of the week, by the 4 (in my case) seasons of the year (672 rows of reference data), I'm wondering whether you could combine that knowledge with Jeff's techniques to generate test data that closely approximates the patterns seen in reality.

...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell