I’ve been working more with SQL Data Generator (SDG) because it solves some problems that many software developers have with test data. Often each developer needs to create their own set of test data, with these being the common actions:
- Use a backup of production, perhaps old.
- Randomly insert a few values each developer comes up with
- Use random data from SDG, a new project each time.
- Load a known data set from production or test systems
While I think small amounts of random data work well, I think the data should reflect (somewhat) the types of data in production. Totally random strings don’t work, but similar words, structures, etc. make sense.
However having a consistent set of data for each developer is a great idea. SDG can consistently generate data sets, but to make them meaningful, you might want to have control over the types of data inserted.
Using Real Words
I talked about the difference between random words and real words. However, what if you want to include specific types of items?
Let’s evolve some data. If I have a varchar(500) column, SDG defaults to this:
[A-Z0-9]*
Which gives me this:
That’s not great if I wanted to examine specific rows and determine if they were being returned by a query. This is just too random and hard to verify.
However I have options. For example, I could use the “Insert File List” item. This gives me a list of files that could be helpful. In this case, let’s choose “Color”.
I see this in the RegEx box.
($”Color.txt”)
Now my test data looks like this:
What’s in “color.txt”? Let’s see. The file is under the Data Generator 3 folder, in a Config location. I see lots of XML and text files.
If I open Color.txt, I see what I expect.
Now, let’s experiment. Let’s create a SQL Server file. I’ll put values in like this:
I need to change permissions on the config folder to allow saving, but I put it there. I also had to close and re-open SDG to pick up the new file.
Now I’ll add that to the RegEx box.
That’s cool. What if I made a file with a number of random words in it. Like a dictionary of sorts. I could do this. I create dictionary_small.txt.
Now I include that a number of times.
Those are some great descriptions. I’m sure I could have fun with this in other ways as well. Let’s create some good and bad data for a cleaning operation.
The ability to include data from flat files is a great option in SDG for putting together a data set that developers can actually use, understand, and count on for loading up new databases or tables, especially when creating quick code branches to test something out.
Filed under: Blog Tagged: Data Generator, Redgate, software development, syndicated, testing