Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Generate Lots of Test Data with CROSS JOIN


“Joe Test”

“Joe Test2″

“Joe Test3″

“Arrgh! There has got to be a better way to create some test data!”

Ever had that conversation with yourself? You need to create lots of data to load test an application or to create a demo. But going about it one row at a time is not only slow and tedious, but its results are less than stellar. …Joe Test101.

So, how can you create lots and lots of data without lots and lots of effort? Simple, let SQL do it for you.

Enter the Forgotten Join Type, the CROSS JOIN

Fortunately, SQL Server has a oft forgotten JOIN type that helps for occasions such as this. We’re all pretty familiar with INNER JOIN, LEFT JOIN, and RIGHT JOIN. But do you recall the most famous join type of them all? The CROSS JOIN?

A CROSS JOIN takes each row from the first table and combines it with every row from the second table. This produces a Cartesian product of the two tables.

For example, if my first table has 3 rows of data – Mary, Mark, Luke – and my second table has 2 rows of data – Jones, Smith – the Cartesian product will result in 6 rows of data. Mary Jones, Mary Smith, Mark Jones, Mark Smith, Luke Jones, and Luke Smith

The CROSS JOIN is much maligned. Most everyone who has ever written a SQL statement has accidentally produced a Cartesian product at some point. A query that you expected to return 12 hundred rows produced 12 billion rows. But we can harness that power for good and use it to create lots and lots of test data.

What’s in a Name?

inspector-2009-10-09Where can we get a good, long, list of names to use with our CROSS JOIN? Who captures that kind of information?

The United States Census Bureau, that’s who! Their site has a list of the most popular male, female, and family names. And they make the list available in files for download.

When I last checked, there were a total of 5,494 Given Names and 88,799 Family Names in the three files (the Given Names were broken into two files, male and female).

Creating a Cartesian product of that many names would produce a whopping 487,861,706 unique names (88,799 * 5,494). That’s far more than I typically require for my testing. So, I’ve pruned the list down quite a bit in my environment, down to 2,500.

Using the CROSS JOIN

After importing the files, I used the following script to create my test data.

–clean up the imported names
UPDATE
First_Names
SET
Given_Name = RTRIM(Given_Name)

UPDATE
Last_Names
SET
Family_Name = RTRIM(Family_Name)

–create a very simple customers table
CREATE TABLE Customers
(
Customer_ID INT NOT NULL IDENTITY(1,1)
,Last_Name VARCHAR(20) NOT NULL
,First_Name VARCHAR(20) NOT NULL
,Email_Address VARCHAR(50) NULL
)

–create the test data
INSERT INTO
Customers
(
Last_Name
,First_Name
,Email_Address
)
SELECT
f.Given_Name
,l.Family_Name
,f.Given_Name + ‘.’ + l.Family_Name + ‘@hotmail.com’
FROM
First_Names AS f CROSS JOIN
Last_Names AS l

My new Customers table now has 13,735,000 rows in it. A real business should be so fortunate!

Joe

Comments

Posted by ta.bu.shi.da.yu on 19 October 2009

That's really useful! If I may make a suggestion, don't use hotmail.com though. Use test.com, as this domain was created for this purpose.

Posted by Joew@webbtechsolutions.com on 19 October 2009

Good point, ta.bu.shi.da.yu. Thanks for mentioning that.

Posted by Glenn on 19 October 2009

I looked up test.com and it doesn't mention anything about being created for that purpose.  Using either is potentially a loaded gun as you may unintentionally spam either domain.

Posted by Joew@webbtechsolutions.com on 19 October 2009

To be clear, the script was intended to show how to create a bunch of test data. I did not mean to imply that the test data would ever reach an smtp server. Obviously it should not.

But, these are good points and I'll keep them in mind as I create scripts for future blog posts.

Thanks!

Joe

Posted by Gaby Abed on 19 October 2009

How about @NONEXISTENT_DOMAIN_NAME.COM? :-)  It's all good.

"What mighty contests arise out of trivial things." -Alexander Pope/Trivial Pursuit

Posted by Bob Lanteigne on 19 October 2009

It's off-topic but domain names example.com, .net, and .org are reserved for use in documentation and are better choices for dummy names.

Posted by kelly on 19 October 2009

i think the url you're thinking of is "example.com".   Example.com and Example.net are reserved for documentation and testing in RFC 2606.

Leave a Comment

Please register or log in to leave a comment.