Blog Post

Generate Lots of Test Data with CROSS JOIN

,

“Joe Test”

“Joe Test2″

“Joe Test3″

“Arrgh! There has got to be a better way to create some test data!”

Ever had that conversation with yourself? You need to create lots of data to load test an application or to create a demo. But going about it one row at a time is not only slow and tedious, but its results are less than stellar. …Joe Test101.

So, how can you create lots and lots of data without lots and lots of effort? Simple, let SQL do it for you.

Enter the Forgotten Join Type, the CROSS JOIN

Fortunately, SQL Server has a oft forgotten JOIN type that helps for occasions such as this. We’re all pretty familiar with INNER JOIN, LEFT JOIN, and RIGHT JOIN. But do you recall the most famous join type of them all? The CROSS JOIN?

A CROSS JOIN takes each row from the first table and combines it with every row from the second table. This produces a Cartesian product of the two tables.

For example, if my first table has 3 rows of data – Mary, Mark, Luke – and my second table has 2 rows of data – Jones, Smith – the Cartesian product will result in 6 rows of data. Mary Jones, Mary Smith, Mark Jones, Mark Smith, Luke Jones, and Luke Smith

The CROSS JOIN is much maligned. Most everyone who has ever written a SQL statement has accidentally produced a Cartesian product at some point. A query that you expected to return 12 hundred rows produced 12 billion rows. But we can harness that power for good and use it to create lots and lots of test data.

What’s in a Name?

inspector-2009-10-09Where can we get a good, long, list of names to use with our CROSS JOIN? Who captures that kind of information?

The United States Census Bureau, that’s who! Their site has a list of the most popular male, female, and family names. And they make the list available in files for download.

When I last checked, there were a total of 5,494 Given Names and 88,799 Family Names in the three files (the Given Names were broken into two files, male and female).

Creating a Cartesian product of that many names would produce a whopping 487,861,706 unique names (88,799 * 5,494). That’s far more than I typically require for my testing. So, I’ve pruned the list down quite a bit in my environment, down to 2,500.

Using the CROSS JOIN

After importing the files, I used the following script to create my test data.

–clean up the imported names

UPDATE

First_Names

SET

Given_Name = RTRIM(Given_Name)

UPDATE

Last_Names

SET

Family_Name = RTRIM(Family_Name)

–create a very simple customers table

CREATE TABLE Customers

(

Customer_ID INT NOT NULL IDENTITY(1,1)

,Last_Name VARCHAR(20) NOT NULL

,First_Name VARCHAR(20) NOT NULL

,Email_Address VARCHAR(50) NULL

)

–create the test data

INSERT INTO

Customers

(

Last_Name

,First_Name

,Email_Address

)

SELECT

f.Given_Name

,l.Family_Name

,f.Given_Name + ‘.’ + l.Family_Name + ‘@hotmail.com’

FROM

First_Names AS f CROSS JOIN

Last_Names AS l

My new Customers table now has 13,735,000 rows in it. A real business should be so fortunate!

Joe

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating