SQLServerCentral Article

Generate Test Data Quickly With Cross Joins

,

Introduction

Need to rough up some bulk test data in a hurry? A carefully thought-out

Cross Join could be the answer.

Take any SQL query that joins two or more tables, delete the joining clause,

and what do you get? In SQL terms you get a Cross Join, in relational database

theory you get a Cartesian Product. Whatever you call it, you usually end up

with far more rows than you wanted, and most of them make no sense. Although

Cross Join queries are not normally much use, with a bit of thought we can use

them to quickly create large amounts of useful test data.

A simple example

Take the following query:

select * from

(

    select"Fred" as fName union 

    select "Wilma" union

    select "Barney" union

    select "Betty"

) as flintstones_1 CROSS JOIN

(

    select"Flintstone" as lName union

    select"Rubble" 

) as flintstones_2

This will produce 8 rows - the result of multiplying the four rows in the

first derived table (flintstones_1) against the two rows in the second derived

table (flintstones_2):

fName  lName 

------ ---------- 

Betty  Rubble

Betty  Flintstone

Barney Rubble

Barney Flintstone

Wilma  Rubble

Wilma  Flintstone

Fred   Rubble

Fred   Flintstone

(8 row(s) affected)

Needless to say, not all the above are real Flintstones, but that is not the

point. The point is that we have a cheap and cheerful way of generating multiple

unique names. For a small extra investment we can generate eighteen, not eight,

unique names:

select * from

(

    select"Fred" as fName union 

    select "Wilma" union

    select "Barney" union

    select "Betty" union

    select "Al" union

    select "Peggy" 

) as characters_1 CROSS JOIN

(

    select"Flintstone" as lName union

    select"Rubble" union

    select"Bundy"

) as characters_2

As many tables as you need can be Cross Joined to generate

exponentially-large amounts of test data. This simple query generates 27

mostly-fake politicians with middle names:

select * from

(

    select"Harry" as fName union 

    select "Winston" union 

    select "Vladimir" 

) as polit1 CROSS JOIN

(

    select"S " as mName union 

    select "Spencer" union 

    select "Ilich" 

) as polit2 CROSS JOIN

(

    select "Trueman" as lName union 

    select "Churchill" union 

    select "Lenin" 

) as polit3 

A more practical example

In the following query I have raided a few more US Sitcoms to make a simple

query that will generate no less than 150 unique authors in the PUBS database.

Note that I have serialised the two parts of the data that will make up the

author ID (and the phone number) to keep them unique, but I have chosen -55- to

be the center portion of all my generated IDs (010-55-0010 for example) There

were none in the initial authors table that matched this pattern so this gives

me an at-a-glance way of identifying my auto-generated authors.

insert authors 

select au_id1 + '-' + au_id2 as au_ud, 

    fName, 

    lName,

    au_id1 + ' 5' + au_id2 as phone,

    'Test address for ' + fName + ' ' + lName, 

    'London',

    'UK',

    '12345',

    1

from

(

    select'009' as au_id1, 'Fred' as fName union 

    select '010', 'Wilma' union

    select '012', 'Barney' union

    select '013', 'Betty' union

    select '014', 'Al' union

    select '015', 'Peggy' union

    select '016', 'Frasier' union

    select '017', 'Niles' union

    select '018', 'Homer' union

    select '019', 'Marge' union

    select '020', 'Hawkeye' union

    select '021', 'Trapper'union

    select'024', 'Sam'union

    select'025', 'Diane'union

    select'026', 'Rebecca'

) as test_authors_part_1 CROSS JOIN

(

    select'55-0010' as au_id2, 'Flintstone' as lName union

    select'55-0021', 'Rubble' union

    select'55-0022', 'Bundy' union

    select'55-0023', 'Crane' union

    select'55-0024', 'Simpson'union

    select'55-0025', 'Pierce'union

    select'55-0026', 'John' union

    select'55-0028', 'Malone'union

    select'55-0029', 'Chambers' union

    select'55-0030', 'Howe'

) as test_authors_part_2

Summary

The principle will work for any test data provided you construct your query

carefully - you can generate multiple orders for multiple books across multiple

stores for multiple dates. The data will exhibit a regular pattern, rather than

real-world randomness, but in most cases that will not be a problem.

 

About the author

Neil Boyle is an independent SQL Server consultant working out of London,

England. Neil's free SQL Server guide is available on-line at http://www.impetus-sql.co.uk

Rate

5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (2)

You rated this post out of 5. Change rating