Blog Post

Generate test data using DBGen

,

Over the last couple of months I have on several occasions found myself in need of test data for demos, blogposts etc. You all know AdventureWorks, which will work for a lot of demoing stuff. But the amount of data is limited, and if you are demoing stuff about performance, AdventureWorks just isn’t that good.

I started looking around for tools, and thought I would have a look at TPC.  My goal was not to be able to compare performance and transaction loads, but simply to generate data sets a bit bigger than AdventureWorks. So I found the TPC-H DBGen. In this blog post I will show you step by step how to use that tool to generate a database with a set of tables with data volumes large enough to be able to demonstrate performance tuning stuff.

1. Download DBGen

Download DBGen from http://www.tpc.org/tpch/spec/tpch_2_14_3.zip and extract it. In this guide I extracted the zip file to C:\tpch_2_14_3\

image
This is the project that we need to build to get the executable.

2. Build the solution

Now open C:\tpch_2_14_3\dbgen\tpch.sln in Visual Studio. Depending on your Visual Studio version, you might be faced with a conversion wizard. Just click Finish to execute the conversion. All you need to do, is to build the entire solution. I had some errors because of some locked files, so I had to manually delete all files from the C:\tpch_2_14_3\dbgen\Debug folder before I could compile the solution. The result is the file C:\tpch_2_14_3\dbgen\Debug\dbgen.exe.

3. Generate data using dbgen.exe

Now we need to execution dbgen.exe. If we execute the command with –h we get some help:

image

If we simply run dbgen.exe, it default to generating 1 GB of data, divided into 8 different tables (customers, nation, lineitem, orders, parts, partsupp, region, supplier). The –s parameter specifies a scale factor, so –s 10 gives us 10GB, and –s 100 generates 100GB of data. Let’s just try the default:

image

Whoops, we got an error! Why this is, I’m not sure, but I found the solution to be simple: Copy the file dbgen.exe one level up, so it is located in the C:\tpch_2_14_3\dbgen folder, and try again:

image

The –v gives verbose output. Now it generates files for each table. Depending on the speed of your system, this may take a few minutes.

The resulting files will be located in the same directory as dbgen.exe. The list of generated files is:

image

 

4. Create database and tables

The next we need to do, is create an empty database, and create the tables.

CREATE DATABASE DemoData
GO

The schema we need, is available in the file C:\tpch_2_14_3\dbgen\dss.ddl and it looks like this:

-- Sccsid:     @(#)dss.ddl    2.1.8.1
CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152));
CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152));
CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL );
CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL);
CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL );
CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL);
CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,
                           O_CLERK          CHAR(15) NOT NULL,
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL);
CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL);

If you want referential integrity on your tables, you can find help in this file: C:\tpch_2_14_3\dbgen\dss.ri

5. Load data into tables

Now we simply need to execute the following commands to import the data:

USE DemoData
GO
BULK INSERT part FROM 'C:\tpch_2_14_3\dbgen\part.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')
BULK INSERT customer FROM 'C:\tpch_2_14_3\dbgen\customer.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')
BULK INSERT orders FROM 'C:\tpch_2_14_3\dbgen\orders.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')
BULK INSERT partsupp FROM 'C:\tpch_2_14_3\dbgen\partsupp.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')
BULK INSERT supplier FROM 'c:\tpch_2_14_3\dbgen\supplier.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')
BULK INSERT lineitem FROM 'C:\tpch_2_14_3\dbgen\lineitem.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')
BULK INSERT nation FROM 'C:\tpch_2_14_3\dbgen\nation.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')
BULK INSERT region FROM 'C:\tpch_2_14_3\dbgen\region.tbl' WITH (TABLOCK, DATAFILETYPE='char', CODEPAGE='raw', FIELDTERMINATOR = '|')

And that’s it Smiley

Using this guide we can easily create 1, 10 or hundreds of GB of data, giving us the base we need to truly demonstrate the effect of performance tuning. This demo database will be used in later blog posts about query analyzing and performance optimizing. At some point I will also look into using the TPC tools for testing the transactional capacity of different systems, to be able to see the effect of configuration changes, hardware upgrades and other things we might wanna change to improve a system.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating