Learning with Sample Data

I just had my one-year anniversary working for Redgate, and I must tell you, it's been one of the best years of my 25-year professional career. People aside (and there are a lot of really great people!), one of the reasons I've enjoyed this year so much is that Redgate understands and believes in helping the data community grow and learn. That's a mission I can easily join.

As part of that mission, I have regular opportunities to learn how members of the PostgreSQL community use the database and what challenges they face. In many ways, it's not all that different from the SQL Server community. A good sample database is essential for helping the community learn how to overcome those challenges with PostgreSQL.

It turns out that good sample databases are hard to create and maintain.

There are many (a plethora?) datasets available to import for one-off learning objectives. The real challenge is finding a database that supports long-term learning, grows over time, and uses as many features as possible. Database architecture and design are hard. Doing it with fake but realistic data is really challenging.

But I still wanted to try. 😀

In the PostgreSQL space, one of the open-source options is a small database called Pagila. It’s based on an old MySQL sample database called Sakila, a fake DVD rental store. The community tries to keep it up to date with new features in PostgreSQL. But I wanted something more realistic if possible.

So instead, I used an open-source movie database, TMDB, to get real movie titles, movie details, production company information, and cast and crew data. With the help of Ryan Lambert, I was able to create realistic (but fake) geospatial data for customer and store addresses. In fact, Ryan will be teaching a full-day pre-con at PASS Summit on PostGIS and mapping with PostgreSQL. Most importantly, the database includes functions to generate continuous rental and payment data.

Over the next few weeks, I'll start to share the database, schema, tools, and scripts I've used to create the database, which I'm planning to call "Bluebox" (U.S. readers will understand the nod to Redbox). This first attempt is definitely beta quality at this point, but I'm excited about making it available to the community and seeing how others can help improve it.

So my question to you is, what are some of your go-to sample databases and datasets to learn more about your database of choice? Are there any that attempt to mimic a real, full application? What qualities do you look for in sample data to ensure you’re able to learn the server and features well?

I look forward to seeing your suggestions and comments!

