Database Weekly
The Complete Weekly Roundup of SQL Server News by SQLServerCentral.com
Hand-picked content to sharpen your professional edge
Editorial

Learning with Sample Data

I just had my one-year anniversary working for Redgate, and I must tell you, it’s been one of the best years of my 25-year professional career. People aside (and there are a lot of really great people!), one of the reasons I’ve enjoyed this year so much is that Redgate understands and believes in helping the data community grow and learn. That’s a mission I can easily join.

As part of that mission, I have regular opportunities to learn how users in the PostgreSQL community use the database and the challenges they face. In many ways, it’s not all that different from the SQL Server community. To help the community learn how to overcome challenges with PostgreSQL, a good sample database is essential.

It turns out that good sample databases are hard to create and maintain.

There are many (a plethora?) datasets available to import for one-off learning objectives. The real challenge is finding a database that supports long-term learning, grows over time, and uses as many features as possible. Database architecture and design are hard. Doing it with fake but realistic data is even more challenging.

But I still wanted to try.

In the PostgreSQL space, one of the open-source options is a small database called Pagila. It’s based on an old MySQL sample database called Sakila, a fake DVD rental store. The community tries to keep it up to date with new features in PostgreSQL. But I wanted something more realistic if possible.

For instance, I used an open-source movie database, TMDB, to get real movie titles, movie details, production company information, cast, and crew data. With the help of Ryan Lambert, I was able to create realistic (but fake) geospatial data for customer and store addresses. In fact, Ryan will be teaching a full-day pre-con at PASS Summit on PostGIS and mapping with PostgreSQL. Most importantly, the database includes functions to generate continuous rental and payment data.

Over the next few weeks, I’ll start to share the database, which I’m planning to call “Bluebox” (U.S. readers will understand the nod to Redbox), along with the schema, tools, and scripts I used to create it. This first attempt is definitely beta-quality at this point, but I’m excited about making it available to the community and seeing how others can help improve it.

So my question to you is, what are some of your go-to sample databases and datasets to learn more about your database of choice? Are there any that attempt to mimic a real, full application? What qualities do you look for in sample data to ensure you’re able to learn the server and features well?

I look forward to seeing your suggestions and comments!

Ryan Booz

Join the debate, and respond to the editorial on the forums

The Weekly News
All the headlines and interesting SQL Server information that we've collected over the past week, and sometimes even a few repeats if we think they fit.
AI/Machine Learning/Cognitive Services

Creating Applications with ChatGPT, LLMs and Generative AI - Overview

From MSSQL Tips

Learn about ChatGPT, LLMs, and Generative AI and how these technologies can be used for application development.

Azure Databricks, Spark and Snowflake

Create Parameter Driven Databricks Engineering Notebooks

From MSSQL Tips

Learn how to use the Databricks utility library (dbutils) to read and write widgets which allows the data engineer to pass parameters to the notebook so that the same...

Azure SQL

How to copy Azure SQL database to a different subscription and different tenant

From Azure Database Support Blog

In this article, I will share with you the steps needed to copy your Azure SQL database from one of your servers to another on a different subscription and different...

Community Interests

My data architecture book now has 15 chapters available!

From SQLServerCentral Blogs

Only one more chapter to go! As I have mentioned in prior blog posts, I have been writing a data architecture book, which I started last November. The title... The...

Conferences, Classes, Events, and Webinars

Pike’s Market to Pikes Peak: The PASS Summit is Worth the Journey

From Simple Talk

Recently I spoke at SQL Saturday Denver. The day after the conference we went to visit Pikes Peak Mountain. You have a couple of choices of how you can...

use a navigation scheme

From Storytelling with Data

In the following post, I share one of my secrets for both planning and delivering a powerful presentation—the navigation scheme. Before I go into detail on exactly what this...

DMO/SMO/Powershell

How To Install and Use PowerShell Copilot (Video Tutorial)

From IT Pro - Microsoft Windows Information, Solutions, Tools

Brien Posey explains how to set up PowerShell Copilot, how it works, and its capabilities and limitations.

Data Science

Quantiles of the generalized birthday problem

From AllAnalytics

A previous article discusses the birthday problem and its generalizations. The classic birthday problem asks, "In a room that contains N people, what is the probability that two or...
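For readers new to the classic version of the problem, here is a minimal Python sketch of the two-person case. It assumes 365 equally likely birthdays and ignores leap years. The function names are hypothetical, and the sketch does not reproduce the generalized analysis from the linked article. The smallest group size whose probability reaches p is the p-quantile of the number of people needed before the first shared birthday.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes birthdays are independent and uniform over `days` days
    (no leap years, no seasonal effects).
    """
    if n > days:
        return 1.0  # pigeonhole principle: a repeat is guaranteed
    # P(all n birthdays distinct) = prod_{k=0}^{n-1} (days - k) / days
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct


def smallest_n_for(prob: float, days: int = 365) -> int:
    """Smallest group size whose shared-birthday probability reaches `prob`
    (the prob-quantile of the 'people until first match' distribution)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    n = 1
    while prob_shared_birthday(n, days) < prob:
        n += 1
    return n


print(round(prob_shared_birthday(23), 4))  # 0.5073
print(smallest_n_for(0.5))                 # 23
```

Under these assumptions, 23 people are enough for a better-than-even chance that two of them share a birthday.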

Data Visualisation

Bubble Charts in ggplot2

From Curated SQL

Steven Sanderson creates a bubble chart: Bubble charts are a great way to visualize data with three dimensions. The size of the bubbles represents a…

DevOps and Continuous Delivery (CI/CD)

CI/CD for Microsoft Fabric Data Warehouses using Azure DevOps

From Kevin Chant

In this post I want to cover CI/CD for Microsoft Fabric Data Warehouses using Azure DevOps, which can now be done gracefully with the...

MDX/DAX

Optimizing callbacks in a SUMX iterator

From Sqlbi

This article explains a typical pattern to optimize a SUMX iterator by reducing the number of callbacks in the expression. Pushing calculations down to the VertiPaq storage engine is...

Machine Learning

How to use Machine Learning to make Predictions using Python and TensorFlow

From MSSQL Tips

This article looks at how to use machine learning ...

What is the swish activation function?

From Statistical Odds & Ends

The swish activation function is a commonly used activation function in deep learning networks. As a quick recap, at each neuron (node) of a neural network, we take a...
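As a quick illustration, swish is usually written as x · sigmoid(βx). With β = 1 it is also known as SiLU. The sketch below is plain Python with hypothetical function names. It is not any deep learning framework's implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid, 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x); beta = 1 gives SiLU."""
    return x * sigmoid(beta * x)


for x in (-2.0, 0.0, 2.0):
    print(x, round(swish(x), 4))
```

Unlike ReLU, swish is smooth and lets small negative values through. As β grows large, it approaches ReLU.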

Oracle/PostgreSQL/MySQL/other RDBMS

Understanding Postgres Explain Plans

From Curated SQL

Muhammad Ali explains explain plans: In a previous post titled Exploring Postgres Performance: A Deep Dive into pg_stat_statements, we discussed the utility of pg_stat_statements as a tool…

Performance Tuning SQL Server

Trying out Batch Mode on Rowstore

From Curated SQL

Etienne Lopes has some fun with a feature: Before 2012, creating analytical queries (that usually scan many rows and have lots of aggregations) from big…

PowerPivot/PowerQuery/PowerBI

Scheduling Power BI Dataset Refreshes

From Curated SQL

Gilbert Quevauvilliers gets out the stopwatch: I have seen some questions in the Power BI / Fabric Community forum asking why datasets with a scheduled…

Getting Report Visual IDs With Power BI Desktop Developer Mode

From Chris Webb's BI Blog

Back in 2021 I wrote a post showing how you can link a DAX query generated by a Power BI report in Log Analytics to a visual in a...

R Language

Building a Bland-Altman Plot in R

From Curated SQL

Steven Sanderson performs a comparison: Before we dive into the code, let’s briefly understand what a Bland-Altman plot is. It’s a graphical method to visualize…
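The linked post builds the plot in R. The numbers behind a Bland-Altman plot can also be computed in a few lines of Python. This sketch assumes paired measurements of the same subjects from two methods. It uses the conventional 95% limits of agreement: the bias plus or minus 1.96 standard deviations of the differences, which presumes the differences are roughly normal. The function name and sample data are hypothetical, and the plotting step is omitted. The plot itself shows differences against means, with horizontal lines at the bias and at each limit.

```python
from statistics import mean, stdev


def bland_altman(a, b, z=1.96):
    """Return per-pair means, per-pair differences (a - b), the bias
    (mean difference), and the (lower, upper) limits of agreement."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return means, diffs, bias, (bias - z * sd, bias + z * sd)


# Hypothetical readings of the same four subjects by two methods
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.5, 13.0, 15.5]
means, diffs, bias, (lower, upper) = bland_altman(method_a, method_b)
print(bias, round(lower, 4), round(upper, 4))
```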

Functional Programming and R

From Curated SQL

Anirban Shaw ties functional programming to R: Functional Programming’s relevance in the R programming language, a language primarily known for its prowess in data analysis and statistical computing, is…

Security News and Issues

How a Financial Services Firm Transformed Its Fraud Detection System

From IT Pro - Microsoft Windows Information, Solutions, Tools

Financial institutions must modernize their fraud detection systems as cybercriminals change their tactics. Learn about the overhaul of one such system at CNG Holdings.

1Password Becomes Latest Victim of Okta Customer Service Breach

From Dark Reading: Dark Reading News Analysis

Okta's IAM platform finds itself in cyberattackers' sights once again, as threat actors mount a supply chain attack targeting Okta customer support engagements.

Apache Zookeeper Vulnerability

From Curated SQL

The Instaclustr team reviews an announcement: On October 11, 2023, the Apache ZooKeeper™ project announced that a security vulnerability has been identified in Apache ZooKeeper, CVE-2023-44981. The Apache ZooKeeper project…

T-SQL and Query Languages

Using CAST and CONVERT Functions with Dates

From Callihan Data

CAST and CONVERT can both be used to switch a value to a new data type. They are similar, but certainly not identical. While CAST is considered ANSI SQL...

SQL SERVER – Exploring PIVOT and UNPIVOT

From Journey to SQL Authority with Pinal Dave

The SQL Server PIVOT and UNPIVOT operators are powerful tools that provide an easy way to transform your data in SQL.

Tech News

This new data poisoning tool lets artists fight back against generative AI

From Technology Review Feed - Tech Review Top Stories

A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set,...

How this Turing Award–winning researcher became a legendary academic advisor

From Technology Review Feed - Tech Review Top Stories

Every academic field has its superstars. But a rare few achieve superstardom not just by demonstrating individual excellence but also by consistently producing future superstars. A notable example of...

This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com. Note: This is not the SQLServerCentral.com daily newsletter list, and unsubscribing from this newsletter will not stop you from receiving the SQL Server Central daily newsletters. If you want to be removed from that list, you can follow the instructions on the daily newsletter.
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
webmaster@sqlservercentral.com
