Generating Completely Random Sample Output from a Dataset

Adam Aspin, 2018-07-12

This is the second article in a short series of SQL Server query challenges to test and expand your SQL Server querying skills. While none of these short articles is of overwhelming complexity, they will hopefully either teach you something new or help you to reactivate potentially dormant SQL skills in preparation for your next real-world challenge. You can see the rest of the articles on the Query Answers with SQL Server page.

In this article we will once again imagine that you are working for a small classic car retailer, called Prestige Cars Ltd, using the data that is in the company database.

The CEO wants to call up a handful of clients at random and ask them about the service that they have received from Prestige Cars. She asks you to produce a totally random list of sales with the relevant customers.

The Challenge

Your challenge is to come up with the SQL to satisfy her request. Here is one possible solution:

SELECT       TOP 5 PERCENT *
FROM         Data.SalesByCountry
ORDER BY     NEWID()

Running this code delivers the results that you can see in Figure 1.

Figure 1. Generating random output from a table using NEWID()

How It Works

T-SQL has a function, NEWID(), that returns a new value of the uniqueidentifier type (a GUID). This is a 16-byte value that is displayed as 32 hexadecimal digits in five groups separated by hyphens, or 36 characters in all. Each value is, for all practical purposes, unique. Not only that, but successive values follow no sequence or order, which is what makes them usable as a random sort key.

This query uses NEWID() to randomize data selection like this:

First

You write a simple SELECT query to return data from the SalesByCountry view. This view joins all the required tables in the underlying database that you need to output meaningful data.

Then

You add a call to NEWID() to the query so that each row gets its own random identifier. The call does not have to be in the SELECT clause. Placing it in the ORDER BY clause still generates a value for every row, but without that value appearing in the output. Sorting the data on this value therefore gives you a recordset in a random sort order.

Finally

You add a TOP n PERCENT clause to the SELECT clause to output only the first few percent (5 percent in this example) of the recordset.

The final result is a random subset of data from a database.
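
To make the hidden sort key from the steps above visible, one option is to expose the NEWID() value as a column and sort on its alias. The sketch below assumes the same Data.SalesByCountry view; the alias RandomSortKey is a hypothetical name introduced purely for illustration. The second query is a variant, not part of the original solution, that returns a fixed number of rows instead of a percentage, which may suit a request for "a handful" of clients.

```sql
-- Expose the random identifier that drives the sort order.
-- RandomSortKey is an illustrative alias, not a column of the view.
SELECT       TOP 5 PERCENT *, NEWID() AS RandomSortKey
FROM         Data.SalesByCountry
ORDER BY     RandomSortKey;

-- Variant: return a fixed number of random rows (10 is an arbitrary choice).
SELECT       TOP (10) *
FROM         Data.SalesByCountry
ORDER BY     NEWID();
```

Each execution should return a different set of rows. Because SQL Server has to generate an identifier for every row and sort on it before TOP can be applied, this approach is probably best kept for small or medium-sized datasets.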

Tricks and Traps

We have one comment to make about this technique. If you want to see what one of these unique identifier fields looks like, you can run code like the following snippet:

SELECT       NEWID()
FROM         Data.SalesByCountry

This gives the kind of result that you can see in Figure 2.

Figure 2. Generating unique identifiers using NEWID()
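
As a small check on what NEWID() returns, the following sketch does not touch any table. It inspects the data type and the displayed length of a generated value; the column aliases are illustrative names only.

```sql
-- Two calls in the same statement produce two different values.
SELECT       NEWID() AS FirstIdentifier
            ,NEWID() AS SecondIdentifier;

-- The value is of type uniqueidentifier; as text it is 36 characters long
-- (32 hexadecimal digits plus 4 hyphens).
SELECT       SQL_VARIANT_PROPERTY(NEWID(), 'BaseType') AS BaseType
            ,LEN(CONVERT(CHAR(36), NEWID())) AS DisplayedLength;
```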

That is it. You have seen a simple example of how to generate completely random sample output from a dataset. Keep watching SQLServerCentral.com for the next article in this series.

This article is adapted from the book “Query Answers with SQL Server Volume II: In-Depth Querying” by Adam and Karine Aspin, which is available as both a paperback and an eBook from all good resellers, including Amazon, Kobo and iBooks, as well as from all good bookstores.

The sample data is available either in the accompanying material for this article, or on the Tetras Publishing website at: www.tetraspublishing.com/sqlserver.

The Series

There are a number of articles in this series. You can see them all on the Query Answers page.

Resources

SQLQueriesSampleData.zip
