SQLServerCentral Article

Generating Completely Random Sample Output from a Dataset

,

This is the second article in a short series of SQL Server query challenges to test and expand your SQL Server querying skills. While none of these short articles is of overwhelming complexity, they will hopefully either teach you something new or help you to reactivate potentially dormant SQL skills in preparation for your next real-world challenge. You can see the rest of the articles on the Query Answers with SQL Server page.

In this article we will once again imagine that you are working for a small classic car retailer, called Prestige Cars Ltd, using the data that is in the company database.

The CEO wants to call up a handful of clients at random and ask them about the service that they have received from Prestige Cars. She asks you to produce a totally random list of sales with the relevant customers.

The Challenge

Your challenge is to come up with the SQL to satisfy her request. Here is one possible solution:

SELECT       TOP 5 PERCENT *
FROM         Data.SalesByCountry
ORDER BY     NEWID()

Running this code delivers the results that you can see in Figure 1.

Figure 1. Generating random output from a table using NEWID()

How It Works

T-SQL has a function, NEWID(), that adds a 32-character unique alphanumeric identifier to a record. This string is guaranteed to be unique by SQL Server. Not only that, but its structure is completely random; that is, there is no sequence or order to the way that it is generated.

This query uses NEWID() to randomize data selection like this:

First

You write a simple SELECT query to return data from the SalesByCountry view. This view joins all the required tables in the underlying database that you need to output meaningful data.

Then

You add a NEWID() field to the output from a query to produce a completely random identifier. The NEWID() field does not have to be in the SELECT clause to be added to the dataset. Adding it to the ORDER BY clause still adds the field, but without it being visible; so by sorting the data using this field, you have a recordset in a random sort order.

Finally

You add a TOP n PERCENT clause to the SELECT clause to output only the first few percent (5 percent in this example) of the recordset.

The final result is a random subset of data from a database.

Tricks and Traps

We have one comment to make about this technique. If you want to see what one of these unique identifier fields looks like, you can run code like the following snippet:

SELECT       NEWID()
FROM         Data.SalesByCountry

This gives the kind of result that you can see in Figure 2.

Figure 2. Generating unique identifiers using NEWID()

That is it. You have seen a simple example of how to generate completely random sample output from a dataset. Keep watching SQLServerCentral.com ready for the next article in this series.

This article is adapted from the book “Query Answers with SQL Server Volume II: In-Depth Querying” by Adam and Karine Aspin. Available as both a paperback and an eBook from all good resellers including Amazon, Kobo and iBooks, as well as all good bookstores.

The sample data is available either in the accompanying material for this article, or on the Tetras Publishing website at: www.tetraspublishing.com/sqlserver.

The Series

There are a number of articles in this series. You can see them all on the Query Answers page.

Resources

Rate

4.63 (8)

You rated this post out of 5. Change rating

Share

Share

Rate

4.63 (8)

You rated this post out of 5. Change rating