SQLServerCentral Article

Segment Data into Deciles

,

This is the third article in a short series of SQL Server query challenges to test and expand your SQL Server querying skills. While none of these short articles is of overwhelming complexity, they will hopefully either teach you something new or help you to reactivate potentially dormant SQL skills in preparation for your next real-world challenge. You can see the rest of the articles on the Query Answers with SQL Server page.

In this article we will once again imagine that you are working for a small classic car retailer called Prestige Cars Ltd, using the data that is in the company database. Suppose that the sales manager of Prestige Cars Ltd. wants to group sales into deciles— that is “buckets” of equal size where each group of data contains one tenth of the records in the total dataset. She needs this to look at segments of the customer base.

The Challenge

Your challenge is to come up with the SQL to satisfy her request. Here is one possible solution:

SELECT       ST.Color, MakeName + ', ' + ModelName AS MakeAndModel
             ,SD.SalePrice
             ,NTILE(10) OVER (ORDER BY SD.SalePrice DESC) AS SalesDecile
FROM        Data.Make AS MK 
INNER JOIN  Data.Model AS MD ON MK.MakeID = MD.MakeID
INNER JOIN  Data.Stock AS ST ON ST.ModelID = MD.ModelID
INNER JOIN  Data.SalesDetails SD ON ST.StockCode = SD.StockID
ORDER BY    SalesDecile, Color, MakeAndModel

Running this query gives the results that you can see in Figure 1. You have to scroll down the result set to see how the SalesDecile field changes every 35 records or so.

Figure 1. Using the NTILE() function to group data into segments

How It Works

Of course, you need to understand the PrestigeCars database if you are going to comprehend this code – especially the FROM/JOIN clauses. To make your life easier, here is the ERD (Entity Relationship Diagram) for this database:

Figure 2: The Prestige Cars database Entity Relationship Diagram

You can see from the output that this query has added a field that indicates the decile (or segment, or bucket, if you prefer) that each record belongs to. You can then use this segmentation to pursue your analysis. This query is, at its heart, a simple SELECT query where you joined the necessary tables and selected the required fields. Then you added a final field using the NTILE() function.

This query delivers an easy solution to a potentially complex question—all thanks to the NTILE() function. What this windowing function does is segment the dataset into as many separate subgroups as you have specified when you apply the function. Whatever the number of records in the dataset, the NTILE() function subdivides them into the number of groups that you specify as a parameter of the function.

Just like the RANK() and DENSE_RANK() functions, the NTILE() function requires you to add the OVER keyword and an ORDER BY clause inside parentheses—in addition to a value that is used for classifying the data. The NTILE() function requires one further element—the number of “buckets” that you want to appear in the output. This is defined by adding a number inside the parentheses just after the NTILE() function. The number that you add is the number of groups into which the dataset will be divided.

Tricks and Traps

We have only one comment to make here. NTILE() can also contain a PARTITION BY clause to add a further level of segmentation should you need it. 

That is it – you have seen a simple example of how to segment data into deciles. Keep watching SQL Server Central.com ready for the next article in this series.

This article is adapted from the book “Query Answers with SQL Server Volume II: In-Depth Querying” by Adam and Karine Aspin. Available as both a paperback and an eBook from all good resellers including Amazon, Kobo and iBooks, as well as all good bookstores.

The sample data is available either in the accompanying material for this article, or on the Tetras Publishing website at: www.tetraspublishing.com/sqlserver.

The Series

There are a number of articles in this series. You can see them all on the Query Answers page.

Resources

Rate

5 (3)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (3)

You rated this post out of 5. Change rating