This is the sixth article in a short series of SQL Server query challenges to test and expand your SQL Server querying skills. While none of these short articles is of overwhelming complexity, they will hopefully either teach you something new or help you to reactivate potentially dormant SQL skills in preparation for your next real-world challenge. You can see the rest of the articles on the Query Answers with SQL Server page.
In this article we will once again imagine that you are working for a small classic car retailer called Prestige Cars Ltd, using the data that is in the company database.
The CEO has just had another idea; you can tell by the smile on her face as she walks over to your desk. Hiding your trepidation, you listen as she tells you that she needs a weekly calendar of sales for 2016. She makes it clear that she wants to see a list of all the weeks in the year, whether there were any sales in that week or not. Your challenge is to come up with the SQL to satisfy her request. Here is one possible solution:
; WITH Tally_CTE AS ( SELECT TOP 52 ROW_NUMBER() OVER (ORDER BY StockCode) AS Num FROM Data.Stock ) SELECT Num, SalesForTheWeek FROM Tally_CTE CTE LEFT OUTER JOIN ( SELECT SUM(TotalSalePrice) AS SalesForTheWeek ,DatePart(wk, SaleDate) AS WeekNo FROM Data.Sales WHERE YEAR(SaleDate) = 2016 GROUP BY DatePart(wk, SaleDate) ) SLS ON CTE.Num = SLS.WeekNo
Running this SQL produces output like that shown in Figure 1.
Figure 1. Displaying missing data using a sequence list
How It Works
Delivering data when the data exists is rarely an issue; however, you may need to apply slightly different querying techniques if you need to show “missing” data. By this, we mean that SQL does not naturally create complete lists of dates, or date elements such as weeks, where there is no corresponding data in a database.
We are presuming, here, that the Sales table (which, as its name indicates, contains itemized sales details including the sale date) might not contain a sale on every day of the year.
To understand this problem more clearly, run the SQL from the preceding snippet that returns the total sales per week for 2016, which is contained in the derived table with the alias SLS. The output from the derived table looks like that shown in Figure 2.
Figure 2. The source data with missing values
Although the figures are accurate, the data contains holes since there are no sales for certain weeks. Since these holes could be precisely the data that you want to focus on, it is important to return a full list of weeks—whether they contain data or not.
Forcing SQL Server to return a complete sequence (whether it is dates, weeks, or any other series of information) requires you to extend the SQL with a list that contains an unbroken and consecutive list of all the elements that you need to view. You can then use this list as the basis for a query that adds the aggregated output that you require.
A sequence like this is often called a tally table or a numbers list. What you have done in this SQL is
Created a CTE (Common Table Expression) that contains a numbers list.
Joined the CTE to a query (the derived subquery in this example) that calculates the sales per week.
Let’s begin by looking at the numbers table. Although we call this a table, it is really a dataset generated by a CTE. If you run the code inside the CTE named Tally_CTE, you see the output in Figure 3.
Figure 3. A sequence list or tally table
As you can see, this CTE simply returns a sequence of numbers. You obtain this sequence by
Choosing a table that contains more records than you need in the numbers table (the CTE output).
Selecting the first n records from this table that correspond to the number of items in the sequence that you require. In this example, there are 52 weeks in the year, so you use TOP 52 to limit the number of elements in the tally table.
Since you are not interested in the actual data that the source table (Stock) contains, but only in the number of records, you define a ROW_NUMBER() function (ordered by any field in the table) to provide a sequence of numbers.
The CTE at the start of the SQL serves only to provide a sequential list that provides a row for each week in the year.
Once you have the tally table set up, join this to the derived table that returns the actual data for each week of sales. Because you have extracted the week number from the SaleDate field using the DATEPART() function, you can join the week number for each week of sales to the corresponding record in the tally table. Defining the join as a LEFT OUTER join from the tally table (the CTE) to the derived query ensures that every record from the CTE is displayed, whether or not any corresponding data is in the derived query.
The result is the output from Figure 1, shown previously, that shows all the weeks in 2016 with the sales figures, even if there are no sales for a specific week.
If you use a single table as the basis for the tally table, it must contain at least as many records as you need for the sequential list. The next section explains a technique you can use to guarantee enough rows if you are worried that a single source table will not contain enough records.
That is it – you have seen a simple example of how to displaying records for missing data. Keep watching SQL Server Central.com ready for the next article in this series.
This article is adapted from the book “Query Answers with SQL Server Volume II: In-Depth Querying” by Adam and Karine Aspin. Available as both a paperback and an eBook from all good resellers including Amazon, Kobo and iBooks, as well as all good bookstores.
The sample data is available either in the accompanying material for this article, or on the Tetras Publishing website at: www.tetraspublishing.com/sqlserver.
There are a number of articles in this series. You can see them all on the Query Answers page.