With SQL Server you have the ability to create derived tables on the fly and then use these derived tables within your query. In very basic terms, a derived table is a virtual table that's calculated on the fly from a select statement.
In concept this is similar to creating a temporary table and then using the temporary table in your query, but the approach is much simpler, because it can all be done in one step. This can be tremendously useful in certain situations.
The biggest benefit of using derived tables over using temporary tables is that they require fewer steps, and everything happens in memory instead of a combination of memory and disk. The fewer the steps involved, along with less I/O, the faster the performance.
Here are the steps when you use a temporary table:
- Lock tempdb database
- CREATE the temporary table (write activity)
- SELECT data & INSERT data (read & write activity)
- SELECT data from temporary table and permanent table(s) (read activity)
- DROP TABLE (write activity)
- Release the locks
Compare this with the number of steps it takes for a derived table:
- CREATE locks, unless isolation level of "read uncommitted" is used
- SELECT data (read activity)
- Release the locks
As is rather obvious from this example, using derived tables instead of temporary tables reduces disk I/O and can boost performance.
Below, I'm going to use the Northwind database that included with MS-SQL 2000. Let's say you've been asked to generate a report that shows off the total number of orders each customer placed in 1996. "Not a problem." you think to yourself. "This is just an easy aggregated join query." So, you sit down and generate your query and come up with:
SELECT C.CustomerID, C.CompanyName, COUNT(O.OrderID) AS TotalOrders FROM Customers C LEFT OUTER JOIN Orders O ON C.CustomerID = O.CustomerID WHERE YEAR(O.OrderDate) = 1996 GROUP BY C.CustomerID, C.CompanyName
Looks good, But there's something missing. Customers that didn't place an order in 1996 aren't showing up. You could leave off customers that didn't place an order, but in many cases the customers that didn't buy place orders are the ones that are of the most interest.
If you think you can include those customers with a "is null" check, you'd be wrong. Run the following code and see if you notice what is wrong:
SELECT C.CustomerID, C.CompanyName, COUNT(O.OrderID) AS TotalOrders FROM Customers C LEFT OUTER JOIN Orders O ON C.CustomerID = O.CustomerID WHERE (YEAR(O.OrderDate) = 1996 OR O.OrderDate IS NULL) GROUP BY C.CustomerID, C.CompanyName
At first glance this may look right, but if you examine the results carefully you'll see that the customers with the zero count have never placed an order. If a customer has placed an order, but just not in the year 1996 they won't show up. This is because the "is null" check finds customers that have never placed an order -- it still doesn't do anything to add customers who've placed an order, but just not in 1996.
This is where a derived table can come in handy. Instead of using the "Orders" table, we'll reduce the Orders table to a snapshot of itself -- including only data from the year 1996.
SELECT C.CustomerID, C.CompanyName, COUNT(dOrders.OrderID) AS TotalOrders FROM Customers C LEFT OUTER JOIN /* start our derived table */ (SELECT * FROM Orders WHERE YEAR(Orders.OrderDate) = 1996) AS dOrders /* end our derived table */ ON C.CustomerID = dOrders.CustomerID GROUP BY C.CustomerID, C.CompanyName
Run the above query and examine the results. You should now see a row returned for each Customer and the total number or orders placed in the year 1996 -- including the customers that didn't place an order.
This works is because the LEFT JOIN will include all matches or null values. If the LEFT JOIN has matches (like in our first query,) but non that match the criteria those rows are excluded. In our derived table, since we're only running against the orders from the year 1996 we're ensuring that we return results for each customer.
"I think this is the best and easy example to understand the concept of derived tables."
Now if you asked when and why to use derived tables then I'll say that it all depends on the complexity of your query; you have to decide when to use derived tables, but I think derived tables are faster than Temporary tables. Read the following points for the when to use derived tables.
- If you face a challenge of whether to create a temporary table or a view to produce the desired query take a look at using a derived table instead
- Experiment with using derived tables, views and temporary tables to see what yields better performance results. There are several articles on the internet that have shown that using a derived table is faster then temporary tables, but try it for yourself.
Updating a derived table
Now the question of updating a derived table. I don't think that it is a big issue, but still it's a question asked about derived tables. So the answer is as follows,
- MS SQL 2000 - Not possible
- MS SQL 2005 - Possible
So, updating a derived table is not possible in SQL Server 2000. If you try to do so, then SQL Server will give you the following error.
Update T SET Id=0 FROM (SELECT * FROM tt1) AS T
Server: Msg 4421, Level 16, State 1, Line 1 Derived table 'T' is not updatable because a column of the derived table is derived or constant.
I am not 100% sure that it is not possible to update a derived table in SQL Server 2000, but from what I have studied and searched on Internet, I believe that it is not possible.
However, in the case of SQL Server 2005, it is possible. I have executed the same query above in SQL Server 2005, and it is working perfectly and all rows are updated.
The power of SQL Server amaze me all the time, to see the power of SQL Server all you need is a little creativity and knowledge of the syntax. Some times derived tables can be more useful, likewise, you need to create a view for a single query and then you want to use it within another query and after that the view should be dropped because no use of the view after that action, then here derived tables can help you and benefits you by saving to create you a catalog entry in the form of view in the server.
For repeated queries, a SQL derived table used multiple times performs well with cached definition. It will not down your performance. A SQL derived table differs from a temporary table in that a SQL derived table exists only for the duration of the query, while a temporary table exists until the server is brought down and also it uses lot of disk space than a derived table in the temdb database. So it's better to make use of Derived tables wherever possible by eliminating the one time use views and by eliminating the temporary tables. What I think is it will improve the performance of your query.