I haven't posted much lately, but it has been a busy few months with many family activities. At work, I am happy to say that we are in the middle of upgrading our financial system, which includes an upgrade from SQL Server 2000 to SQL Server 2005 64 bit. Basically, I am reviewing everything I have done concerning that system for the past 4 years. This is a chance to improve stored procedures and create new SSIS packages to replace DTS. I am really happy that I am getting to use the new T-SQL features, such as Common Table Expressions (CTE). I have been a big fan of virtual tables, but I believe that CTEs have some advantages.
The first advantage is that CTEs are probably easier to read once you understand the syntax. The CTE is defined at the beginning and then used in one, and only one, statement immediately following it. But it can be used multiple times in that statement. Another interesting thing is that multiple queries can be defined within it and that they can build on each other. CTEs can not be nested, that is, you can't define one inside the other. So, CTEs are very flexible, but not always the best performing solution.
My favorite example is from Itzik Ben-Gan's recent book "Inside SQL Server 2005 T-SQL Programming." It creates a virtual number table. Levels 1 - 5 are catesian products that join the previous level on itself. The num statement uses the new Row_Number() function to apply a row number for each of the rows produced.
WITHL0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rowsL1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rowsL2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rowsL3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rowsL4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rowsL5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rowsnum AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)SELECT N FROM NUM WHERE N <= 100000;
CTEs are gone right after the query that uses them, but they can be persisted in a view. Here is a view definition so that I always have a table of numbers whenever I need them:
create view vwNumbersas WITH L0 AS (SELECT 1 AS C UNION ALL SELECT 1), L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B), L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B), L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B), L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B), L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B), num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5) SELECT N FROM NUM WHERE N <= 1000000
Here is an example that builds on Itzik's and it is very efficient. It shows how you can "nest" CTEs by using a view and define multiple queries in one CTE.
--Find the dates that orders were not placed in AdventureWorks:
Use AdventureWorks
GO
With Dates as (select dateadd(d,N,'1/1/1900') as NDate from vwNumbers), -- a "nested" cte DateBounds as (select min(OrderDate) as MinDate, max(OrderDate) as MaxDate from Sales.SalesOrderHeader) --second query in CTEselect NDate as MissingDatefrom Dates left join Sales.SalesOrderHeaderon NDate = OrderDate join DateBoundson NDate between MinDate and MaxDatewhere OrderDate is nullorder by NDate
In this example, I have used the dateadd function to translate my numbers view into a date query. I am defining two queries inside the CTE. The second gives me the first and last order date in the orders table. I don't care about dates outside that range. Finally, I use both of the queries defined in the CTE to help me solve the problem. I am using the left join syntax and checking for a null to find rows that are missing.