SQLServerCentral Article

An Urgent Ad Hoc Report

,

Usually you leave your office at 4:05 PM so that at 4:45 PM you can pick up your kids and

drive them to music class (your oldest daughter), to dance class (your youngest daughter),

and basketball practice (your son). Usually that's your routine but today at 4:00 PM you get a request

to generate an urgent ad hoc report.

According to WikiAnswers.com - "Ad Hoc is Latin and means 'to this'. It is used to indicate something that is done at the time without planning ahead of time. An Ad Hoc report is one that is created when someone asks for it, not at a specific point".

You call home and inform your kids that you will be late, but how late you do not know.

This is the request you: you need a report that shows all contractor names, their latest expense and date the expense was done.

First of all, you run a script to find Contractor and Expense tables across all databases on you production server. You open Expense table and review the data and table structure. What a mess!!! Your manager stops by and wants to know when he can have the report. You explain to him that you need at least two hours to generate this report, because of bad table design (no constraints, no indexes, no expense date column).

Your manager asks you to not use a cursor, not to use recursion, and send the report to a business analyst

and cc him. You know the query that will generate the requested report is just another query. You know the business analyst most likely on his way home. You also know it's fun to write this query and get correct results.

You decide to analyze data in table (table you see first time). You create test tables and populate them with a subset of data from real tables. Here is the code to set up the tables and populate them with some data.

--drop table dbo.tmpBusinessExpense
Create table dbo.tmpBusinessExpense (
 ContractorIdint not null
,FiscalYear int not null
,Quaterint not null
,ReportMonthint not null
,ReportWeekint not null
,ReportDayint not null
,Expensemoney not null
,CreateDate         datetime not null
)
create index UI_tmpBusinessExpense_ContractorId  on dbo.tmpBusinessExpense(ContractorId)
--truncate table dbo.tmpBusinessExpense
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (68022, 2009, 1, 2, 3, 4, 927428, '20060204')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2006, 1, 2, 3, 4, 278324, '20060204')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2007, 2, 3, 4, 6, 358974, '20060304')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2008, 3, 4, 2, 10, 438928, '20060404')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2009, 1, 5, 22, 18, 273954, '20090304')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2009, 2, 9, 33, 25, 373974, '20090304')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2009, 2, 10, 43, 7, 478464, '20090304')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2009, 3, 2, 6, 7, 768834, '20090304')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2009, 3, 2, 6, 12, 1268834, '20090304')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (238, 2009, 3, 2, 6, 29, 29568834, '20090304')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (608257, 2009, 2, 10, 43, 7, 478464, '20090708')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (608257, 2009, 4, 12, 51, 7, 768834, '20090405')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (608257, 2009, 4, 2, 51, 24, 1268834, '20090324')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, 
ReportMonth, ReportWeek, ReportDay, Expense, CreateDate)
values (608257, 2009, 4, 12, 51, 14, 34, '20090812')
--drop table dbo.tmpContractor
Create table dbo.tmpContractor (
 ContractorIdint not null
,ContractorName varchar(100) not null
)
-- drop index dbo.tmpContractor.UI_tmpContractor_ContractorId
create unique clustered index UI_tmpContractor_ContractorId  on dbo.tmpContractor(ContractorId)
insert into dbo.tmpContractor (ContractorId, ContractorName)
values (238, 'R & P suns')
insert into dbo.tmpContractor (ContractorId, ContractorName)
values (68022, 'ALS Revices')
insert into dbo.tmpContractor (ContractorId, ContractorName)
values (608257, 'Entrance Manufacturing')

You analyze the data for few a random contractors with this query:

declare @ContractorId int
set @ContractorId = 608257
--set @ContractorId = 238
--set @ContractorId = 68022
select * from dbo.tmpBusinessExpense
where ContractorId = @ContractorId
order by ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay

The task of selecting the latest expense for each contractor seems to be easy. You decide to work only with one contractor data to simplify the query writing process. Using SQL Server 2005 new features like Common Table Expression (CTE) and the function ROW_NUMBER(), you wrote a query that returns all entries added by one contractor: the ContractorId = 608257, and the Contractor Name is "Entrance Manufacturing".

declare @ContractorId int
set @ContractorId = 608257;
with RowsForOnlyEntranceManufacturing (
RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense) as
(
select ROW_NUMBER() OVER (order by ContractorId, FiscalYear, Quater, ReportMonth, 
ReportWeek, ReportDay) AS RowId, ContractorId, FiscalYear, Quater, ReportMonth, 
ReportWeek, ReportDay, Expense
from dbo.tmpBusinessExpense BE
where BE.ContractorId = @ContractorId)
select RowId, ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense
from RowsForOnlyEntranceManufacturing

For Entrance Manufacturing, the last entered row (RowId = 4) is the result row, the only row you want to see in the report. How do you filter all the rows in the table to get only this row?

The first thought was - what if the sum of FiscalYear, ReportMonth, ReportWeek, ReportDay can uniquely identify the result row for a contractor?

select BE.FiscalYear + BE.ReportMonth + BE.ReportDay as SumOfThreeColumns, BE.*
from dbo.tmpBusinessExpense BE
where BE.ContractorId = 608257

Looking at rows 3 and 4 you got the answer - NO, this sum does not uniquely identify each row in the table. This is because SumOfThreeColumns = 2035 for both rows.

What if combining these three columns you can re engineer the Original ExpenseDate - the date a contractor filed his expense? You want to have the ExpenseDate in DateTime format. The right format! If you have the ExpenseDate in DateTime format you can derive FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay without storing them in table and get the results you need quickly.

Having ExpenseDate for each record in dbo.tmpBusinessExpense table you can use correlated subquery to select a row with the latest date! Simple? But what if a contractor has multiple expenses for a day? You checked it and you did not find duplicates.

select ContractorId, FiscalYear, Quater, ReportMonth, ReportDay, count(*)
from dbo.tmpBusinessExpense
group by ContractorId, FiscalYear, Quater, ReportMonth, ReportDay
having count(*) > 1

You convert integer to string and concatenate FiscalYear, ReportMonth and ReportDay to get ExpenseDate

select convert(char(4), BE.FiscalYear)  + 
       convert(varchar(2), BE.ReportMonth) + 
       convert(varchar(2), BE.ReportDay) as ExpenseDate, 
BE.*
from dbo.tmpBusinessExpense BE
where BE.ContractorId = 608257

You want to convert varchar data to ExpenseDate column with DateTime datatype.

declare @ContractorId int
set @ContractorId = 608257;
select 
convert(DateTime, convert(char(4), BE.FiscalYear) + 
convert(varchar(2), BE.ReportMonth) + 
convert(varchar(2), BE.ReportDay))
 ,BE.*
from dbo.tmpBusinessExpense BE
where BE.ContractorId = @ContractorId

You got this message:

Msg 241, Level 16, State 1, Line 1
Conversion failed when converting datetime from character string.

Why it does not work? You try it for ReportDay = 14 and it works.

declare @ContractorId int
set @ContractorId = 608257;
select 
convert(DateTime, convert(char(4), BE.FiscalYear) + 
                  convert(varchar(2), BE.ReportMonth) + 
                  convert(varchar(2), BE.ReportDay)
       )  as ExpenseDate
 ,BE.*
from dbo.tmpBusinessExpense BE
where BE.ContractorId = @ContractorId and ReportDay = 14

It does not work for records 1,2,3 where leading zero for Month or Day is missing. You realized Month and Day should have two digits. If they do not - attach leading zero.

declare @ContractorId int
set @ContractorId = 608257;
select 
convert (DateTime,
(
convert(char(4), BE.FiscalYear) + 
case when len(convert(varchar(2), BE.ReportMonth)) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(convert(varchar(2),  BE.ReportDay)) = 1 then '0' + convert(varchar(2),  BE.ReportDay) 
else convert(varchar(2),  BE.ReportDay) end
)
) as ExpenseDate
 , BE.*
from dbo.tmpBusinessExpense BE
where BE.ContractorId = @ContractorId

Now, you noticed that you do not need to use convert when identifying the length of ReportMonth,

and ReportDay and you removed it.

declare @ContractorId int
set @ContractorId = 608257;
select 
convert (DateTime,
(
convert(char(4), BE.FiscalYear) + 
case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2),  BE.ReportDay) 
else convert(varchar(2),  BE.ReportDay) end
)
) as ExpenseDate
 , BE.*
from dbo.tmpBusinessExpense BE
where BE.ContractorId = @ContractorId

or using Common table expression (CTE)

declare @ContractorId int
set @ContractorId = 608257;
with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, 
ReportDay, Expense, CreateDate, ExpenseDate)
as (
select 
BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, 
BE.ReportDay, BE.Expense, BE.CreateDate,
convert (DateTime,
(
convert(char(4), BE.FiscalYear) + 
case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2),  BE.ReportDay) 
else convert(varchar(2),  BE.ReportDay) end
)
) as ExpenseDate
from dbo.tmpBusinessExpense BE
where BE.ContractorId = @ContractorId
)
select * from FixedExpenseDate

You use a CTE for simplicity, and you want to separate preparation of ExpenseDate and the main logic where you will determine the result set. You use correlated query to get in the result set only latest row entered by contractor Entrance Manufacturing. For each contractor query should return only one record with max ExpenseDate

declare @ContractorId int
set @ContractorId = 608257;
with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, 
ReportDay, Expense, CreateDate, ExpenseDate)
as (
select 
BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, 
BE.ReportDay, BE.Expense, BE.CreateDate,
convert (DateTime,
(
convert(char(4), BE.FiscalYear) + 
case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2),  BE.ReportDay) 
else convert(varchar(2),  BE.ReportDay) end
)
) as ExpenseDate
from dbo.tmpBusinessExpense BE
where BE.ContractorId = @ContractorId
)
select * 
from FixedExpenseDate A
where A.ExpenseDate = ( select Max(B.ExpenseDate) 
from FixedExpenseDate B where A.ContractorId = B.ContractorId)
Here for each outer row, for each contractor subquery determines 
the max ExpenseDate. This query returns rows that have ExpenseDate equal to max ExpenseDate.

You check your query against the entire population

with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, 
ReportDay, Expense, CreateDate, ExpenseDate)
as (
select 
BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, 
BE.ReportDay, BE.Expense, BE.CreateDate,
convert (DateTime,
(
convert(char(4), BE.FiscalYear) + 
case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2),  BE.ReportDay) 
else convert(varchar(2),  BE.ReportDay) end
)
) as ExpenseDate
from dbo.tmpBusinessExpense BE
)
select * 
from FixedExpenseDate A
where A.ExpenseDate = ( select Max(B.ExpenseDate) 
                          from FixedExpenseDate B 
                          where A.ContractorId = B.ContractorId)

You got an error message:

Msg 242, Level 16, State 3, Line 1
The conversion of a char data type to a datetime data type 
resulted in an out-of-range datetime value.

How to find a row with "out-of-range datetime value"? You use isdate function:

select top 3
BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, BE.ReportDay, BE.Expense, BE.CreateDate,
isdate(
(
convert(varchar(4), BE.FiscalYear) + 
case when len(convert(varchar(2), BE.ReportMonth)) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(rtrim(convert(char(2),  BE.ReportDay))) = 1 then '0' + rtrim(convert(char(2),  BE.ReportDay)) 
else rtrim(convert(char(2),  BE.ReportDay)) end
)
) as IsDateIndicator
from dbo.tmpBusinessExpense BE
order by IsDateIndicator

The ReportDay = 29 and in 2009 there were only 28 days in February. You fixed this value:

update dbo.tmpBusinessExpense
set ReportDay = 28
where 
ContractorId = 238 
and FiscalYear = 2009
and Quater = 3
and ReportMonth = 2
and ReportWeek = 6
and ReportDay = 29

You check your query second time:

with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, 
ReportDay, Expense, CreateDate, ExpenseDate)
as (
select 
BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, 
BE.ReportDay, BE.Expense, BE.CreateDate,
convert (DateTime,
(
convert(char(4), BE.FiscalYear) + 
case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2),  BE.ReportDay) 
else convert(varchar(2),  BE.ReportDay) end
)
) as ExpenseDate
from dbo.tmpBusinessExpense BE
)
select * 
from FixedExpenseDate A
where A.ExpenseDate = ( select Max(B.ExpenseDate) 
                          from FixedExpenseDate B 
                         where A.ContractorId = B.ContractorId)

Now you get rid of the extra columns and format your values in the report:

with FixedExpenseDate(ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, 
ReportDay, Expense, CreateDate, ExpenseDate)
as (
select 
BE.ContractorId, BE.FiscalYear, BE.Quater, BE.ReportMonth, BE.ReportWeek, 
BE.ReportDay, BE.Expense, BE.CreateDate,
convert (DateTime,
(
convert(char(4), BE.FiscalYear) + 
case when len(BE.ReportMonth) = 1 then '0' + convert(varchar(2), BE.ReportMonth) 
else convert(varchar(2), BE.ReportMonth) end +
case when len(BE.ReportDay) = 1 then '0' + convert(varchar(2),  BE.ReportDay) 
else convert(varchar(2),  BE.ReportDay) end
)
) as ExpenseDate
from dbo.tmpBusinessExpense BE
)
select C.ContractorName, convert(varchar, A.ExpenseDate, 101) as LastExpenseDate,  
                                 '$'+ convert(varchar, Expense, 1) as Expense
from FixedExpenseDate A inner join dbo.tmpContractor C
on A.ContractorId = C.ContractorId
where A.ExpenseDate = ( select Max(B.ExpenseDate) 
                          from FixedExpenseDate B 
                         where A.ContractorId = B.ContractorId)
order by C.ContractorName

The result is:

ContractorNameLastExpenseDateExpense
ALS Revices02/04/2009$927,428.00
Entrance Manufacturing12/14/2009$34.00
R & P suns10/07/2009$478,464.00

You ran your query against production tables and you got slightly different result:

ContractorNameLastExpenseDateExpense
ALS Revices02/04/2009$927,428.00
Entrance Manufacturing12/14/2009$34.00
Entrance Manufacturing12/14/2009$834.00
Entrance Manufacturing12/14/2009$3,834.00
Entrance Manufacturing12/14/2009$29,568,834.00
R & P suns10/07/2009$478,464.00

Obviously, while you were working on your report at least three records were entered for Contractor Entrance Manufacturing. To replicate this problem in your test tables you entered these rows:

insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, 
ReportWeek, ReportDay, Expense, CreateDate)
values (608257, 2009, 4, 12, 51, 14, 834, '20090812')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, 
ReportWeek, ReportDay, Expense, CreateDate)
values (608257, 2009, 4, 12, 51, 14, 3834, '20090812')
insert into dbo.tmpBusinessExpense (ContractorId, FiscalYear, Quater, ReportMonth, 
ReportWeek, ReportDay, Expense, CreateDate)
values (608257, 2009, 4, 12, 51, 14, 29568834, '20090812')

You decide to use RowId that uniquely identifies each row in table as part of ExpenseDate column to make ExpenseDate unique. You check the convert function:

SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.444', 126)
union all
SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.443', 126)
union all
SELECT [Datetime] = CONVERT(datetime,'2010-02-28 12:07:34.442', 121)
union all
SELECT [Datetime] = CAST('2010-02-28 12:07:34.442' AS datetime)
union all
SELECT [Datetime] = CAST('2010-02-28 12:07:34.443' AS datetime)

To your surprise all five rows return the same result '2010-02-28 01:07:34.443'. You think that you can use identity column with seed = 3 to avoid this issue. You check it, but convert function is not reliable:

SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.991', 126)
union all
SELECT [Datetime] = CONVERT(datetime,'2010-02-28T01:07:34.994', 126)
union all
SELECT [Datetime] = CONVERT(datetime,'2010-02-28T12:07:34.997', 126)
union all
SELECT [Datetime] = CAST('2010-02-28 12:07:34.988' AS datetime)
union all
SELECT [Datetime] = CAST('2010-02-28 12:07:34.984' AS datetime)

And minutes and seconds part of date can not exceed 59.

You review your query and decide that if multiple expenses where entered the same date (Tiebreaker) the max expense becomes the last one and it has max(RowId) for each contractor. You modify your query and it produce a correct result.

with FixedExpenseDate(RowId, ContractorId, FiscalYear, Quater, ReportMonth, 
                      ReportWeek, ReportDay, Expense)
as (
  select ROW_NUMBER() OVER (order by ContractorId, FiscalYear, Quater, 
   ReportMonth, ReportWeek, ReportDay) AS RowId,
ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense
from dbo.tmpBusinessExpense
)
select C.ContractorName, 
convert(varchar, 
convert (DateTime,
(
convert(char(4), A.FiscalYear) + '-' +
case when len(A.ReportMonth) = 1 then '0' + convert(char(1), A.ReportMonth) 
else convert(char(2), A.ReportMonth) end  + '-' +
case when len(A.ReportDay) = 1 then '0' + convert(char(1),  A.ReportDay) 
else convert(char(2),  A.ReportDay) end
) 
), 101) as LastExpenseDate,   '$'+ convert(varchar, Expense, 1) as Expense
from FixedExpenseDate A inner join dbo.tmpContractor C
on A.ContractorId = C.ContractorId
where A.RowId = ( select Max(B.RowId) 
                   from FixedExpenseDate B 
                  where A.ContractorId = B.ContractorId)
order by C.ContractorName

You know that in correlated subquery the inner query should execute for each row from outer query. How do you get the same result without correlated subquery? You find a tip "Get Unique from Duplicates". It was written for Oracle, and you can not use this idea in SQL Server 2000, but it should work for SQL Server 2005.

with FixedExpenseDate(RowId, ContractorId, FiscalYear, Quater, ReportMonth, 
                      ReportWeek, ReportDay, Expense)
as (
  select ROW_NUMBER() OVER (PARTITION BY ContractorId order by FiscalYear desc, Quater desc, 
   ReportMonth desc, ReportWeek desc, ReportDay desc, Expense desc ) AS RowId,
ContractorId, FiscalYear, Quater, ReportMonth, ReportWeek, ReportDay, Expense
from dbo.tmpBusinessExpense
)
select C.ContractorName, 
convert(varchar, 
convert (DateTime,
(
convert(char(4), A.FiscalYear) + '-' +
case when len(A.ReportMonth) = 1 then '0' + convert(char(1), A.ReportMonth) 
else convert(char(2), A.ReportMonth) end  + '-' +
case when len(A.ReportDay) = 1 then '0' + convert(char(1),  A.ReportDay) 
else convert(char(2),  A.ReportDay) end
) 
), 101) as LastExpenseDate,   '$'+ convert(varchar, Expense, 1) as Expense
from FixedExpenseDate A inner join dbo.tmpContractor C
on A.ContractorId = C.ContractorId
where A.RowId = 1
order by C.ContractorName

You send the report and you go home.

References:

http://www.sqlservercentral.com/

Jeff Smith's SQL Server Blog at http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx

Credits:

my wife, my kids, my teacher Ilya Petrenko

Rate

2.04 (52)

You rated this post out of 5. Change rating

Share

Share

Rate

2.04 (52)

You rated this post out of 5. Change rating