dwain.c (7/4/2013)
I'm always interested in cases like this, so with more time on my hands than I care to admit, I came up with the following test harness:
CREATE TABLE #TABLE1 ([filename] varchar(15), jobname varchar(15)
,PRIMARY KEY (jobname, [filename]));
CREATE TABLE #TABLE2 ([filename] varchar(15), SuccessFailIndicator bit, checkdate datetime
,PRIMARY KEY ([filename], checkdate));
GO
insert into #table1 values('file1.txt', 'job1');
insert into #table1 values('file2.txt', 'job1');
insert into #table1 values('file3.txt', 'job1');
-- Tally (numbers) CTE: add ~10,000 generated jobs, each with a random 1-9 files
WITH Tally (n) AS (
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #TABLE1
SELECT 'f' + CAST(b.n AS VARCHAR(2)) + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9) + '.txt'
,'j' + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9)
FROM Tally a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%9) n FROM Tally) b;
insert into #TABLE2 values('file1.txt', 1, '6/5/2013');
insert into #TABLE2 values('file2.txt', 0, '6/4/2013');
insert into #TABLE2 values('file3.txt', 1, '6/1/2013');
insert into #TABLE2 values('file1.txt', 0, '6/1/2013');
insert into #TABLE2 values('file2.txt', 0, '6/1/2013');
-- Give each generated file 1-8 check rows on consecutive dates,
-- with random success/fail flags
WITH Tally (n) AS (
SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a)
INSERT INTO #TABLE2
SELECT [filename], ABS(CHECKSUM(NEWID()))%2
,DATEADD(day, -(n-1), '6/5/2013')
FROM #TABLE1 a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%8) n FROM Tally) b
WHERE [filename] NOT IN ('file1.txt','file2.txt','file3.txt');
--SELECT * FROM #TABLE1;
--SELECT * FROM #TABLE2;
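Since the row counts above are randomized with NEWID(), a quick sanity check on the generated volumes can be run before the timing tests; exact counts will differ on every run:

```sql
-- Rough volume check; counts vary run to run because of NEWID()
SELECT jobs = COUNT(DISTINCT jobname), files = COUNT(*) FROM #TABLE1;
SELECT checks = COUNT(*) FROM #TABLE2;
```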
DECLARE @jobname VARCHAR(15);
DBCC FREEPROCCACHE;
PRINT 'MARK';
SET STATISTICS TIME ON;
WITH CTE1 AS (
SELECT Filename,SuccessFailIndicator,checkdate,
ROW_NUMBER() OVER(PARTITION BY Filename ORDER BY checkdate DESC) AS rn
FROM #TABLE2)
SELECT @jobname=t1.jobname
FROM #TABLE1 t1
LEFT OUTER JOIN CTE1 c ON c.Filename = t1.Filename
AND c.SuccessFailIndicator = 1
AND c.rn=1
GROUP BY t1.jobname
HAVING COUNT(t1.Filename) = COUNT(c.Filename)
ORDER BY t1.jobname;
SET STATISTICS TIME OFF;
DBCC FREEPROCCACHE;
PRINT 'DWAIN';
SET STATISTICS TIME ON;
SELECT @jobname=jobname
FROM #TABLE1 a
CROSS APPLY (
SELECT TOP 1 SuccessFailIndicator=1-SuccessFailIndicator
FROM #TABLE2 b
WHERE a.[Filename] = b.[Filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
HAVING SUM(b.SuccessFailIndicator) = 0
ORDER BY a.jobname;
SET STATISTICS TIME OFF;
GO
DROP TABLE #TABLE1;
DROP TABLE #TABLE2;
Which yielded these results:
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
MARK
SQL Server Execution Times:
CPU time = 4352 ms, elapsed time = 2032 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DWAIN
SQL Server Execution Times:
CPU time = 1763 ms, elapsed time = 450 ms.
Note that parallelism played a role in both queries, since CPU time exceeds elapsed time in each case.
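To take parallelism out of the comparison, either test query can be pinned to a serial plan with a MAXDOP hint. For example, the CROSS APPLY version (I haven't re-run the timings this way):

```sql
SELECT @jobname = jobname
FROM #TABLE1 a
CROSS APPLY (
SELECT TOP 1 SuccessFailIndicator = 1 - SuccessFailIndicator
FROM #TABLE2 b
WHERE a.[filename] = b.[filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
HAVING SUM(b.SuccessFailIndicator) = 0
ORDER BY a.jobname
OPTION (MAXDOP 1); -- force a serial plan
```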
Interesting! However, the two queries don't give the same results. If you run this
delete from #TABLE2 where [filename] <> 'file3.txt'
on the OP's sample data, so that job1 has only one of its three files successfully completed, your query still reports the job as successful.
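One possible fix would be to switch to OUTER APPLY and treat a file with no check rows at all as a failure. Untested against your harness, but it handles the case above:

```sql
SELECT a.jobname
FROM #TABLE1 a
OUTER APPLY (
SELECT TOP 1 Fail = 1 - SuccessFailIndicator
FROM #TABLE2 b
WHERE a.[filename] = b.[filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
-- a file with no check rows yields NULL; ISNULL counts it as a failure
HAVING SUM(ISNULL(b.Fail, 1)) = 0
ORDER BY a.jobname;
```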
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537