SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


a bit of a puzzle, can you fine people assist me?


a bit of a puzzle, can you fine people assist me?

Author
Message
sqlworkers
sqlworkers
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 9
I have two tables as defined below, and I need to return a list of jobs from table1 based on these conditions:
all the files associated with each job must have a success indicator in the most recent entry in the history table
if any of the associated files is not present or has a failed indicator, that job must be excluded.

Excluding all the failed files is easy, but how do I exclude a job when 2 out of 3 have succeeded?
the below data shows the mose recent entries for files 1, 2 3 are not all successful, so job1 should be excluded from the result set. I could of course loop through this and be done with it but I want to try to derive a result set instead of using a loop construct. Any thoughts?

CREATE TABLE1 (filename varchar(15), jobname varchar(15))

CREATE TABLE2 (Filename varchar(15), SuccessFailIndicator bit, checkdate datetime)

insert into table1
values('file1.txt', 'job1')

insert into table1
values('file2.txt', 'job1')

insert into table1
values('file3.txt', 'job1')

insert into table2
values('file1.txt', 1, '6/5/2013')

insert into table2
values('file2.txt', 0, '6/4/2013')

insert into table2
values('file3.txt', 1, '6/1/2013')

insert into table2
values('file1.txt', 0, '6/1/2013')

insert into table2
values('file2.txt', 0, '6/1/2013')

The design here is just to exemplify the logic, these are not the real data types or tables in use of course.
Mark Cowne
Mark Cowne
SSCertifiable
SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)

Group: General Forum Members
Points: 6525 Visits: 25572
Try this

WITH CTE1 AS (
SELECT Filename,SuccessFailIndicator,checkdate,
ROW_NUMBER() OVER(PARTITION BY Filename ORDER BY checkdate DESC) AS rn
FROM TABLE2)
SELECT t1.jobname
FROM TABLE1 t1
LEFT OUTER JOIN CTE1 c ON c.Filename = t1.Filename
AND c.SuccessFailIndicator = 1
AND c.rn=1
GROUP BY t1.jobname
HAVING COUNT(t1.Filename) = COUNT(c.Filename);



____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537




sqlworkers
sqlworkers
Forum Newbie
Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)Forum Newbie (6 reputation)

Group: General Forum Members
Points: 6 Visits: 9
This is exactly what I had except for the bit of genius at the end where you compared the counts...thank you very much kind internet stranger
dwain.c
dwain.c
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18045 Visits: 6431
Here's possibly another way:


SELECT jobname
FROM TABLE1 a
CROSS APPLY (
SELECT TOP 1 SuccessFailIndicator=1-SuccessFailIndicator
FROM TABLE2 b
WHERE a.[Filename] = b.[Filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
HAVING SUM(b.SuccessFailIndicator) = 0;




Depending on your indexing, one could do better than the other.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
dwain.c
dwain.c
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18045 Visits: 6431
I'm always interested in cases like this so with more time on my hands than I care to admit, I came up with the following test harness:


CREATE TABLE #TABLE1 ([filename] varchar(15), jobname varchar(15)
,PRIMARY KEY (jobname, [filename]));
CREATE TABLE #TABLE2 ([filename] varchar(15), SuccessFailIndicator bit, checkdate datetime
,PRIMARY KEY ([filename], checkdate));
GO
insert into #table1 values('file1.txt', 'job1');
insert into #table1 values('file2.txt', 'job1');
insert into #table1 values('file3.txt', 'job1');

WITH Tally (n) AS (
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #TABLE1
SELECT 'f' + CAST(b.n AS VARCHAR(2)) + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9) + '.txt'
,'j' + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9)
FROM Tally a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%9) n FROM Tally) b;

insert into #TABLE2 values('file1.txt', 1, '6/5/2013');
insert into #TABLE2 values('file2.txt', 0, '6/4/2013');
insert into #TABLE2 values('file3.txt', 1, '6/1/2013');
insert into #TABLE2 values('file1.txt', 0, '6/1/2013');
insert into #TABLE2 values('file2.txt', 0, '6/1/2013');

WITH Tally (n) AS (
SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a)
INSERT INTO #TABLE2
SELECT [filename], ABS(CHECKSUM(NEWID()))%2
,DATEADD(day, -(n-1), '6/5/2013')
FROM #TABLE1 a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%8) n FROM Tally) b
WHERE [filename] NOT IN ('file1.txt','file2.txt','file3.txt');

--SELECT * FROM #TABLE1;
--SELECT * FROM #TABLE2;

DECLARE @jobname VARCHAR(15);

DBCC FREEPROCCACHE;
PRINT 'MARK';
SET STATISTICS TIME ON;
WITH CTE1 AS (
SELECT Filename,SuccessFailIndicator,checkdate,
ROW_NUMBER() OVER(PARTITION BY Filename ORDER BY checkdate DESC) AS rn
FROM #TABLE2)
SELECT @jobname=t1.jobname
FROM #TABLE1 t1
LEFT OUTER JOIN CTE1 c ON c.Filename = t1.Filename
AND c.SuccessFailIndicator = 1
AND c.rn=1
GROUP BY t1.jobname
HAVING COUNT(t1.Filename) = COUNT(c.Filename)
ORDER BY t1.jobname;
SET STATISTICS TIME OFF;

DBCC FREEPROCCACHE;
PRINT 'DWAIN';
SET STATISTICS TIME ON;
SELECT @jobname=jobname
FROM #TABLE1 a
CROSS APPLY (
SELECT TOP 1 SuccessFailIndicator=1-SuccessFailIndicator
FROM #TABLE2 b
WHERE a.[Filename] = b.[Filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
HAVING SUM(b.SuccessFailIndicator) = 0
ORDER BY a.jobname;
SET STATISTICS TIME OFF;

GO
DROP TABLE #TABLE1;
DROP TABLE #TABLE2;




Which yielded these results:


DBCC execution completed. If DBCC printed error messages, contact your system administrator.

MARK
SQL Server Execution Times:
CPU time = 4352 ms, elapsed time = 2032 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DWAIN
SQL Server Execution Times:
CPU time = 1763 ms, elapsed time = 450 ms.



Note that parallelism played a role in both queries because CPU > Elapsed time.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Mark Cowne
Mark Cowne
SSCertifiable
SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)

Group: General Forum Members
Points: 6525 Visits: 25572
dwain.c (7/4/2013)
I'm always interested in cases like this so with more time on my hands than I care to admit, I came up with the following test harness:


CREATE TABLE #TABLE1 ([filename] varchar(15), jobname varchar(15)
,PRIMARY KEY (jobname, [filename]));
CREATE TABLE #TABLE2 ([filename] varchar(15), SuccessFailIndicator bit, checkdate datetime
,PRIMARY KEY ([filename], checkdate));
GO
insert into #table1 values('file1.txt', 'job1');
insert into #table1 values('file2.txt', 'job1');
insert into #table1 values('file3.txt', 'job1');

WITH Tally (n) AS (
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #TABLE1
SELECT 'f' + CAST(b.n AS VARCHAR(2)) + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9) + '.txt'
,'j' + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9)
FROM Tally a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%9) n FROM Tally) b;

insert into #TABLE2 values('file1.txt', 1, '6/5/2013');
insert into #TABLE2 values('file2.txt', 0, '6/4/2013');
insert into #TABLE2 values('file3.txt', 1, '6/1/2013');
insert into #TABLE2 values('file1.txt', 0, '6/1/2013');
insert into #TABLE2 values('file2.txt', 0, '6/1/2013');

WITH Tally (n) AS (
SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a)
INSERT INTO #TABLE2
SELECT [filename], ABS(CHECKSUM(NEWID()))%2
,DATEADD(day, -(n-1), '6/5/2013')
FROM #TABLE1 a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%8) n FROM Tally) b
WHERE [filename] NOT IN ('file1.txt','file2.txt','file3.txt');

--SELECT * FROM #TABLE1;
--SELECT * FROM #TABLE2;

DECLARE @jobname VARCHAR(15);

DBCC FREEPROCCACHE;
PRINT 'MARK';
SET STATISTICS TIME ON;
WITH CTE1 AS (
SELECT Filename,SuccessFailIndicator,checkdate,
ROW_NUMBER() OVER(PARTITION BY Filename ORDER BY checkdate DESC) AS rn
FROM #TABLE2)
SELECT @jobname=t1.jobname
FROM #TABLE1 t1
LEFT OUTER JOIN CTE1 c ON c.Filename = t1.Filename
AND c.SuccessFailIndicator = 1
AND c.rn=1
GROUP BY t1.jobname
HAVING COUNT(t1.Filename) = COUNT(c.Filename)
ORDER BY t1.jobname;
SET STATISTICS TIME OFF;

DBCC FREEPROCCACHE;
PRINT 'DWAIN';
SET STATISTICS TIME ON;
SELECT @jobname=jobname
FROM #TABLE1 a
CROSS APPLY (
SELECT TOP 1 SuccessFailIndicator=1-SuccessFailIndicator
FROM #TABLE2 b
WHERE a.[Filename] = b.[Filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
HAVING SUM(b.SuccessFailIndicator) = 0
ORDER BY a.jobname;
SET STATISTICS TIME OFF;

GO
DROP TABLE #TABLE1;
DROP TABLE #TABLE2;




Which yielded these results:


DBCC execution completed. If DBCC printed error messages, contact your system administrator.

MARK
SQL Server Execution Times:
CPU time = 4352 ms, elapsed time = 2032 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DWAIN
SQL Server Execution Times:
CPU time = 1763 ms, elapsed time = 450 ms.



Note that parallelism played a role in both queries because CPU > Elapsed time.



Interesting! However, the two queries don't give the same results. If you run this

delete from TABLE2 where Filename<>'file3.txt'



on the OPs sample data so that job1 only has one out of three successfully completed files, your query still reports the job as successful

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537




dwain.c
dwain.c
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18045 Visits: 6431
Mark-101232 (7/4/2013)

Interesting! However, the two queries don't give the same results. If you run this

delete from TABLE2 where Filename<>'file3.txt'



on the OPs sample data so that job1 only has one out of three successfully completed files, your query still reports the job as successful




Degenerative case huh? Sorry I missed that in the requirements. Minor fix though I think will get it:


SELECT jobname
FROM #TABLE1 a
OUTER APPLY (
SELECT TOP 1 SuccessFailIndicator=1-SuccessFailIndicator
FROM #TABLE2 b
WHERE a.[Filename] = b.[Filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
HAVING SUM(ISNULL(b.SuccessFailIndicator, 1)) = 0;




Test harness is now:


CREATE TABLE #TABLE1 ([filename] varchar(15), jobname varchar(15)
,PRIMARY KEY (jobname, [filename]));
CREATE TABLE #TABLE2 ([filename] varchar(15), SuccessFailIndicator bit, checkdate datetime
,PRIMARY KEY ([filename], checkdate));
GO
insert into #table1 values('file1.txt', 'job1');
insert into #table1 values('file2.txt', 'job1');
insert into #table1 values('file3.txt', 'job1');

WITH Tally (n) AS (
SELECT TOP 50000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #TABLE1
SELECT 'f' + CAST(b.n AS VARCHAR(2)) + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9) + '.txt'
,'j' + RIGHT('00000000' + CAST(a.n AS VARCHAR(9)), 9)
FROM Tally a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%9) n FROM Tally) b;

insert into #TABLE2 values('file1.txt', 1, '6/5/2013');
insert into #TABLE2 values('file2.txt', 0, '6/4/2013');
insert into #TABLE2 values('file3.txt', 1, '6/1/2013');
insert into #TABLE2 values('file1.txt', 0, '6/1/2013');
insert into #TABLE2 values('file2.txt', 0, '6/1/2013');

WITH Tally (n) AS (
SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a)
INSERT INTO #TABLE2
SELECT [filename], ABS(CHECKSUM(NEWID()))%2
,DATEADD(day, -(n-1), '6/5/2013')
FROM #TABLE1 a
CROSS APPLY (SELECT TOP (1+ABS(CHECKSUM(NEWID()))%8) n FROM Tally) b
WHERE [filename] NOT IN ('file1.txt','file2.txt','file3.txt');

--SELECT * FROM #TABLE1;
--SELECT * FROM #TABLE2;

DECLARE @jobname VARCHAR(15);

DBCC FREEPROCCACHE;
PRINT 'MARK';
SET STATISTICS TIME ON;
WITH CTE1 AS (
SELECT Filename,SuccessFailIndicator,checkdate,
ROW_NUMBER() OVER(PARTITION BY Filename ORDER BY checkdate DESC) AS rn
FROM #TABLE2)
SELECT @jobname=t1.jobname
--SELECT t1.jobname
FROM #TABLE1 t1
LEFT OUTER JOIN CTE1 c ON c.Filename = t1.Filename
AND c.SuccessFailIndicator = 1
AND c.rn=1
GROUP BY t1.jobname
HAVING COUNT(t1.Filename) = COUNT(c.Filename);
SET STATISTICS TIME OFF;

DBCC FREEPROCCACHE;
PRINT 'DWAIN';
SET STATISTICS TIME ON;
--SELECT jobname
SELECT @jobname=jobname
FROM #TABLE1 a
OUTER APPLY (
SELECT TOP 1 SuccessFailIndicator=1-SuccessFailIndicator
FROM #TABLE2 b
WHERE a.[Filename] = b.[Filename]
ORDER BY checkdate DESC
) b
GROUP BY a.jobname
HAVING SUM(ISNULL(b.SuccessFailIndicator, 1)) = 0;
SET STATISTICS TIME OFF;

GO
DROP TABLE #TABLE1;
DROP TABLE #TABLE2;




The results aren't much different than before.


DBCC execution completed. If DBCC printed error messages, contact your system administrator.

MARK
SQL Server Execution Times:
CPU time = 13993 ms, elapsed time = 6558 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DWAIN
SQL Server Execution Times:
CPU time = 7831 ms, elapsed time = 2105 ms.



The reason (at least by the execution plan on my machine) why the SELECT TOP 1 within APPLY is faster on my machine is because it is able to do a clustered index seek on TABLE2 (scan on TABLE1) while the JOIN does two clustered index scans.

Edit Corrected (removed) an ORDER BY I left in to check the results and bumped up the size of the test harness a bit.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search