Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

List out rows from a duplicate record set that has one or more odd values Expand / Collapse
Author
Message
Posted Tuesday, May 21, 2013 11:02 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, March 20, 2014 10:18 PM
Points: 2, Visits: 157
Thank you!
Wt.
Post #1455304
Posted Tuesday, May 21, 2013 11:53 PM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Sunday, April 13, 2014 9:51 PM
Points: 4,570, Visits: 8,315
I believe Sean's solution has an extra unnecessary step.

If to use the filtering subquery in the JOIN it may provide sorting criteria as well.
Then CTE query may be excluded:

SELECT * 
FROM @mySampleTable s
INNER JOIN (SELECT CaseID, MAX(CreatedDate) LatestUpdate
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID)) fs ON s.CaseID = fs.CaseID
ORDER BY LatestUpdate DESC, CreatedDate DESC

Post #1455314
Posted Tuesday, May 21, 2013 11:56 PM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Sunday, April 13, 2014 9:51 PM
Points: 4,570, Visits: 8,315
And of course, it must say:
SELECT s.* 

to be precise.
Post #1455316
Posted Wednesday, May 22, 2013 3:29 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, April 03, 2014 10:10 AM
Points: 2,792, Visits: 4,874

geoff5 (5/21/2013)
Sean,

I'm not an expert at execution plans, so I can't speak definitively on which approach is more efficient, but intuitively it would seem to me that referencing the raw data table only once, as my solution does (in the CTE), is going to be a better choice than referencing it three times, as your sorted solution does based on Eugene's "simpler" approach (once in the CTE, once in the main query, and once in the WHERE clause subquery).

In addition, Eugene cited the window functions as the problem he solved by putting the StudentID comparison into a subquery in the WHERE clause, but your sorting solution puts a window function back into a CTE. So all things considered, do you have reason to believe that your last recommendation is a more efficient query solution than my earlier suggestion?


1. Number of time the same table is mentioned in the query has no direct impact on performance. As SQL optimizer will decide how many times table/index should be scanned/seeked.
In case of using windowed function, SQL will need to perform lazy table spool operation, which is not the best on in terms of performance. So, even so you have used table only once in your query, SQL will need to scan it once and perform table spool twice (to calculate two different ROW_NUMBERs). For the query which does simple GROUP BY, SQL will need just to scan table twice. And it's only based on the given example. I guess in OP real table CaseId is most likely to be indexed. In this case, my query will only have one table scan and one index seek. Query with ROW_NUMBER will still do the same: one scan, two lazy table spools.

2. IN or INNER JOIN?
Until you wanted to sort by CaseId with highest CreatedDate, you would use IN as it would win slightly over INNER JOIN in this case, as we wouldn't need anything to take from the table (actually sub-query) which we JOIN to. Here is a very good article from Gail Shaw about it: http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/.
However, as soon as you need this particular order, you need to get calculated MAX (CreatedDate). Therefore - INNER JOIN.
So Sergiy's query does exactly that. However I would also enforce order by CaseIs, just in case there are different CaseId's with the same MAX(CreatedDate):

SELECT s.* 
FROM @mySampleTable s
INNER JOIN (SELECT CaseID, MAX(CreatedDate) LatestUpdate
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID)) fs ON s.CaseID = fs.CaseID
ORDER BY LatestUpdate DESC,
CaseID,
CreatedDate DESC -- here it's depends how you want to sort records for the same CaseId




_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1455384
Posted Wednesday, May 22, 2013 3:43 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Sunday, April 13, 2014 9:51 PM
Points: 4,570, Visits: 8,315
Eugene Elutin (5/22/2013)

[quote]geoff5 (5/21/2013)
However I would also enforce order by CaseIs, just in case there are different CaseId's with the same MAX(CreatedDate)

Spot on!
Post #1455391
Posted Wednesday, May 22, 2013 5:17 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, April 10, 2014 3:36 AM
Points: 406, Visits: 383
Try the below query..

SELECT *
FROM @mySampleTable s
OUTER APPLY (SELECT TOP 1 FROM @mySampleTable g WHERE s.CaseID = g.CaseID
AND s.StudentId = g.StudentID ORDER BY CreatedDate DESC) h

Post #1455430
Posted Wednesday, May 22, 2013 5:27 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, April 03, 2014 10:10 AM
Points: 2,792, Visits: 4,874
shettybhas (5/22/2013)
Try the below query..

SELECT *
FROM @mySampleTable s
OUTER APPLY (SELECT TOP 1 CaseID FROM @mySampleTable g WHERE s.CaseID = g.CaseID
ORDER BY CreatedDate DESC) h

This will give you lastest modified CaseID, and outerapply make qurery runs faster.


There is nothing in this query enforces order of records returned from @mySampleTable. So, records will be returned in random order.
Also, it doesn't perform required filtering. It will return all rows in a random order.
Basically the above query is exactly the same as simple:

SELECT *  FROM @mySampleTable

Some redundant OUTER APPLY - nothing else.

Actually, have you compared its execution plan with INNER JOIN version? I'm in great doubt that it will be any faster...


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1455432
Posted Wednesday, May 22, 2013 6:18 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, April 10, 2014 3:36 AM
Points: 406, Visits: 383
Sorry.. I didnt read the post completely, so here is the modified query,

SET STATISTICS PROFILE ON
SELECT aa.* FROM @mySampleTable aa
INNER JOIN (SELECT a.CaseId, a.latestUpd FROM (SELECT CASEID, COUNT(*) CCnt, MAX(CreatedDate) LatestUpd FROM @mySampleTable GROUP BY CASEID) A OUTER APPLY (SELECT TOP 1 CASEID, STUDENTID, COUNT(*) SCnt FROM @mySampleTable B WHERE A.cASEiD = B.cASEiD GROUP BY CASEID, STUDENTID) AS B WHERE A.CCNT <> b.scnt) bb on aa.CaseId = bb.CaseId
ORDER BY latestUpd desc, aa.CaseId

I tried to avoid using Max and Min function on Char column as this may back fire with huge data. and used Outer Apply instead with Counting No Of Rows against each CaseID and StudentId
Post #1455448
Posted Wednesday, May 22, 2013 6:44 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, March 28, 2014 1:02 PM
Points: 150, Visits: 445
Eugene,

After reading your explanation of the reason why GROUP BY is more efficient than window functions, I can see the logic in your in-line view solution. I would still do it as a CTE, but that is merely a matter of refactoring what is essentially the same query logic. I do believe your last complete solution is the best approach.
Post #1455465
Posted Wednesday, May 22, 2013 8:32 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, April 03, 2014 10:10 AM
Points: 2,792, Visits: 4,874
geoff5 (5/22/2013)
Eugene,

After reading your explanation of the reason why GROUP BY is more efficient than window functions, I can see the logic in your in-line view solution. I would still do it as a CTE, but that is merely a matter of refactoring what is essentially the same query logic. I do believe your last complete solution is the best approach.


Actually, I don't have anything against CTE, and will always use it over sub-query. It makes query easier to read and, therefore, maintain. So, if you insist on using CTE in this case I would recommend that version:


;WITH cte
AS
(
SELECT CaseID, MAX(CreatedDate) LatestUpdate
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID)
)

SELECT s.*
FROM @mySampleTable AS s
JOIN cte AS fs ON fs.CaseID = s.CaseID
ORDER BY fs.LatestUpdate DESC,
s.CaseID,
s.CreatedDate DESC -- here it's depends how you want to sort records for the same CaseId





_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1455552
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse