SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


List out rows from a duplicate record set that has one or more odd values


List out rows from a duplicate record set that has one or more odd values

Author
Message
kthakon
kthakon
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 158
Thank you!
Wt.
Sergiy
Sergiy
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26348 Visits: 12488
I believe Sean's solution has an extra unnecessary step.

If to use the filtering subquery in the JOIN it may provide sorting criteria as well.
Then CTE query may be excluded:

SELECT * 
FROM @mySampleTable s
INNER JOIN (SELECT CaseID, MAX(CreatedDate) LatestUpdate
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID)) fs ON s.CaseID = fs.CaseID
ORDER BY LatestUpdate DESC, CreatedDate DESC


Sergiy
Sergiy
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26348 Visits: 12488
And of course, it must say:
SELECT s.* 


to be precise. :-)
Eugene Elutin
Eugene Elutin
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12972 Visits: 5478
geoff5 (5/21/2013)
Sean,

I'm not an expert at execution plans, so I can't speak definitively on which approach is more efficient, but intuitively it would seem to me that referencing the raw data table only once, as my solution does (in the CTE), is going to be a better choice than referencing it three times, as your sorted solution does based on Eugene's "simpler" approach (once in the CTE, once in the main query, and once in the WHERE clause subquery).

In addition, Eugene cited the window functions as the problem he solved by putting the StudentID comparison into a subquery in the WHERE clause, but your sorting solution puts a window function back into a CTE. So all things considered, do you have reason to believe that your last recommendation is a more efficient query solution than my earlier suggestion?


1. Number of time the same table is mentioned in the query has no direct impact on performance. As SQL optimizer will decide how many times table/index should be scanned/seeked.
In case of using windowed function, SQL will need to perform lazy table spool operation, which is not the best on in terms of performance. So, even so you have used table only once in your query, SQL will need to scan it once and perform table spool twice (to calculate two different ROW_NUMBERs). For the query which does simple GROUP BY, SQL will need just to scan table twice. And it's only based on the given example. I guess in OP real table CaseId is most likely to be indexed. In this case, my query will only have one table scan and one index seek. Query with ROW_NUMBER will still do the same: one scan, two lazy table spools.

2. IN or INNER JOIN?
Until you wanted to sort by CaseId with highest CreatedDate, you would use IN as it would win slightly over INNER JOIN in this case, as we wouldn't need anything to take from the table (actually sub-query) which we JOIN to. Here is a very good article from Gail Shaw about it: http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/.
However, as soon as you need this particular order, you need to get calculated MAX (CreatedDate). Therefore - INNER JOIN.
So Sergiy's query does exactly that. However I would also enforce order by CaseIs, just in case there are different CaseId's with the same MAX(CreatedDate):

SELECT s.* 
FROM @mySampleTable s
INNER JOIN (SELECT CaseID, MAX(CreatedDate) LatestUpdate
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID)) fs ON s.CaseID = fs.CaseID
ORDER BY LatestUpdate DESC,
CaseID,
CreatedDate DESC -- here it's depends how you want to sort records for the same CaseId



_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Sergiy
Sergiy
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26348 Visits: 12488
Eugene Elutin (5/22/2013)

[quote]geoff5 (5/21/2013)
However I would also enforce order by CaseIs, just in case there are different CaseId's with the same MAX(CreatedDate)

Spot on!
Bhaskar.Shetty
Bhaskar.Shetty
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1156 Visits: 509
Try the below query..

SELECT *
FROM @mySampleTable s
OUTER APPLY (SELECT TOP 1 FROM @mySampleTable g WHERE s.CaseID = g.CaseID
AND s.StudentId = g.StudentID ORDER BY CreatedDate DESC) h
Eugene Elutin
Eugene Elutin
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12972 Visits: 5478
shettybhas (5/22/2013)
Try the below query..

SELECT *
FROM @mySampleTable s
OUTER APPLY (SELECT TOP 1 CaseID FROM @mySampleTable g WHERE s.CaseID = g.CaseID
ORDER BY CreatedDate DESC) h

This will give you lastest modified CaseID, and outerapply make qurery runs faster.


There is nothing in this query enforces order of records returned from @mySampleTable. So, records will be returned in random order.
Also, it doesn't perform required filtering. It will return all rows in a random order.
Basically the above query is exactly the same as simple:

SELECT *  FROM @mySampleTable



Some redundant OUTER APPLY - nothing else.

Actually, have you compared its execution plan with INNER JOIN version? I'm in great doubt that it will be any faster...

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Bhaskar.Shetty
Bhaskar.Shetty
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1156 Visits: 509
Sorry.. I didnt read the post completely, so here is the modified query,

SET STATISTICS PROFILE ON
SELECT aa.* FROM @mySampleTable aa
INNER JOIN (SELECT a.CaseId, a.latestUpd FROM (SELECT CASEID, COUNT(*) CCnt, MAX(CreatedDate) LatestUpd FROM @mySampleTable GROUP BY CASEID) A OUTER APPLY (SELECT TOP 1 CASEID, STUDENTID, COUNT(*) SCnt FROM @mySampleTable B WHERE A.cASEiD = B.cASEiD GROUP BY CASEID, STUDENTID) AS B WHERE A.CCNT <> b.scnt) bb on aa.CaseId = bb.CaseId
ORDER BY latestUpd desc, aa.CaseId

I tried to avoid using Max and Min function on Char column as this may back fire with huge data. and used Outer Apply instead with Counting No Of Rows against each CaseID and StudentId
geoff5
geoff5
Say Hey Kid
Say Hey Kid (692 reputation)Say Hey Kid (692 reputation)Say Hey Kid (692 reputation)Say Hey Kid (692 reputation)Say Hey Kid (692 reputation)Say Hey Kid (692 reputation)Say Hey Kid (692 reputation)Say Hey Kid (692 reputation)

Group: General Forum Members
Points: 692 Visits: 543
Eugene,

After reading your explanation of the reason why GROUP BY is more efficient than window functions, I can see the logic in your in-line view solution. I would still do it as a CTE, but that is merely a matter of refactoring what is essentially the same query logic. I do believe your last complete solution is the best approach.
Eugene Elutin
Eugene Elutin
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12972 Visits: 5478
geoff5 (5/22/2013)
Eugene,

After reading your explanation of the reason why GROUP BY is more efficient than window functions, I can see the logic in your in-line view solution. I would still do it as a CTE, but that is merely a matter of refactoring what is essentially the same query logic. I do believe your last complete solution is the best approach.


Actually, I don't have anything against CTE, and will always use it over sub-query. It makes query easier to read and, therefore, maintain. So, if you insist on using CTE in this case I would recommend that version:



;WITH cte
AS
(
SELECT CaseID, MAX(CreatedDate) LatestUpdate
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID)
)

SELECT s.*
FROM @mySampleTable AS s
JOIN cte AS fs ON fs.CaseID = s.CaseID
ORDER BY fs.LatestUpdate DESC,
s.CaseID,
s.CreatedDate DESC -- here it's depends how you want to sort records for the same CaseId




_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search