SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


List out rows from a duplicate record set that has one or more odd values


List out rows from a duplicate record set that has one or more odd values

Author
Message
SQLCurious
SQLCurious
SSC Veteran
SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)

Group: General Forum Members
Points: 209 Visits: 450
I have a table with data that looks as below...



Each CaseID has multiple records in the table. Each of these rows is expected to have same data in all columns except "CreatedDate". However, in certain cases the StudentID may be different from the other row(s) of the same CaseID. I need to report data for all such cases. Additionally, the output needs to be arranged in the order of caseID with highest created date. i.e., in the result set all records belonging to the CaseID which was modified most recently should appear at the top.

Here is the sample of how the desired result set should look like...



Following is the code snippet to create sample data


DECLARE @mySampleTable TABLE (
CaseID INT,
StudentID VARCHAR (20),
StudentSocial VARCHAR (11),
StudentName VARCHAR (50),
CreatedDate DateTime2
)

INSERT @mySampleTable (CaseID, StudentID, StudentSocial, StudentName, CreatedDate)
VALUES (1001, 'FTS-2134', '111-22-3333', 'Roger Federer', '5/12/2013'),
(1001, 'FTS-2314', '111-22-3333', 'Roger Federer', '5/08/2013'),
(1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/14/2013'),
(1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/2/2013'),
(1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/17/2013'),
(1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/06/2013'),
(1003, 'FTS-7864', '333-44-5555', 'Serena Williams', '4/30/2013'),
(1004, 'FTS-9132', '444-55-6666', 'Novak Djokovic', '5/10/2013'),
(1004, 'FTS-9312', '444-55-6666', 'Novak Djokovic', '4/27/2013'),
(1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/19/2013'),
(1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/12/2013')

select * from @mySampleTable




For now, this is what I did...

SELECT DISTINCT a.CaseID, a.StudentID, b.StudentID, a.StudentSocial, a.StudentName, a.CreatedDate
FROM @mySampleTable a
INNER JOIN @mySampleTable b ON a.CaseID = b.CaseID AND a.StudentSocial = b.StudentSocial AND a.StudentID <> b.StudentID



It doesn't do the sorting, but seems to do the rest. I am not sure if this is the best way to achieve what I am looking for though. My actual production tables have fairly large amount of data. I would appreciate your help and inputs with this.

Thanks in advance...
- Rex
Attachments
Original_Data.jpg (192 views, 83.00 KB)
Desired_Output.jpg (191 views, 53.00 KB)
geoff5
geoff5
SSC Veteran
SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)

Group: General Forum Members
Points: 250 Visits: 543
My solution avoids joining the table to itself by using a couple of window functions in a CTE named "flagged_data." It adds a "StudentID_Problem" column (1 for a problem, 0 for no problem). Because COUNT(DISTINCT) does not work in a window function, I compared MIN(StudentID) to MAX(StudentID) to see if they are a match. If they are not a match, there is a problem because there are at least two StudentID values for the given CaseID.

Because this flag needs to be a filter, and because window functions are not allowed in the WHERE clause, there has to be an intermediary step in a CTE. AFter the CTE, though, it is a simple SELECT statement ordered by a LastEntryDate (also calculated with a window function in the CTE), CaseID, and CreatedDate.

with
mySampleTable (CaseID, StudentID, StudentSocial, StudentName, CreatedDate) as
(select 1001, 'FTS-2134', '111-22-3333', 'Roger Federer', '5/12/2013' union all
select 1001, 'FTS-2314', '111-22-3333', 'Roger Federer', '5/08/2013'union all
select 1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/14/2013'union all
select 1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/2/2013'union all
select 1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/17/2013'union all
select 1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/06/2013'union all
select 1003, 'FTS-7864', '333-44-5555', 'Serena Williams', '4/30/2013'union all
select 1004, 'FTS-9132', '444-55-6666', 'Novak Djokovic', '5/10/2013'union all
select 1004, 'FTS-9312', '444-55-6666', 'Novak Djokovic', '4/27/2013'union all
select 1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/19/2013'union all
select 1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/12/2013'),

flagged_data as
(SELECT
CaseID,
StudentID,
StudentSocial,
StudentName,
CreatedDate,
StudentID_Problem = Case when MIN(StudentID) over (PARTITION by CaseID) <> MAX(StudentID) over (partition by CaseID) then 1 else 0 end,
LastEntryDate = MAX(CreatedDate) over (partition by CaseID)
FROM
mySampleTable)

select
CaseID,
StudentID,
StudentSocial,
StudentName,
CreatedDate
from
flagged_data
where
StudentID_Problem = 1
order by
LastEntryDate desc, CaseID, CreatedDate desc


Eugene Elutin
Eugene Elutin
SSCertifiable
SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)

Group: General Forum Members
Points: 5060 Visits: 5478
It can be done without using windowed function (so, will have better performance):


SELECT *
FROM @mySampleTable s
WHERE s.CaseID IN (SELECT CaseID
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID))



_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Sean Lange
Sean Lange
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26374 Visits: 17557
The next thing you should do is to stop storing SSN in clear text. Get that stuff encrypted!!!!

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
SQLCurious
SQLCurious
SSC Veteran
SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)

Group: General Forum Members
Points: 209 Visits: 450
Geoff5,

Thanks a lot for your suggested solution. I appreciate that.

Eugene Elutin,

Your solution looks simpler, but it is not sorting the end result the way I want it. Any suggestions on that?

Sean Lange,

Thanks for the encryption suggestion. This was just a sample data set I created to simulate my requirement. Actual data does not have any data as confidential as SSN. I definitely appreciate your suggestion and keep that in mind when I deal with any sensitive data.


- Rex
Sean Lange
Sean Lange
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26374 Visits: 17557
RexHelios (5/21/2013)


Your solution looks simpler, but it is not sorting the end result the way I want it. Any suggestions on that?



His query does not have an order by. If you want it to be sorted in some certain you need to add an order by.

How do you want it sorted?

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Sean Lange
Sean Lange
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26374 Visits: 17557
Ahh I see how you want it sorted. You could modify Eugene's solution slightly like this.


with MySort as
(
select CaseID,
ROW_NUMBER() over(order by Max(CreatedDate) desc) as SortOrder
from @mySampleTable
group by CaseID
)

SELECT *
FROM @mySampleTable s
join MySort ms on ms.CaseID = s.CaseID
WHERE s.CaseID IN (SELECT CaseID
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID))
order by SortOrder, CreatedDate



_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
geoff5
geoff5
SSC Veteran
SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)SSC Veteran (250 reputation)

Group: General Forum Members
Points: 250 Visits: 543
Sean,

I'm not an expert at execution plans, so I can't speak definitively on which approach is more efficient, but intuitively it would seem to me that referencing the raw data table only once, as my solution does (in the CTE), is going to be a better choice than referencing it three times, as your sorted solution does based on Eugene's "simpler" approach (once in the CTE, once in the main query, and once in the WHERE clause subquery).

In addition, Eugene cited the window functions as the problem he solved by putting the StudentID comparison into a subquery in the WHERE clause, but your sorting solution puts a window function back into a CTE. So all things considered, do you have reason to believe that your last recommendation is a more efficient query solution than my earlier suggestion?
Sean Lange
Sean Lange
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26374 Visits: 17557
geoff5 (5/21/2013)
Sean,

I'm not an expert at execution plans, so I can't speak definitively on which approach is more efficient, but intuitively it would seem to me that referencing the raw data table only once, as my solution does (in the CTE), is going to be a better choice than referencing it three times, as your sorted solution does based on Eugene's "simpler" approach (once in the CTE, once in the main query, and once in the WHERE clause subquery).

In addition, Eugene cited the window functions as the problem he solved by putting the StudentID comparison into a subquery in the WHERE clause, but your sorting solution puts a window function back into a CTE. So all things considered, do you have reason to believe that your last recommendation is a more efficient query solution than my earlier suggestion?


Not at all. I was merely posting it as an option. Once you add the sorting that I put in there it is most likely going to be slower.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
SQLCurious
SQLCurious
SSC Veteran
SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)SSC Veteran (209 reputation)

Group: General Forum Members
Points: 209 Visits: 450
Thank you all!

- Rex
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search