Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

List out rows from a duplicate record set that has one or more odd values Expand / Collapse
Author
Message
Posted Monday, May 20, 2013 3:44 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Tuesday, September 9, 2014 12:21 PM
Points: 76, Visits: 291
I have a table with data that looks as below...



Each CaseID has multiple records in the table. Each of these rows is expected to have same data in all columns except "CreatedDate". However, in certain cases the StudentID may be different from the other row(s) of the same CaseID. I need to report data for all such cases. Additionally, the output needs to be arranged in the order of caseID with highest created date. i.e., in the result set all records belonging to the CaseID which was modified most recently should appear at the top.

Here is the sample of how the desired result set should look like...



Following is the code snippet to create sample data

DECLARE @mySampleTable TABLE (
CaseID INT,
StudentID VARCHAR (20),
StudentSocial VARCHAR (11),
StudentName VARCHAR (50),
CreatedDate DateTime2
)

INSERT @mySampleTable (CaseID, StudentID, StudentSocial, StudentName, CreatedDate)
VALUES (1001, 'FTS-2134', '111-22-3333', 'Roger Federer', '5/12/2013'),
(1001, 'FTS-2314', '111-22-3333', 'Roger Federer', '5/08/2013'),
(1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/14/2013'),
(1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/2/2013'),
(1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/17/2013'),
(1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/06/2013'),
(1003, 'FTS-7864', '333-44-5555', 'Serena Williams', '4/30/2013'),
(1004, 'FTS-9132', '444-55-6666', 'Novak Djokovic', '5/10/2013'),
(1004, 'FTS-9312', '444-55-6666', 'Novak Djokovic', '4/27/2013'),
(1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/19/2013'),
(1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/12/2013')

select * from @mySampleTable


For now, this is what I did...
SELECT	DISTINCT a.CaseID, a.StudentID, b.StudentID, a.StudentSocial, a.StudentName, a.CreatedDate
FROM @mySampleTable a
INNER JOIN @mySampleTable b ON a.CaseID = b.CaseID AND a.StudentSocial = b.StudentSocial AND a.StudentID <> b.StudentID

It doesn't do the sorting, but seems to do the rest. I am not sure if this is the best way to achieve what I am looking for though. My actual production tables have fairly large amount of data. I would appreciate your help and inputs with this.

Thanks in advance...
- Rex


  Post Attachments 
Original_Data.jpg (188 views, 83.87 KB)
Desired_Output.jpg (186 views, 53.93 KB)
Post #1454767
Posted Monday, May 20, 2013 7:40 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, August 21, 2014 7:49 PM
Points: 171, Visits: 501
My solution avoids joining the table to itself by using a couple of window functions in a CTE named "flagged_data." It adds a "StudentID_Problem" column (1 for a problem, 0 for no problem). Because COUNT(DISTINCT) does not work in a window function, I compared MIN(StudentID) to MAX(StudentID) to see if they are a match. If they are not a match, there is a problem because there are at least two StudentID values for the given CaseID.

Because this flag needs to be a filter, and because window functions are not allowed in the WHERE clause, there has to be an intermediary step in a CTE. AFter the CTE, though, it is a simple SELECT statement ordered by a LastEntryDate (also calculated with a window function in the CTE), CaseID, and CreatedDate.

with
mySampleTable (CaseID, StudentID, StudentSocial, StudentName, CreatedDate) as
(select 1001, 'FTS-2134', '111-22-3333', 'Roger Federer', '5/12/2013' union all
select 1001, 'FTS-2314', '111-22-3333', 'Roger Federer', '5/08/2013'union all
select 1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/14/2013'union all
select 1002, 'PTS-5769', '222-33-4444', 'Maria Sharapova', '5/2/2013'union all
select 1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/17/2013'union all
select 1003, 'FTS-8764', '333-44-5555', 'Serena Williams', '5/06/2013'union all
select 1003, 'FTS-7864', '333-44-5555', 'Serena Williams', '4/30/2013'union all
select 1004, 'FTS-9132', '444-55-6666', 'Novak Djokovic', '5/10/2013'union all
select 1004, 'FTS-9312', '444-55-6666', 'Novak Djokovic', '4/27/2013'union all
select 1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/19/2013'union all
select 1005, 'PTS-4722', '555-66-7777', 'Novak Djokovic', '5/12/2013'),

flagged_data as
(SELECT
CaseID,
StudentID,
StudentSocial,
StudentName,
CreatedDate,
StudentID_Problem = Case when MIN(StudentID) over (PARTITION by CaseID) <> MAX(StudentID) over (partition by CaseID) then 1 else 0 end,
LastEntryDate = MAX(CreatedDate) over (partition by CaseID)
FROM
mySampleTable)

select
CaseID,
StudentID,
StudentSocial,
StudentName,
CreatedDate
from
flagged_data
where
StudentID_Problem = 1
order by
LastEntryDate desc, CaseID, CreatedDate desc

Post #1454785
Posted Tuesday, May 21, 2013 2:46 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, October 21, 2014 8:55 AM
Points: 2,873, Visits: 5,185
It can be done without using windowed function (so, will have better performance):

SELECT * 
FROM @mySampleTable s
WHERE s.CaseID IN (SELECT CaseID
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID))





_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1454880
Posted Tuesday, May 21, 2013 8:15 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:04 PM
Points: 13,086, Visits: 12,553
The next thing you should do is to stop storing SSN in clear text. Get that stuff encrypted!!!!

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1455037
Posted Tuesday, May 21, 2013 11:44 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Tuesday, September 9, 2014 12:21 PM
Points: 76, Visits: 291
Geoff5,

Thanks a lot for your suggested solution. I appreciate that.

Eugene Elutin,

Your solution looks simpler, but it is not sorting the end result the way I want it. Any suggestions on that?

Sean Lange,

Thanks for the encryption suggestion. This was just a sample data set I created to simulate my requirement. Actual data does not have any data as confidential as SSN. I definitely appreciate your suggestion and keep that in mind when I deal with any sensitive data.


- Rex
Post #1455143
Posted Tuesday, May 21, 2013 12:55 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:04 PM
Points: 13,086, Visits: 12,553
RexHelios (5/21/2013)


Your solution looks simpler, but it is not sorting the end result the way I want it. Any suggestions on that?



His query does not have an order by. If you want it to be sorted in some certain you need to add an order by.

How do you want it sorted?


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1455173
Posted Tuesday, May 21, 2013 1:00 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:04 PM
Points: 13,086, Visits: 12,553
Ahh I see how you want it sorted. You could modify Eugene's solution slightly like this.

with MySort as
(
select CaseID,
ROW_NUMBER() over(order by Max(CreatedDate) desc) as SortOrder
from @mySampleTable
group by CaseID
)

SELECT *
FROM @mySampleTable s
join MySort ms on ms.CaseID = s.CaseID
WHERE s.CaseID IN (SELECT CaseID
FROM @mySampleTable
GROUP BY CaseID
HAVING MAX(StudentID) != MIN(StudentID))
order by SortOrder, CreatedDate



_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1455176
Posted Tuesday, May 21, 2013 1:28 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, August 21, 2014 7:49 PM
Points: 171, Visits: 501
Sean,

I'm not an expert at execution plans, so I can't speak definitively on which approach is more efficient, but intuitively it would seem to me that referencing the raw data table only once, as my solution does (in the CTE), is going to be a better choice than referencing it three times, as your sorted solution does based on Eugene's "simpler" approach (once in the CTE, once in the main query, and once in the WHERE clause subquery).

In addition, Eugene cited the window functions as the problem he solved by putting the StudentID comparison into a subquery in the WHERE clause, but your sorting solution puts a window function back into a CTE. So all things considered, do you have reason to believe that your last recommendation is a more efficient query solution than my earlier suggestion?
Post #1455190
Posted Tuesday, May 21, 2013 1:35 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:04 PM
Points: 13,086, Visits: 12,553
geoff5 (5/21/2013)
Sean,

I'm not an expert at execution plans, so I can't speak definitively on which approach is more efficient, but intuitively it would seem to me that referencing the raw data table only once, as my solution does (in the CTE), is going to be a better choice than referencing it three times, as your sorted solution does based on Eugene's "simpler" approach (once in the CTE, once in the main query, and once in the WHERE clause subquery).

In addition, Eugene cited the window functions as the problem he solved by putting the StudentID comparison into a subquery in the WHERE clause, but your sorting solution puts a window function back into a CTE. So all things considered, do you have reason to believe that your last recommendation is a more efficient query solution than my earlier suggestion?


Not at all. I was merely posting it as an option. Once you add the sorting that I put in there it is most likely going to be slower.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1455195
Posted Tuesday, May 21, 2013 1:37 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Tuesday, September 9, 2014 12:21 PM
Points: 76, Visits: 291
Thank you all!

- Rex
Post #1455196
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse