# Selecting distinct values from dupes based on latest date

• I have a table of patients that has often multiple occurrences of a patient number but with differing dates of which I need to return only the latest date but I'm unsure if I need a subquery, in short I'm stuck as to what might work. Here is my table and some sample data.

The below gives output of:

123548, '2022-01-17'

123548, '2022-02-21'

985254, '2022-03-19'

774589, '2022-01-17'

754512, '2022-04-04'

754512, '2022-02-09'

But I need the output to be:

123548, '2022-02-21'

754512, '2022-04-04'

`create table [dbo].[cases]([case_id] [int] identity(1,1) not null,[pat_id] [bigint] not null,[eff_dt] [datetime] nullINSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (123548, '2022-01-17')INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (123548, '2022-02-21')INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (985254, '2022-03-19')INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (774589, '2022-01-17')INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (754512, '2022-04-04')insert INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (754512, '2022-02-09')select pat_id, eff_dt         from dbo.cases`
• The standard approach for this is to use a CTE and `ROW_NUMBER` to get the "top 1" in each group. You can also use a windowed `COUNT` to allow you to filter to patients with multiple rows:

`WITH CTE AS(    SELECT pat_id,           eff_dt,           COUNT(pat_id) OVER (PARTITION BY pat_id) AS C,           ROW_NUMBER() OVER (PARTITION BY pat_id ORDER BY eff_dt DESC) AS RN    FROM dbo.cases)SELECT pat_id,       eff_dtFROM CTEWHERE C > 1  AND RN = 1;`

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

• That works perfectly, many thanks!

• One question, what if I have some dates as NULL values and want them to return?

• DaveBriCam wrote:

One question, what if I have some dates as NULL values and want them to return?

The null date is ordered after the valid dates (in descending order)  so will not be returned with row_number = 1 unless there are no other dates. You can use isnull(eff_dt, '1900-0101') if you really wanted to make sure, but applying functions like this could degrade performance.

The code excludes patients with only one row, so if the patient has one row and the eff_dt is null, they will be dropped along with any other patients with a single eff_dt. Your desired output implied that's what you wanted, but if that is not the case you could drop the C > 1 line.

If you have a patient with two rows and both have null eff_dt, or there are duplicate max eff_dt values, one of them will be returned, but the result will be non-deterministic. In this case I would order the row_number by eff_dt desc, case_id desc so that the same rows are always returned. In this example it doesn't matter, but if dupes are possible and you needed to return the case_id of the most recent eff_dt so you could join it to other tables etc, making it deterministic would be a good idea.

• This reply was modified 1 month, 1 week ago by  Ed B.
• This reply was modified 1 month, 1 week ago by  Ed B. Reason: typo
• Sorry, I was wrong with what my needed output should be, I needed this, in other words keeping those with just one date:

123548, '2022-02-21'

985254, '2022-03-19'

774589, '2022-01-17'

754512, '2022-04-04'

• DaveBriCam wrote:

Sorry, I was wrong with what my needed output should be, I needed this, in other words keeping those with just one date:

123548, '2022-02-21'

985254, '2022-03-19'

774589, '2022-01-17'

754512, '2022-04-04'

All the information you need to get this answer is still in my code before; you just need to now not just return rows where there are more than 1 instance of a value.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Viewing 7 posts - 1 through 7 (of 7 total)