Selecting distinct values from dupes based on latest date

  • I have a table of patients that has often multiple occurrences of a patient number but with differing dates of which I need to return only the latest date but I'm unsure if I need a subquery, in short I'm stuck as to what might work. Here is my table and some sample data.

    The below gives output of:

    123548, '2022-01-17'

    123548, '2022-02-21'

    985254, '2022-03-19'

    774589, '2022-01-17'

    754512, '2022-04-04'

    754512, '2022-02-09'

    But I need the output to be:

    123548, '2022-02-21'

    754512, '2022-04-04'

    create table [dbo].[cases](
    [case_id] [int] identity(1,1) not null,
    [pat_id] [bigint] not null,
    [eff_dt] [datetime] null

    INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (123548, '2022-01-17')
    INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (123548, '2022-02-21')
    INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (985254, '2022-03-19')
    INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (774589, '2022-01-17')
    INSERT INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (754512, '2022-04-04')
    insert INTO [dbo].[cases] ([pat_id],[eff_dt]) VALUES (754512, '2022-02-09')

    select pat_id, eff_dt
    from dbo.cases
  • The standard approach for this is to use a CTE and ROW_NUMBER to get the "top 1" in each group. You can also use a windowed COUNT to allow you to filter to patients with multiple rows:

    WITH CTE AS(
    SELECT pat_id,
    eff_dt,
    COUNT(pat_id) OVER (PARTITION BY pat_id) AS C,
    ROW_NUMBER() OVER (PARTITION BY pat_id ORDER BY eff_dt DESC) AS RN
    FROM dbo.cases)
    SELECT pat_id,
    eff_dt
    FROM CTE
    WHERE C > 1
    AND RN = 1;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • That works perfectly, many thanks!

  • One question, what if I have some dates as NULL values and want them to return?

  • DaveBriCam wrote:

    One question, what if I have some dates as NULL values and want them to return?

    The null date is ordered after the valid dates (in descending order)  so will not be returned with row_number = 1 unless there are no other dates. You can use isnull(eff_dt, '1900-0101') if you really wanted to make sure, but applying functions like this could degrade performance.

    The code excludes patients with only one row, so if the patient has one row and the eff_dt is null, they will be dropped along with any other patients with a single eff_dt. Your desired output implied that's what you wanted, but if that is not the case you could drop the C > 1 line.

    If you have a patient with two rows and both have null eff_dt, or there are duplicate max eff_dt values, one of them will be returned, but the result will be non-deterministic. In this case I would order the row_number by eff_dt desc, case_id desc so that the same rows are always returned. In this example it doesn't matter, but if dupes are possible and you needed to return the case_id of the most recent eff_dt so you could join it to other tables etc, making it deterministic would be a good idea.

    • This reply was modified 1 year, 11 months ago by  Ed B.
    • This reply was modified 1 year, 11 months ago by  Ed B. Reason: typo
  • Sorry, I was wrong with what my needed output should be, I needed this, in other words keeping those with just one date:

    123548, '2022-02-21'

    985254, '2022-03-19'

    774589, '2022-01-17'

    754512, '2022-04-04'

     

  • DaveBriCam wrote:

    Sorry, I was wrong with what my needed output should be, I needed this, in other words keeping those with just one date:

    123548, '2022-02-21'

    985254, '2022-03-19'

    774589, '2022-01-17'

    754512, '2022-04-04'

    All the information you need to get this answer is still in my code before; you just need to now not just return rows where there are more than 1 instance of a value.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I thought it best to add onto an existing thread since the issue is very similar with a new challenge. I need to return all the insurance numbers that have two different people listed such as:

    pat_ID              ins_num

    111807              100449401H

    461261               100449401H

    106820             101159201B

    35599               101203104H

    85116                101859902J

    445903             103703602G

    199156              103839502A

    342322             104206501C

    457017              105117401F

    98573                105450402E

    Note that what would need to be returned would be only the number '100449401H'

    Below is some code I hoped would work but too many records are being returned:

    with CTE
    as (select i.pat_id
    , i.ins_num
    , count(i.pat_id) over (partition by i.pat_id) as C
    , row_number() over (partition by i.ins_num order by i.ins_num desc) as RN
    from dbo.tb_pat_ins i
    )
    select pat_id
    , ins_num
    from CTE
    where C > 1
    and RN = 1;
  • Does this work?

    DROP TABLE IF EXISTS #tb_pat_ins 

    CREATE TABLE #tb_pat_ins
    ( pat_ID INT,
    ins_num VARCHAR(20)
    )

    INSERT #tb_pat_ins (pat_ID,ins_num)
    VALUES (111807 , '100449401H'),
    (461261, '100449401H'),
    (106820, '101159201B'),
    (35599, '101203104H'),
    (85116,' 101859902J'),
    (445903, '103703602G'),
    (199156, '103839502A'),
    (342322, '104206501C'),
    (457017, '105117401F'),
    (98573, '105450402E')

    SELECT i.ins_num
    FROM #tb_pat_ins i
    GROUP BY i.ins_num
    HAVING COUNT(DISTINCT pat_ID) > 1
  • That works until I try to add additional fields to be returned:

    select distinct
    i.ins_num
    from dbo.tb_pat_ins i
    where i.active = 1
    and i.bp_id <> 1
    group by i.ins_num
    having count(distinct pat_id) > 1
    order by i.ins_num desc; -- returns 442 records

    select distinct
    i.ins_num
    , i.pat_id
    , i.bp_id
    , i.eff_dt
    , i.end_dt
    , i.card_ln
    , i.card_fn
    , i.card_mn
    from dbo.tb_pat_ins i
    where i.active = 1
    and i.bp_id <> 1
    group by i.ins_num
    , i.pat_id
    , i.bp_id
    , i.eff_dt
    , i.end_dt
    , i.card_ln
    , i.card_fn
    , i.card_mn
    having count(distinct pat_id) > 1
    order by i.ins_num desc; -- returns no records
  • Identify the ins_nums, then join/semi join back to the table.

    SELECT i.ins_num
    , i.pat_id
    , i.bp_id
    , i.eff_dt
    , i.end_dt
    , i.card_ln
    , i.card_fn
    , i.card_mn
    from dbo.tb_pat_ins as i
    where i.active = 1
    and i.bp_id <> 1
    and exists (select 1
    from dbo.tb_pat_ins as i2
    where i2.ins_num = i.imsnum
    and i2.active = 1
    and i2.bp_id <> 1
    group by i2.ins_num
    having count(i2.distinct pat_id) > 1
    )

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply