Duplicate Data in a Table

  • Hello Everyone

    This is yet another issue of duplicate data. This is very old data that is coming over from an old Unix and DB2 database. Just my luck.

    In a table, I have duplicate rows, all except for a DateCreated columns, each of the rows has a distinct date created value. I need to select only the row, of the duplicate values, with the latest DateCreated. Some of the data does not have a duplicate value, but some does.

    I can select the correct code if I select a couple times from the table, and use a couple table variables, but I am hoping to get a more efficient way

    Here is a small example, if more is needed, I will create an insert query for you

    Mary Poppins, 123456, 12-07-05 14:23:45

    Mary Poppins, 123456, 12-12-05 08:15:28 -- I want this one

    Daffy Duck , 234567, 01-15-05 09:27:23 -- I want this one

    Wile E. Coyote, 345678, 03-01-05 18:12:23

    Wile E. Coyote, 345678, 03-01-05 20:15:20

    Wile E. Coyote, 345678, 03-01-05 20:18:56 -- I want this one

    Thank You in advance for any and all help and suggestions

    Andrew SQLDBA

  • AndrewSQLDBA (11/9/2011)


    Hello Everyone

    This is yet another issue of duplicate data. This is very old data that is coming over from an old Unix and DB2 database. Just my luck.

    In a table, I have duplicate rows, all except for a DateCreated columns, each of the rows has a distinct date created value. I need to select only the row, of the duplicate values, with the latest DateCreated. Some of the data does not have a duplicate value, but some does.

    I can select the correct code if I select a couple times from the table, and use a couple table variables, but I am hoping to get a more efficient way

    Here is a small example, if more is needed, I will create an insert query for you

    Mary Poppins, 123456, 12-07-05 14:23:45

    Mary Poppins, 123456, 12-12-05 08:15:28 -- I want this one

    Daffy Duck , 234567, 01-15-05 09:27:23 -- I want this one

    Wile E. Coyote, 345678, 03-01-05 18:12:23

    Wile E. Coyote, 345678, 03-01-05 20:15:20

    Wile E. Coyote, 345678, 03-01-05 20:18:56 -- I want this one

    Thank You in advance for any and all help and suggestions

    Andrew SQLDBA

    An insert statement would make it easier to provide tested code, but I think you need something like this:

    WITH data AS

    (

    SELECT

    *,

    ROW_NUMBER() OVER(PARTITION BY NAME, Id ORDER BY date_column DESC) AS row_no

    FROM

    table

    )

    SELECT

    *

    FROM

    DATA

    WHERE

    row_no = 1

  • you could do this with a cte and row_number() or an inner join

    eg.

    to select the rows with duplicates and get the maxdate created, if you want to include non-duplicates as well remove the having clause

    Select * from YourTable a

    inner join

    (Select Character, Max(DateCreated) as MaxDateCreated

    from YourTable

    group by Character

    Having count(*) > 1) b

    on a.character = b.character and a.DateCreated = b.MaxDateCreated

  • With 582 points and 1500 visits you must have known that the sample data in form of INSERT statement will make the life easier for the answerers. I also see that you have mentioned that you will provide insert statments if needed; it IS needed, be the dataset is large or small.

    That said, here is one of doing it

    WITH SampleData (UserName, UserID, DateCreated) AS

    (

    SELECT 'Mary Poppins', 123456, '12-07-05 14:23:45'

    UNION ALL SELECT 'Mary Poppins', 123456, '12-12-05 08:15:28' -- I want this one

    UNION ALL SELECT 'Daffy Duck' , 234567, '01-15-05 09:27:23' -- I want this one

    UNION ALL SELECT 'Wile E. Coyote', 345678, '03-01-05 18:12:23'

    UNION ALL SELECT 'Wile E. Coyote', 345678, '03-01-05 20:15:20'

    UNION ALL SELECT 'Wile E. Coyote', 345678, '03-01-05 20:18:56'

    ),

    GroupNumbered AS

    (

    SELECT RN = ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY DateCreated DESC)

    ,UserName, UserID, DateCreated

    FROM SampleData

    )

    SELECT UserName, UserID, DateCreated

    FROM GroupNumbered

    WHERE RN = 1

  • Another way of doing it:

    WITH SampleData (UserName, UserID, DateCreated) AS

    (

    SELECT 'Mary Poppins', 123456, '12-07-05 14:23:45'

    UNION ALL SELECT 'Mary Poppins', 123456, '12-12-05 08:15:28' -- I want this one

    UNION ALL SELECT 'Daffy Duck' , 234567, '01-15-05 09:27:23' -- I want this one

    UNION ALL SELECT 'Wile E. Coyote', 345678, '03-01-05 18:12:23'

    UNION ALL SELECT 'Wile E. Coyote', 345678, '03-01-05 20:15:20'

    UNION ALL SELECT 'Wile E. Coyote', 345678, '03-01-05 20:18:56'

    ),

    DistinctUserID AS

    (

    SELECT UserID

    FROM SampleData

    GROUP BY UserID

    )

    SELECT CrsAppOutPut.UserName , OutTab.UserID , CrsAppOutPut.DateCreated

    FROM DistinctUserID OutTab

    CROSS APPLY (

    SELECT TOP 1 UserName , DateCreated

    FROM SampleData InTab

    WHERE OutTab.UserID = InTab.UserID

    ORDER BY InTab.DateCreated DESC

    ) CrsAppOutPut

  • So sorry to everyone for not posting properly. My thought was to post it that way, and I knew the answer would be very simple. But if not, I would post a proper insert statement in a later post if needed. I was merely in a hurry

    Thanks for all, it is working great. I tried every suggestion.

    Thanks again, and hope that you all have a good day

    Andrew SQLDBA

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply