remove duplicate records, but keep most "current"

  • I am working on a database where we are storing data on multiple servers (SQL 2005), I am importing the data from both servers into one table on one server. I have been reading up on how to delete truly duplicate records (where all fields match), but this doesn't seem to take into account the date.

    sample data set:

    machine name date record_id (primary key)

    machine 1 4/12/2010 1

    machine 1 4/14/2010 2

    machine 2 4/12/2010 3

    machine 3 4/12/2010 4

    I tried using MAX [date] to get only the most recent date, but I think because the records are "unique", a record for machine 1 shows up twice. long story short, I want to delete or ignore "older" records for each machine, regardless of how any other identifier.

    Help?

  • sort in descending order by unique-key combination of columns and date

    keep top 1 or each group defined by you unique-key combination of columns.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Will the below code help you buddy??

    WITH MAX_DATE_FOR_A_MACHINE_CTE(MACHINE_NAME, MAX_DATE)

    AS

    (

    SELECT

    NAME

    , MAX(DATE) MAX_DATE

    FROM

    YOUR_TABLE

    GROUP BY

    NAME

    , DATE

    )

    DELETE Y_T

    FROM

    YOUR_TABLE Y_T

    INNER JOIN

    MAX_DATE_FOR_A_MACHINE_CTE CTE

    WHERE

    (

    Y_T.NAME = CTE.NAME AND

    Y_T.DATE = CTE.MAX_DATE

    )

    Tell us if this worked for you mate..

    Cheers!!

    C'est Pras!

    ~edit :- Aliased the max date column

    P.S: I am writing this code from home,so i have not tested it,please test it and post the results buddy!

  • COldCoffee (4/17/2010)


    Will the below code help you buddy??

    WITH MAX_DATE_FOR_A_MACHINE_CTE(MACHINE_NAME, MAX_DATE)

    AS

    (

    SELECT

    NAME

    , MAX(DATE) MAX_DATE

    FROM

    YOUR_TABLE

    GROUP BY

    NAME

    , DATE

    )

    DELETE Y_T

    FROM

    YOUR_TABLE Y_T

    INNER JOIN

    MAX_DATE_FOR_A_MACHINE_CTE CTE

    WHERE

    (

    Y_T.NAME = CTE.NAME AND

    Y_T.DATE = CTE.MAX_DATE

    )

    Tell us if this worked for you mate..

    Cheers!!

    C'est Pras!

    ~edit :- Aliased the max date column

    P.S: I am writing this code from home,so i have not tested it,please test it and post the results buddy!

    This is the method you would want to employ - test it out and make sure it works before running it in prod. Please let us know as well if it worked.

    ColdCoffee - in your sig line, Je ne suis pas au currant avec le mot 'Pras,' qu'est-ce que cela veut dire?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (4/20/2010)


    COldCoffee (4/17/2010)


    Cheers!!

    C'est Pras!

    ColdCoffee - in your sig line, Je ne suis pas au currant avec le mot 'Pras,' qu'est-ce que cela veut dire?

    Jason, Il est la forme abrégée de mon nom. Mon nom est Prasanna et je suis originaire de l'Inde!!! 🙂 😉 J'ai appris le français pendant six ans!!

  • COldCoffee (4/20/2010)


    CirquedeSQLeil (4/20/2010)


    COldCoffee (4/17/2010)


    Cheers!!

    C'est Pras!

    ColdCoffee - in your sig line, Je ne suis pas au currant avec le mot 'Pras,' qu'est-ce que cela veut dire?

    Jason, Il est la forme abrégée de mon nom. Mon nom est Prasanna et je suis originaire de l'Inde!!! 🙂 😉 J'ai appris le français pendant six ans!!

    Oh voila, ca fais du sens.

    Merci beaucoup.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (4/20/2010)


    COldCoffee (4/20/2010)


    CirquedeSQLeil (4/20/2010)


    COldCoffee (4/17/2010)


    Cheers!!

    C'est Pras!

    ColdCoffee - in your sig line, Je ne suis pas au currant avec le mot 'Pras,' qu'est-ce que cela veut dire?

    Jason, Il est la forme abrégée de mon nom. Mon nom est Prasanna et je suis originaire de l'Inde!!! 🙂 😉 J'ai appris le français pendant six ans!!

    Oh voila, ca fais du sens.

    Merci beaucoup.

    De rien, mon ami! 😉

  • This should help.

    Declare @vTab Table (machinename varchar(10),dateRec datetime,recID int)

    Insert into @vTab

    Select 'machine 1', '4/14/2010', 1 Union ALL

    Select 'machine 1' ,'4/12/2010', 2 Union ALL

    Select 'machine 2' ,'4/12/2010', 3 Union ALL

    Select 'machine 2' ,'4/15/2010', 3 Union ALL

    Select 'machine 3' ,'4/12/2010', 4

    ;with wcte as (

    Select machinename,daterec,row_number() over (partition by machinename order by dateRec desc) as rno

    from @vTab

    ) --Select * from wcte

    Delete from wcte where rno > 1

    Plz try on Test data.

    Atif Sheikh

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • Atif Sheikh (4/20/2010)

    Declare @vTab Table (machinename varchar(10),dateRec datetime,recID int)

    Insert into @vTab

    Select 'machine 1', '4/14/2010', 1 Union ALL

    Select 'machine 1' ,'4/12/2010', 2 Union ALL

    Select 'machine 2' ,'4/12/2010', 3 Union ALL

    Select 'machine 2' ,'4/15/2010', 3 Union ALL

    Select 'machine 3' ,'4/12/2010', 4

    ;with wcte as (

    Select machinename,daterec,row_number() over (partition by machinename order by dateRec desc) as rno

    from @vTab

    ) --Select * from wcte

    Delete from wcte where rno > 1

    I have just done something similar to get rid of duplicates. My first response would have been similar to Atif's above. Like this (which I no longer think is the way to go) :

    Declare @vTab2 Table (machinename varchar(10),dateRec datetime,recID int)

    Insert into @vTab2

    Select 'machine 1', '4/14/2010', 1 Union ALL

    Select 'machine 1' ,'4/12/2010', 2 Union ALL

    Select 'machine 2' ,'4/12/2010', 3 Union ALL

    Select 'machine 2' ,'4/15/2010', 3 Union ALL

    Select 'machine 3' ,'4/12/2010', 4

    ;with wcte as (

    Select machinename, daterec

    ,rank() over (partition by machinename order by dateRec desc) as rno

    from @vTab2

    )

    , cteDups as (

    SELECT wcte.machinename, wcte.daterec from wcte where rno > 1

    )

    DELETE vT2

    FROM cteDups

    INNER JOIN

    @vTab2 vT2

    ON cteDups.machinename = vT2.machinename

    AND cteDups.daterec = vT2.daterec

    I think Atif's code is better because it avoids the join back to the original table I would have used.

    Also I think you could just get away with doing the following in Atif's version and not include the machinename,daterec in the select:

    ;with wcte as (

    Select row_number() over (partition by machinename order by dateRec desc) as rno

    from @vTab

    ) --Select * from wcte

    Delete from wcte where rno > 1

    But if you need to report on the rows you are deleting which was part of a requirement I had you might do this:

    Declare @vTab3 Table (machinename varchar(10),dateRec datetime,recID int)

    Declare @vDelTab Table (machinename varchar(10),dateRec datetime,recID int)

    Insert into @vTab3

    Select 'machine 1', '4/14/2010', 1 Union ALL

    Select 'machine 1' ,'4/12/2010', 2 Union ALL

    Select 'machine 2' ,'4/12/2010', 3 Union ALL

    Select 'machine 2' ,'4/15/2010', 3 Union ALL

    Select 'machine 3' ,'4/12/2010', 4

    ;WITH wcte AS (

    SELECT machinename, dateRec, recID

    ,rank() OVER (PARTITION BY machinename ORDER BY dateRec DESC) AS rno

    FROM @vTab3

    )

    DELETE wcte

    OUTPUT DELETED.machinename, DELETED.dateRec, DELETED.recID

    INTO @vDelTab (machinename, dateRec, recID)

    WHERE rno > 1

    SELECT * FROM @vTab3

    SELECT * FROM @vDelTab

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply