• EamonSQL (10/24/2012)


    select from the table using the ROW_NUMBER() function ordering by id and partitioning by value.

    Wrap in brackets giving you a derived table.

    Run a delete command against the table and joining to the derived table on the id and where rownum > 1

    That should sort you out.

    Eamon

    The OP said: -

    praveen_vejandla (10/24/2012)


    id value name

    -- ------ -------

    1 10 test

    2 5 prod

    3 10 test

    4 4 test

    5 4 test

    6 10 test

    Only records with id 4,5 should be deleted.

    Here's some sample data to play with: -

    DECLARE @testtable TABLE (id INT, value INT, NAME CHAR(4));

    INSERT INTO @testtable

    VALUES (1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'), (4, 4, 'test'), (5, 4, 'test'), (6, 10, 'test');

    So the expected result after you have deleted the bad rows is: -

    id value NAME

    ----------- ----------- ----

    1 10 test

    2 5 prod

    3 10 test

    6 10 test

    So, let's code up what you just described: -

    DELETE a

    FROM @testtable a

    INNER JOIN (SELECT id, value, NAME,

    ROW_NUMBER() OVER(PARTITION BY value ORDER BY id) AS rownum

    FROM @testtable) b ON a.id = b.id

    WHERE b.rownum > 1;

    OK, now we'll run it against the sample data and the result is: -

    id value NAME

    ----------- ----------- ----

    1 10 test

    2 5 prod

    4 4 test

    Ah, it seems you've deleted a few extra rows and kept a row that we wanted to delete.

    OK, let's take a look at ThomasRushton's answer: -

    DELETE

    FROM @testtable

    WHERE id IN (SELECT t2.id

    FROM @testtable t1

    INNER JOIN @testtable t2 ON t1.id = t2.id - 1

    INNER JOIN @testtable t3 ON t2.id = t3.id - 1

    WHERE t1.NAME = t2.NAME AND t2.NAME = t3.NAME

    );

    For me, there are two issues here. The first is that we're touching the table 4 times, which is unnecessary. The second is that we're only looking at "name" to see if a record is a duplicate. (If the OP agrees with this as the requirement, then we can scratch my second issue).

    What about the result? Does it give the correct answer from the sample data?

    id value NAME

    ----------- ----------- ----

    1 10 test

    2 5 prod

    3 10 test

    6 10 test

    Yes is does! Excellent.

    Now, let's look at my answer.

    DELETE a

    FROM @testtable a

    WHERE EXISTS (SELECT 1

    FROM @testtable b

    WHERE (a.id = b.id+1 OR a.id = b.id-1)

    AND a.NAME = b.NAME AND a.value = b.value);

    This time we're only touching the table twice and we're looking at both name and value to determine a duplicate. But does it produce the correct result based on the sample data?

    id value NAME

    ----------- ----------- ----

    1 10 test

    2 5 prod

    3 10 test

    6 10 test

    Again, yes it does.


    A question for the OP though - after the query has been run you'll notice that the result-set from the has a new "duplicate" (3 and 6). If you were to run either of the working solutions again, they wouldn't detect this as a duplicate. Would you expect them to? Or is this desired behaviour?


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/