Delete duplicate rows in a Table?

  • Hi,

    I am new to SQL. I am using SQL2005. I have a table that has the following structure

    SET ANSI_NULLS OFF

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING OFF

    GO

    CREATE TABLE [dbo].[folders_rel](

    [id] [varchar](36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [folder_id] [varchar](36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [polymorphic_module] [varchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [polymorphic_id] [varchar](36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [deleted] [bit] NULL DEFAULT ('0'),

    CONSTRAINT [pk_folders_rel] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    There is a one-many relatioship between the "folder_id" and "polymorphic_id". The id that is generated is unique. I was migrating my datas and some duplicate rows has been inserted. It has inserted the same Folder_id and polymorphic_id more than once.. For example

    id folder_id polymorphic_module polymorphic_id deleted

    - -------- ------------------- ------------- -------

    1 Aug Folder ed1 0

    2 Aug Folder ed2 0

    3 Sep Folder ed1 0

    4 Sep Folder ed3 0

    5 Aug Folder ed1 0

    6 Aug Folder ed1 0

    7 Sep Folder ed3 0

    In the above example, u can see the duplicate rows has been created that has the same folder_id and polymorphic_id. That is just a sample example, i have more that 20,000 rows in the table. I want to delete duplicate rows. if there is 3 duplicate records found, i want to delete only 2 records and keep the 1 record. How to do it?

    Thanks

    Raja

  • Did you try to search for "delete duplicate rows" on this site (using the search box on the upper right corner of this site).

    You probably will find this link as one reference:

    http://www.sqlservercentral.com/Forums/Topic652644-338-1.aspx



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi,

    Use

    set rowcount 2

    for delete top 2 rows of matching criteria

  • rajagopalanseeth (3/29/2010)


    I was migrating my datas and some duplicate rows has been inserted.

    Create a unique constraint or index to prevent this happening again.

    Ignore the advice about SET ROWCOUNT - it won't work.

    Follow the link posted by Lutz.

  • Hi ,

    Did u try it Mr. SSCertifiable ?

  • Trouble Shooter (4/12/2010)


    Did u try it Mr. SSCertifiable?

    If you mean me - and my name is Paul - then no. There was nothing to try.

    The original post said:

    "That is just a sample example, i have more that 20,000 rows in the table. I want to delete duplicate rows. if there is 3 duplicate records found, i want to delete only 2 records and keep the 1 record."

    The idea of deleting two rows was just an example - there is no guarantee that only two records will need deleting (even per group). You posted no code or further details, so I can only assume that you completely misunderstood what was required.

    Paul

  • Trouble Shooter (4/12/2010)


    Hi ,

    Did u try it Mr. SSCertifiable ?

    Look at this buddy

    rajagopalanseeth (3/29/2010)


    if there is 3 duplicate records found, i want to delete only 2 records and keep the 1 record. How to do it?

    This is just an example and it is not going to be 2 duplicate rows always. Setting ROWCOUNT 2 will always delete 2 rows and select 2 rows as well. It was i think for this reason that your solution was not an optimal one.

    Typed this when Paul was replying too. Anyways, the reason was the same:-)


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Kingston Dhasian (4/12/2010)


    Typed this when Paul was replying too. Anyways, the reason was the same:-)

    Yep. No worries!

  • Hi,

    2 is the variable only.

    u can set any value there like 1 ,3 etc as per your requirement.

  • Trouble Shooter (4/12/2010)


    Hi,

    2 is the variable only.

    u can set any value there like 1 ,3 etc as per your requirement.

    But, don't you think the method would be a bit tedious. The ROW_NUMBER() method is easier to use and is therefore a better method.

    Did you know about the ROW_NUMBER() method? If you did, why do you think it is a better method? If you didn't, don't you think you have learnt something new?


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Hi,

    Can u delete two duplicate row using row_number function

    if yes ,please let me know

  • Trouble Shooter (4/12/2010)


    Hi,

    Can u delete two duplicate row using row_number function

    if yes ,please let me know

    Yes, it does. It can delete any number of rows.

    Try this

    DECLARE@tblTable TABLE

    (

    Col1 INT

    )

    INSERT@tblTable

    SELECT1 UNION ALL

    SELECT1 UNION ALL

    SELECT1 UNION ALL

    SELECT2 UNION ALL

    SELECT2 UNION ALL

    SELECT3 UNION ALL

    SELECT4

    ; WITH cte_Table AS

    (

    SELECTROW_NUMBER() OVER( PARTITION BY Col1 ORDER BY Col1 ) Row_Num, *

    FROM@tblTable

    )

    DELETE

    FROMcte_Table

    WHERERow_num > 1

    SELECT * FROM @tblTable


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • DECLARE @tbl_Duplicate TABLE

    (

    ID INT

    )

    INSERT INTO @tbl_Duplicate VALUES(1),(2),(1),(3)

    --Before

    SELECT * FROM @tbl_Duplicate

    ;WITH CTE AS

    (

    SELECT

    ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID) AS ROWID

    FROM

    @tbl_Duplicate

    )

    DELETE FROM CTE WHERE ROWID > 1

    --AFTER

    SELECT * FROM @tbl_Duplicate

    Regards,
    Mitesh OSwal
    +918698619998

  • Trouble Shooter (4/12/2010)


    Hi,

    Can u delete two duplicate row using row_number function

    if yes ,please let me know

    We can also do it with APPLY and TOP 😀

    With ROWCOUNT, you'd need to use some horrible cursor or loop to delete a variable number of duplicates per group. The ROW_NUMBER ranking function (or APPLY/TOP) makes it easy - and fast!

    Paul

  • --delete from folders_rel

    where exists (select b.id

    from folders_rel a with(nolock)

    inner join folders_rel b with(nolock) on b.folder_id = a.folder_id

    and b.polymorphic_id = a.polymorphic_id

    and b.id > a.id

    where b.id = folders_rel.id)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply