Delete Duplicate Records

  • I have a table with two columns: Id (PK, identity) and size, of int type.

    The size column has some duplicate entries in it, and I want to keep only one row per size value without changing the Id values.

    Consider the table below:

    Declare @table Table (Id int IDENTITY(1,1) PRIMARY KEY, size int)

    insert into @table values(1)
    insert into @table values(1)
    insert into @table values(2)
    insert into @table values(3)
    insert into @table values(2)
    insert into @table values(1)
    insert into @table values(3)

    The desired output will be:

    Id  Size
    1   1
    3   2
    4   3

    I have done this using a temp table and a loop structure, but I am looking for another way.

    Thanks

  • This will get you there!

    ;WITH CTE AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY SIZE ORDER BY ID) AS RN, ID, SIZE
        FROM @table
    )
    DELETE FROM CTE WHERE RN <> 1;
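
    Deleting through the CTE removes the duplicate rows from the underlying @table itself, keeping the row with the lowest ID for each SIZE value. If you want to verify the result, a quick check (note that @table is a table variable, so this has to run in the same batch as the declaration, inserts, and delete):

    SELECT Id, size FROM @table ORDER BY Id;
    -- Expected: (1, 1), (3, 2), (4, 3)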

    Hope this helps!

    Cheers!

  • @ColdCofee

    Thanks a lot for your response.

    It worked for me.

  • Nice to hear your issue is fixed! You're welcome, faheem! 🙂

    C'est Pras! 😎

  • If the source table is large and there are many duplicates, deleting all the duplicates in one go is likely to be slow and use a great deal of transaction log space. You could consider deleting rows in batches (a sketch of that approach appears after the script), or using a minimally-logged alternative as shown below:

    -- Create a test database
    USE master;
    CREATE DATABASE RemoveDuplicates;
    GO
    USE RemoveDuplicates;
    GO

    -- Source table with duplicate data (must reside on the default file group)
    CREATE TABLE dbo.Data
    (
        id   INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        size INTEGER NOT NULL
    )
    ON [DEFAULT];
    GO

    -- Sample data
    INSERT dbo.Data (size)
    VALUES (1), (1), (2),
           (3), (2), (1), (3);
    GO

    -- Sample non-clustered index
    CREATE NONCLUSTERED INDEX [IX dbo.Data size]
    ON dbo.Data (size)
    ON [DEFAULT];
    GO

    -- Identify the records to keep and save them to a new table
    SELECT D1.id,
           D1.size
    INTO dbo.ToSave
    FROM (
        SELECT *,
               rn = ROW_NUMBER() OVER (PARTITION BY size ORDER BY id)
        FROM dbo.Data D1
    ) D1
    WHERE D1.rn = 1;
    GO

    -- Switch to the bulk-logged recovery model to allow
    -- minimally-logged index builds
    -- (in a real system you would take a log backup now)
    ALTER DATABASE RemoveDuplicates
    SET RECOVERY BULK_LOGGED;
    GO

    -- The new table needs the same indexes as the source table
    -- Full requirements: http://technet.microsoft.com/en-us/library/ms191160.aspx
    ALTER TABLE dbo.ToSave
    ADD PRIMARY KEY CLUSTERED (id);
    GO

    CREATE INDEX nc1 ON dbo.ToSave (size) ON [DEFAULT];
    GO

    -- Switch back to full recovery
    -- (in a real system you would take a second log backup now)
    ALTER DATABASE RemoveDuplicates
    SET RECOVERY FULL;
    GO

    -- Replace the original table with the saved data
    BEGIN TRY
        -- Begin atomic operation
        BEGIN TRANSACTION;

        -- Empty the source table
        TRUNCATE TABLE dbo.Data;

        -- Transfer the saved rows (metadata operation)
        ALTER TABLE dbo.ToSave
        SWITCH TO dbo.Data;

        -- Drop the temporary storage
        DROP TABLE dbo.ToSave;

        -- Success
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Error handling
        DECLARE @Msg NVARCHAR(2048);
        SET @Msg = ERROR_MESSAGE();
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        RAISERROR('ROLLBACK: %s', 16, 1, @Msg);
    END CATCH;
    GO

    -- Show the de-duplicated data
    SELECT *
    FROM dbo.Data;
    GO

    -- Tidy up
    USE master;
    DROP DATABASE RemoveDuplicates;
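
    For reference, the batched-delete option mentioned at the top of this post might look something like the sketch below. This is illustrative only: the 5,000-row batch size is an arbitrary example value, and it assumes the dbo.Data table populated by the script above (run as an alternative to the SELECT...INTO / SWITCH steps, not in addition to them).

    -- Sketch of a batched delete (batch size is an arbitrary example value)
    DECLARE @BatchSize int;
    SET @BatchSize = 5000;

    WHILE 1 = 1
    BEGIN
        -- Number the rows within each size value and delete all but the
        -- first, removing at most @BatchSize rows per iteration
        ;WITH Dupes AS
        (
            SELECT rn = ROW_NUMBER() OVER (PARTITION BY size ORDER BY id)
            FROM dbo.Data
        )
        DELETE TOP (@BatchSize)
        FROM Dupes
        WHERE rn > 1;

        -- Stop once a batch deletes fewer rows than the batch size
        IF @@ROWCOUNT < @BatchSize BREAK;

        -- Under the full recovery model, a log backup here keeps the log small
    END;

    Each batch commits separately, so the transaction log only has to accommodate one batch of changes at a time (given regular log backups under the full recovery model).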

  • SQLkiwi (5/15/2010)


    If the source table is large and there are many duplicates, deleting all the duplicates in one go is likely to be slow and use a great deal of transaction log space. You could consider deleting rows in batches, or using a minimally-logged alternative as shown below:


    Paul, I realize this is an old thread, but I have a question. Would the use of ALTER TABLE ... SWITCH TO ... require the use of the Enterprise Edition of SQL Server?

  • Lynn Pettis (7/5/2011)


    Paul, I realize this is an old thread, but I have a question. Would the use of ALTER TABLE ... SWITCH TO ... require the use of the Enterprise Edition of SQL Server?

    Hi Lynn,

    This is the number one question I get asked about this - and the answer is most definitely 'no'.

    All tables are partitioned (most just happen to have one partition!) and ALTER TABLE...SWITCH is available in all editions of SQL Server. Full table partitioning (with schemes and functions and such) is indeed Enterprise-equivalent-only.

    Paul
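
    As a quick illustration of that point (a sketch, assuming the RemoveDuplicates database and dbo.Data table from the earlier script still exist, i.e. before the tidy-up step): every table shows up in sys.partitions with at least one partition per index, on any edition.

    -- Every table has at least one partition per index, regardless of edition
    SELECT OBJECT_NAME(p.object_id) AS table_name,
           p.index_id,
           p.partition_number,
           p.rows
    FROM sys.partitions AS p
    WHERE p.object_id = OBJECT_ID(N'dbo.Data');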
