• SQLkiwi (5/15/2010)


    If the source table is large and there are many duplicates, deleting all the duplicates in one go is likely to be slow and use a great deal of transaction log space. You could consider deleting rows in batches (a rough sketch of that option follows the script), or using a minimally-logged alternative as shown below:

    -- Create a test database
    USE master;
    CREATE DATABASE RemoveDuplicates;
    GO
    USE RemoveDuplicates;
    GO

    -- Source table with duplicate data (must reside on the default file group)
    CREATE TABLE dbo.Data
    (
        id   INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        size INTEGER NOT NULL
    )
    ON [DEFAULT];
    GO

    -- Sample data
    INSERT dbo.Data (size)
    VALUES (1), (1), (2),
           (3), (2), (1), (3);
    GO

    -- Sample non-clustered index
    CREATE NONCLUSTERED INDEX [IX dbo.Data size]
    ON dbo.Data (size)
    ON [DEFAULT];
    GO

    -- Identify the records to keep and save them to a new table
    SELECT D1.id,
           D1.size
    INTO dbo.ToSave
    FROM (
        SELECT *,
               rn = ROW_NUMBER() OVER (PARTITION BY size ORDER BY id)
        FROM dbo.Data D1
    ) D1
    WHERE D1.rn = 1;
    GO

    -- Switch to the bulk-logged recovery model to allow
    -- minimally-logged index builds
    -- (in a real system you would take a log backup now)
    ALTER DATABASE RemoveDuplicates
    SET RECOVERY BULK_LOGGED;
    GO

    -- The new table needs the same indexes as the source table
    -- Full requirements: http://technet.microsoft.com/en-us/library/ms191160.aspx
    ALTER TABLE dbo.ToSave
    ADD PRIMARY KEY CLUSTERED (id);
    GO
    CREATE INDEX nc1 ON dbo.ToSave (size) ON [DEFAULT];
    GO

    -- Switch back to full recovery
    -- (in a real system you would take a second log backup now)
    ALTER DATABASE RemoveDuplicates
    SET RECOVERY FULL;
    GO

    -- Replace the original table with the saved data
    BEGIN TRY
        -- Begin atomic operation
        BEGIN TRANSACTION;

        -- Empty the source table
        TRUNCATE TABLE dbo.Data;

        -- Transfer the saved rows (metadata operation)
        ALTER TABLE dbo.ToSave
        SWITCH TO dbo.Data;

        -- Drop the temporary storage
        DROP TABLE dbo.ToSave;

        -- Success
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Error handling
        DECLARE @Msg NVARCHAR(2048);
        SET @Msg = ERROR_MESSAGE();
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        RAISERROR('ROLLBACK: %s', 16, 1, @Msg);
    END CATCH;
    GO

    -- Show the de-duplicated data
    SELECT *
    FROM dbo.Data;
    GO

    -- Tidy up
    USE master;
    DROP DATABASE RemoveDuplicates;
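
    For the batched-delete option mentioned at the top of the post, a minimal sketch against the same dbo.Data table might look like the one below. The 5,000-row batch size is an arbitrary value to tune for your system, and under the full recovery model you would still take log backups between batches to keep the log from growing.

    -- Sketch of the batched-delete alternative: keep the first row (lowest id)
    -- per size value and remove the rest a few thousand rows at a time
    DECLARE @BatchSize INTEGER = 5000;

    WHILE 1 = 1
    BEGIN
        WITH Numbered AS
        (
            SELECT rn = ROW_NUMBER() OVER (PARTITION BY size ORDER BY id)
            FROM dbo.Data
        )
        DELETE TOP (@BatchSize)
        FROM Numbered
        WHERE rn > 1;

        -- Stop once no duplicate rows remain
        IF @@ROWCOUNT = 0 BREAK;
    END;
    GO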

    Paul, I realize this is an old thread, but I have a question. Would the use of ALTER TABLE ... SWITCH TO ... require the Enterprise Edition of SQL Server?