SQLkiwi (5/15/2010)
If the source table is large and there are many duplicates, deleting all the duplicates in one go is likely to be slow and use a great deal of transaction log space. You could consider deleting rows in batches, or using a minimally-logged alternative as shown below:
-- Create a test database
USE master;
CREATE DATABASE RemoveDuplicates;
GO
USE RemoveDuplicates;
GO
-- Source table with duplicate data (must reside on the default file group)
CREATE TABLE dbo.Data
(
id INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
size INTEGER NOT NULL
)
ON [DEFAULT];
GO
-- Sample data
INSERT dbo.Data (size)
VALUES (1), (1), (2),
(3), (2), (1), (3);
GO
-- Sample non-clustered index
CREATE NONCLUSTERED INDEX [IX dbo.Data size]
ON dbo.Data (size)
ON [DEFAULT];
GO
-- Identify the records to keep and save them to a new table
SELECT D1.id,
D1.size
INTO dbo.ToSave
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY size ORDER BY id)
FROM dbo.Data D1
) D1
WHERE D1.rn = 1;
GO
-- Switch to the bulk logged recovery model to allow
-- minimally-logged index builds
-- (in a real system you would take a log backup now)
ALTER DATABASE RemoveDuplicates
SET RECOVERY BULK_LOGGED;
GO
-- The new table needs the same indexes as the source table
-- Full requirements: http://technet.microsoft.com/en-us/library/ms191160.aspx
ALTER TABLE dbo.ToSave
ADD PRIMARY KEY CLUSTERED (id);
GO
CREATE INDEX nc1 ON dbo.ToSave (size) ON [DEFAULT];
GO
-- Switch back to full recovery
-- (in a real system you would take a second log backup now)
ALTER DATABASE RemoveDuplicates
SET RECOVERY FULL;
GO
-- Replace the original table with the saved data
BEGIN TRY
-- Begin atomic operation
BEGIN TRANSACTION;
-- Empty the source table
TRUNCATE TABLE dbo.Data;
-- Transfer the saved rows (metadata operation)
ALTER TABLE dbo.ToSave
SWITCH TO dbo.Data;
-- Drop the temporary storage
DROP TABLE dbo.ToSave;
-- Success
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
-- Error handling
DECLARE @Msg NVARCHAR(2048);
SET @Msg = ERROR_MESSAGE();
IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
RAISERROR('ROLLBACK: %s', 16, 1, @Msg);
END CATCH;
GO
-- Show the de-duplicated data
SELECT *
FROM dbo.Data;
GO
-- Tidy up
USE master;
DROP DATABASE RemoveDuplicates;
Paul, realize this is an old thread, but I have a question. Would the use of the ALTER TABLE ... SWITCH TO ... require that the use of the Enterprise Editon of SQL Server?