Remove duplicated rows from a staging table before inserting the data into the normal database tables

  • Hi,

    I have this table:

    CREATE TABLE [dbo].[BULK_ACTIVIDADES](
        [CODCTB] [bigint] NULL,
        [NIF_ANTIGO] [varchar](20) NOT NULL,
        [NIF] [varchar](20) NULL,
        [FILIAL_NUMBER] [varchar](20) NULL,
        [INDEX_POS] [int] NULL,
        [DESC_ACTIVIDADE] [varchar](100) NULL
    ) ON [PRIMARY]
    GO

    SET ANSI_PADDING OFF

    I want to delete the duplicated rows.

    I could do something like:

    select * into bulk_actividades1
    from
        (select * from bulk_actividades
         union
         select * from bulk_actividades) r

    truncate table bulk_actividades
    drop table bulk_actividades

    select * into bulk_actividades from bulk_actividades1

    truncate table bulk_actividades1
    drop table bulk_actividades1

    But this table and others have a lot of rows (3 million), so I ask: do you SQL Server experts know a faster way of doing this in T-SQL?

    thank you.

  • One thing you could do is just use a SELECT DISTINCT when transferring the data, but if you want to remove duplicates I'd suggest this:

    WITH deDupe AS (
        SELECT
            BA.CODCTB,
            BA.NIF_ANTIGO,
            BA.NIF,
            BA.FILIAL_NUMBER,
            BA.INDEX_POS,
            BA.DESC_ACTIVIDADE,
            ROW_NUMBER() OVER (PARTITION BY BA.CODCTB, BA.NIF_ANTIGO,
                BA.NIF, BA.FILIAL_NUMBER, BA.INDEX_POS,
                BA.DESC_ACTIVIDADE ORDER BY BA.CODCTB) AS RowNo
        FROM
            dbo.BULK_ACTIVIDADES AS BA
    )
    DELETE FROM
        deDupe
    WHERE
        deDupe.RowNo > 1
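    The SELECT DISTINCT route mentioned above would be applied at transfer time rather than as a cleanup step. A minimal sketch, assuming a destination table named dbo.ACTIVIDADES with the same six columns (that name is a placeholder, not from the original post):

    ```sql
    -- dbo.ACTIVIDADES is a hypothetical destination table
    INSERT INTO dbo.ACTIVIDADES
        (CODCTB, NIF_ANTIGO, NIF, FILIAL_NUMBER, INDEX_POS, DESC_ACTIVIDADE)
    SELECT DISTINCT
        BA.CODCTB, BA.NIF_ANTIGO, BA.NIF,
        BA.FILIAL_NUMBER, BA.INDEX_POS, BA.DESC_ACTIVIDADE
    FROM dbo.BULK_ACTIVIDADES AS BA;
    ```

    With this approach the staging table keeps its duplicates and only distinct rows reach the normal table.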

  • Is that faster than my example?

  • I believe the pattern I have proposed will outperform yours. My proposal makes only one pass over the data, while yours makes multiple passes and multiple copies, so it generates much more write activity in particular.
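    One quick way to sanity-check the result after running the windowed DELETE is a grouped count over the same six columns used in the PARTITION BY (a sketch; any row returned is a surviving duplicate):

    ```sql
    SELECT CODCTB, NIF_ANTIGO, NIF, FILIAL_NUMBER,
           INDEX_POS, DESC_ACTIVIDADE, COUNT(*) AS Copies
    FROM dbo.BULK_ACTIVIDADES
    GROUP BY CODCTB, NIF_ANTIGO, NIF, FILIAL_NUMBER,
             INDEX_POS, DESC_ACTIVIDADE
    HAVING COUNT(*) > 1;
    ```

    If the DELETE removed all duplicates, this query returns no rows.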

  • Understood. I will test it.

    Thank you.

