Make delete statement with 30 equalities more efficient

  • I am working on a statement that deletes records from one table where 30 fields match in value with at least one record in another table. My initial statement was:

    delete <tablename> a
    from <secondtablename> b
    where a.field1 = b.field1
      and a.field2 = b.field2
      ....
      and a.field30 = b.field30

    I need something more efficient, so I then tried the following, but the performance was even worse:

    delete <tablename> a
    where exists
    (
        select a.field1, a.field2, ..., a.field30
        intersect
        select b.field1, b.field2, ..., b.field30
        from <secondtablename> b
    )

    The structure of the tables cannot be changed. Can someone please suggest a more efficient method to accomplish this task?

  • WHY is it inefficient? Did you look at the query plan? How many records? What metrics are you using to determine that it is inefficient?

    If you are deleting only a few rows out of many and you aren't seeking on an index, it will hurt because of the table scan.
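
    For example, a generic way to capture those metrics in SSMS (not specific to the OP's tables) is:

    -- Show per-table logical reads and CPU/elapsed time for whatever runs next.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Run the DELETE (or a SELECT with the same join/predicates) here with
    -- "Include Actual Execution Plan" enabled, then look for scans vs. seeks.

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;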

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • A big part of this will come down to the structure of your table and the indexes in place. A delete based on thirty separate columns is going to be tough simply because thirty columns aren't going to index terribly well. However, some of those columns are surely more important than others: the columns that most accurately identify the data being deleted. If those are always used, then they're the ones on which you can place an index that could speed this up.
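
    For example, something along these lines (a sketch only; field1 and field2 are placeholders for whatever your most selective columns turn out to be):

    -- Index the most selective columns so the join can seek on them;
    -- the remaining equality predicates become residual filters.
    CREATE NONCLUSTERED INDEX ix_tablename_selective
        ON <tablename> (field1, field2);

    CREATE NONCLUSTERED INDEX ix_secondtablename_selective
        ON <secondtablename> (field1, field2);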

    However, without seeing the full query or the execution plan, these vague suggestions are the best that can be made.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • jonathanmarenus (5/22/2016)


    I need something more efficient, so I then tried the following, but the performance was even worse:

    delete <tablename> a
    where exists
    (
        select a.field1, a.field2, ..., a.field30
        intersect
        select b.field1, b.field2, ..., b.field30
        from <secondtablename> b
    )

    That's a weird way of doing an EXISTS; it requires an extra scan over table a in return for less typing (maybe less typing).

    DELETE TableA
    WHERE EXISTS (SELECT 1
                  FROM TableB
                  WHERE TableA.Field1 = TableB.Field1
                    AND TableA.Field2 = TableB.Field2
                    AND ...)

    It's not going to be pretty though. Is this some archiving or background process?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • What about adding a new derived column and index to each table that creates a hash across the 30 fields? Then compare the hashes.

    ALTER TABLE <tablename>
    ADD HashVal AS HASHBYTES('MD5'
            , ISNULL(Field1, '')
            + ',' + ISNULL(Field2, '')
            + ',' + ...
            + ',' + ISNULL(Field30, '')
        ) PERSISTED NOT NULL;

    CREATE NONCLUSTERED INDEX ix_tablename_DeleteHash ON <tablename> (HashVal[, PK]);

    ALTER TABLE <secondtablename>
    ADD HashVal AS HASHBYTES('MD5'
            , ISNULL(Field1, '')
            + ',' + ISNULL(Field2, '')
            + ',' + ...
            + ',' + ISNULL(Field30, '')
        ) PERSISTED NOT NULL;

    CREATE NONCLUSTERED INDEX ix_secondtablename_DeleteHash ON <secondtablename> (HashVal[, PK]);

    Then do a delete based on the hash.

    DELETE a
    FROM <tablename> a
    JOIN <secondtablename> b
        ON a.HashVal = b.HashVal [AND a.PK = b.PK]
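
    Note that any hash can collide, so a cautious variant (a sketch, not part of the original suggestion) keeps the 30 equality predicates in the join and uses the hash only as the seekable index column:

    DELETE a
    FROM <tablename> a
    JOIN <secondtablename> b
        ON  a.HashVal = b.HashVal   -- seek on the narrow hash index
        AND a.Field1  = b.Field1    -- residual checks guard against hash collisions
        AND a.Field2  = b.Field2
        -- ... remaining columns ...
        AND a.Field30 = b.Field30;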

  • If you go the HASHBYTES method (which I too have used in the past for large-scale data loading processes), consider using SHA or SHA1 instead of MD5 for a bit better hash: http://kejser.org/exploring-hash-functions-in-sql-server/
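
    With that change, the only difference from the computed-column sketch above is the algorithm name, e.g. (same placeholder column list as before):

    ADD HashVal AS HASHBYTES('SHA1'
            , ISNULL(Field1, '')
            + ',' + ...
            + ',' + ISNULL(Field30, '')
        ) PERSISTED NOT NULL;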

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you're deleting millions of rows, then it's probably the transaction logging, not the actual join or delete operation that's killing you. It's been a couple of weeks since your post, but if you're still struggling with this, then consider "batching" your deletes, meaning write a looping block that deletes TOP X rows. For example, if you're trying to delete 1 million rows, then delete 100,000 for each loop. Start out by setting your batch size so that you're deleting in 10 batches.
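
    A minimal sketch of that pattern (placeholder names as above; the batch size and the EXISTS form are illustrative, so adjust them to whichever join you settle on):

    DECLARE @BatchSize int = 100000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize) a
        FROM <tablename> a
        WHERE EXISTS (SELECT 1
                      FROM <secondtablename> b
                      WHERE a.field1 = b.field1
                        AND a.field2 = b.field2
                        -- ... remaining columns ...
                        AND a.field30 = b.field30);

        -- @@ROWCOUNT must be checked immediately after the DELETE.
        IF @@ROWCOUNT < @BatchSize
            BREAK;

        -- In FULL or BULK_LOGGED recovery, taking log backups between
        -- batches keeps the transaction log from growing unchecked.
    END;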

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
