Make delete statement with 30 equalities more efficient

  • I am working on a statement that deletes records from one table where 30 fields match in value with at least one record in another table. My initial statement was:

    delete <tablename> a
    from <secondtablename> b
    where a.field1 = b.field1
      and a.field2 = b.field2
      ....
      and a.field30 = b.field30

    I need something more efficient, so I then tried the following, but the performance was even worse:

    delete <tablename> a
    where exists
    (
        select a.field1, a.field2, ..., a.field30
        intersect
        select b.field1, b.field2, ..., b.field30
        from <secondtablename> b
    )

    The structure of the tables cannot be changed. Can someone please suggest a more efficient method to accomplish this task?

  • WHY is it inefficient? Did you look at the query plan? How many records? What metrics are you using to determine that it is inefficient?

    If you are deleting only a few rows out of many and you aren't seeking on an index, it will hurt because of the table scan.
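
    For example, a generic way to capture those metrics in SSMS (not specific to the OP's tables) is:

    -- Show per-table logical reads and CPU/elapsed time for whatever runs next.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Run the DELETE (or a SELECT with the same join/predicates) here with
    -- "Include Actual Execution Plan" enabled, then look for scans vs. seeks.

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;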

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • A big part of this will come down to the structure of your table and the indexes in place. A delete based on thirty separate columns is going to be tough simply because thirty columns aren't going to index terribly well. However, some of those columns are surely more important than others: the columns that most accurately identify the data being deleted. If those are always used, then they're the ones on which you can place an index that could speed this up.
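
    For example, something along these lines (a sketch only; field1 and field2 are placeholders for whatever your most selective columns turn out to be):

    -- Index the most selective columns so the join can seek on them;
    -- the remaining equality predicates become residual filters.
    CREATE NONCLUSTERED INDEX ix_tablename_selective
        ON <tablename> (field1, field2);

    CREATE NONCLUSTERED INDEX ix_secondtablename_selective
        ON <secondtablename> (field1, field2);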

    However, without seeing the full query or the execution plan, these vague suggestions are the best that can be made.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • jonathanmarenus (5/22/2016)


    I need something more efficient, so I then tried the following, but the performance was even worse:

    delete <tablename> a
    where exists
    (
        select a.field1, a.field2, ..., a.field30
        intersect
        select b.field1, b.field2, ..., b.field30
        from <secondtablename> b
    )

    That's a weird way of doing an EXISTS; it requires an extra scan over table a in return for less typing (maybe less typing).

    DELETE TableA
    WHERE EXISTS (SELECT 1
                  FROM TableB
                  WHERE TableA.Field1 = TableB.Field1
                    AND TableA.Field2 = TableB.Field2
                    AND ...)

    It's not going to be pretty though. Is this some archiving or background process?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • What about adding a new derived column and index to each table that creates a hash across the 30 fields? Then compare the hashes.

    ALTER TABLE <tablename>
    ADD HashVal AS HASHBYTES('MD5'
            , ISNULL(Field1, '')
            + ',' + ISNULL(Field2, '')
            + ',' + ...
            + ',' + ISNULL(Field30, '')
        ) PERSISTED NOT NULL;

    CREATE NONCLUSTERED INDEX ix_tablename_DeleteHash ON <tablename> (HashVal[, PK]);

    ALTER TABLE <secondtablename>
    ADD HashVal AS HASHBYTES('MD5'
            , ISNULL(Field1, '')
            + ',' + ISNULL(Field2, '')
            + ',' + ...
            + ',' + ISNULL(Field30, '')
        ) PERSISTED NOT NULL;

    CREATE NONCLUSTERED INDEX ix_secondtablename_DeleteHash ON <secondtablename> (HashVal[, PK]);

    Then do a delete based on the hash.

    DELETE a
    FROM <tablename> a
    JOIN <secondtablename> b
        ON a.HashVal = b.HashVal [AND a.PK = b.PK]
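
    Note that any hash can collide, so a cautious variant (a sketch, not part of the original suggestion) keeps the 30 equality predicates in the join and uses the hash only as the seekable index column:

    DELETE a
    FROM <tablename> a
    JOIN <secondtablename> b
        ON  a.HashVal = b.HashVal   -- seek on the narrow hash index
        AND a.Field1  = b.Field1    -- residual checks guard against hash collisions
        AND a.Field2  = b.Field2
        -- ... remaining columns ...
        AND a.Field30 = b.Field30;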

  • If you go the HASHBYTES method (which I too have used in the past for large-scale data loading processes), consider using SHA or SHA1 instead of MD5 for a bit better hash: http://kejser.org/exploring-hash-functions-in-sql-server/
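
    With that change, the only difference from the computed-column sketch above is the algorithm name, e.g. (same placeholder column list as before):

    ADD HashVal AS HASHBYTES('SHA1'
            , ISNULL(Field1, '')
            + ',' + ...
            + ',' + ISNULL(Field30, '')
        ) PERSISTED NOT NULL;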

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you're deleting millions of rows, then it's probably the transaction logging, not the actual join or delete operation that's killing you. It's been a couple of weeks since your post, but if you're still struggling with this, then consider "batching" your deletes, meaning write a looping block that deletes TOP X rows. For example, if you're trying to delete 1 million rows, then delete 100,000 for each loop. Start out by setting your batch size so that you're deleting in 10 batches.
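
    A minimal sketch of that pattern (placeholder names as above; the batch size and the EXISTS form are illustrative, so adjust them to whichever join you settle on):

    DECLARE @BatchSize int = 100000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize) a
        FROM <tablename> a
        WHERE EXISTS (SELECT 1
                      FROM <secondtablename> b
                      WHERE a.field1 = b.field1
                        AND a.field2 = b.field2
                        -- ... remaining columns ...
                        AND a.field30 = b.field30);

        -- @@ROWCOUNT must be checked immediately after the DELETE.
        IF @@ROWCOUNT < @BatchSize
            BREAK;

        -- In FULL or BULK_LOGGED recovery, taking log backups between
        -- batches keeps the transaction log from growing unchecked.
    END;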

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
