Blog Post

10-minute SQL: Filtered Delete: join vs. sub-select vs. exist

,

I was doing some simple tuning of procedures for a client when I came across some delete statements that I wanted to look into some more. They were filtering based on values in a temp table.

In a majority of cases I have come across at clients a row is deleted from a single table based on some static and unique identifying characteristic that is directly related to that row and that’s it. In this case we were looking to remove a more complex item from a shopping cart and had to make sure any ‘child’ items were removed as well. This can be done in a variety of ways but I wanted to look into 3 methods in particular. Below was a quick test I threw together to test out some timings and IO for different methods of performing the same delete transaction using a sub-select, exists and an inner join.

First – I setup a couple of tables, loaded the first with 50k random values and the second I randomly selected 5k form the first table to be deleted

SET NOCOUNT ON;
CREATE TABLE dbo.DEL1 (ident INT IDENTITY PRIMARY key, Descript VARCHAR(10));
CREATE TABLE dbo.del2 (ident2 INT);
go
-- insert 50k rows into first table
INSERT  dbo.DEL1 (descript)
SELECT (LEFT(NEWID(),10));
GO 50000
-- insert random 5k of the first table's rows into second table
INSERT dbo.del2 (ident2)
SELECT TOP 5000 ident FROM dbo.del1
ORDER BY NEWID();
-- verify values there
SELECT * FROM dbo.del1
SELECT * FROM dbo.del2

Then I went and wrapped a series of delete statements in transactions so I could capture the info and not have to reload tables constantly.

-- start testing the deletes
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
-- trial 1 - subselect ~ 60 ms
BEGIN TRANSACTION
DELETE FROM dbo.del1
WHERE ident IN (SELECT ident2 FROM dbo.del2);
ROLLBACK transaction;
-- trial 2 - Exists ~ 1750 ms
BEGIN TRANSACTION
DELETE FROM dbo.del1
WHERE EXISTS (SELECT ident2 FROM dbo.del2);
ROLLBACK transaction;
-- trial 3 - Inner join ~70 ms
BEGIN TRANSACTION
DELETE D1
FROM del1 D1
INNER JOIN del2 ON d1.ident = del2.ident2;
ROLLBACK TRANSACTION;
COMMIT TRANSACTION;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

Running the above I was getting the following results:

*Note – I have a multi-core laptop, so CPU time is collective of all threads & cores and may show higher than the actual execution time. I did not show parse or compile times since they were both zero

Trial 1: Sub-select

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.
Table 'DEL1'. Scan count 1, logical reads 256, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'del2'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms,  elapsed time = 72 ms.

Trial 2: Exists

Table 'del2'. Scan count 4, logical reads 50000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DEL1'. Scan count 5, logical reads 100769, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
CPU time = 1763 ms,  elapsed time = 575 ms.

Trial 3: Inner join

Table 'DEL1'. Scan count 1, logical reads 256, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'del2'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
CPU time = 78 ms,  elapsed time = 71 ms.

With the way the SQL Engine tunes things I was thinking that the inner join could be quicker, but in this example it was running in about the same time as the sub-select statement. but was  taking more CPU to perform. Due to that simple little bit the sub-select was more efficient at doing what it had to do, even if it took the same amount of time to complete. By using less CPU resources  & time the system has more resources to use for other transactions and tasks.

Just goes to show how a simple little test while showing the statistics from the execution of the statements can reveal what may seem like just as efficient a process in time spent really is taking more resource. SQL isn’t just all about the time it takes to finish something, but it is about the sharing of resources as well.

After having run the tests I cleaned up the tables from my test DB

DROP TABLE dbo.del1;
drop table dbo.del2;

Hopefully this will help to inspire you to run some tests of your own when you have multiple ways you can perform a task in SQL.

The post 10-minute SQL: Filtered Delete: join vs. sub-select vs. exist appeared first on WaterOx Consulting, Inc.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating