As I am relatively new to SQL and still puzzling over some of the mysteries it offers up at times, I wonder if someone could explain in simple terms why I get the performance differential with a DELETE using different indexing strategies on both tables concerned.
Now I know that indexing tables can have an enormous effect on the performance of all sorts of SQL operations, but it's the following scenario that confuses me somewhat.
here's what I'm doing:
I have two tables, one called TempSales2 and the other called TempSales. Both tables have a column of type INT called RecID and I'm using this to DELETE rows from TempSales2 where the same RecID value does NOT exist in TempSales.
To achieve this, the following is the statement I am using:
DELETE FROM TempSales2
WHERE NOT EXISTS (SELECT 1 FROM TempSales
WHERE RecID = dbo.TempSales2.RecID);Now, originally I had only one index for either table and that was a UNIQUE index on the RecID column of the TempSales table. My reason for this was because I believed that the sub-query would greatly benefit if the RecID column value being passed from the DELETE statement from the TempSales2 table could be located much faster if the table being searched (I.e. TempSales) were indexed on its RecID column. My understanding is therefore that the DELETE statement is reading through an unordered table and executing the sub-query each time it reads a row and then the ReciD value is then compared within the sub-query. So, it's the sub-query that has to be fast.
In addition, I figured that as the table being deleted from (I.e. TempSales2) was not searched or ordered AND that having an index would slow deletion, that this table should remain unindexed.
The tables being searched/deleted had approximately 36 million rows each. The scenario above took some 6 hours and 2 minutes to run. I had deliberately engineered the data so that all rows in the TempSales2 table should be deleted - and they were.
A former colleague who was our resident SQL expert at the time suggested that I should also index the RecID column on the table being deleted from as this would speed up things considerably. I therefore reset all data and tried again this time adding a UNIQUE index to the suggested column.
The result? Well, this time the task completed in just over 10 minutes! A massive improvement in performance, but one I cannot understand given my novice status.
If anyone can help me with this - please in simple terms - I would much appreciate it.
People often tell me that it's not necessary to understand why, only to "do". But I'm somewhat anal in this regard and like to understand why things work the way they do.
Also, if there's a better way to perform this DELETE, then by all means let me know. My code is only the way I know how to achieve what I need, so please feel free to show me something better.