July 5, 2010 at 5:15 pm
I am working on a solution to purge data older than a specific time frame. I'm looking to see if there's a high performance solution to this problem.
This is not the actual data and tables I'm using, but it is the same basic concept.
Here's the sample schema & data
CREATE TABLE Orders
(
OrderID int NOT NULL,
CustID int NOT NULL,
OrderDate datetime NOT NULL
);
CREATE TABLE Customer
(
CustID int NOT NULL,
CustName varchar(100) NOT NULL
);
INSERT INTO Customer(CustID, CustName) VALUES (1,'Customer1');
INSERT INTO Customer(CustID, CustName) VALUES (2,'Customer2');
INSERT INTO Customer(CustID, CustName) VALUES (3,'Customer3');
INSERT INTO Customer(CustID, CustName) VALUES (4,'Customer4');
INSERT INTO Orders (OrderID, CustID, OrderDate) VALUES (1, 1, '1/1/2000');
INSERT INTO Orders (OrderID, CustID, OrderDate) VALUES (2, 2, '1/1/2000');
INSERT INTO Orders (OrderID, CustID, OrderDate) VALUES (3, 3, '1/1/2007');
INSERT INTO Orders (OrderID, CustID, OrderDate) VALUES (4, 4, '1/1/2008');
INSERT INTO Orders (OrderID, CustID, OrderDate) VALUES (5, 1, '1/1/2010');
The scenario would be to delete orders older than 2005 and to also delete customers if they hadn't placed any orders newer than 2005.
So for this sample data, I would want to delete Order #'s 1 and 2 as well as Customer # 2, but not Customer # 1 (since they had placed a newer order)
Deleting the orders is easy and isn't presented here. Deleting the customers efficiently is what I'm concerned about.
Both real tables have millions of rows, so the solution has to scale.
Here's what I have so far:
DECLARE @purged TABLE (OrderID int NOT NULL)
-- there will be a query that populates this table, but this is just for the example
INSERT INTO @purged (OrderID) VALUES (1)
INSERT INTO @purged (OrderID) VALUES (2)
DELETE c
FROM Customer c
WHERE EXISTS(
SELECT *
FROM Orders o1
INNER JOIN @purged p1
ON o1.OrderID = p1.OrderID
AND c.CustID = o1.CustID
)
AND NOT EXISTS(
SELECT *
FROM Orders o2
LEFT OUTER JOIN @purged p2
ON o2.OrderID = p2.OrderID
WHERE p2.OrderID IS NULL
AND o2.CustID = c.CustID
)
This works, but I'm wondering if I'm missing something that would make this more efficient.
July 5, 2010 at 11:36 pm
What version of SQL, what edition?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 6, 2010 at 12:27 am
SQL 2008 Enterprise
July 6, 2010 at 1:41 am
Then for the orders at least, consider table partitioning. Partition on order date (or similar), decide on a suitable partition function, maybe one partition per year, then you can delete old orders as a metadata operation, similar to a drop table. It really doesn't get any more efficient than that.
Customers are a fair bit harder though. Perhaps a batched delete WHERE NOT EXISTS (SELECT 1 FROM Orders WHERE Orders.CustomerID = Customers.CustomerID) (once you've deleted the orders). Delete in chunks of a few thousand rows until the delete deletes no more rows. Often quicker than doing a single delete, less harsh on log and locks.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply