T-SQL Code Review: Looping Record Deletion w. Date Validation

  • Hi there, I put this script together based off of some code I found in a few other threads on this forum after doing a search. This script is for a development environment, but after it is thoroughly tested and vetted it may be deployed to a prod environment. I am simpilying pruning old records from some staging tables. Any feedback or words of wisdom would be wonderful. Thanks!

    DECLARE @DeleteDate DATETIME ,

    @RowsToDelete BIGINT

    -- Calculate the Min. Date plus 1 which will be the first range of dates to delete

    SET @DeleteDate = ( SELECT DATEADD(DAY, 1, MIN([cslpnle].[DateCreated])) AS [DeleteDate]

    FROM [dbo].[RawVehicle] (NOLOCK) AS cslpnle

    )

    -- Delete rows while data older than 5 days

    PRINT @DeleteDate

    WHILE DATEDIFF(DAY, @DeleteDate, GETUTCDATE()) > 1

    BEGIN

    SET @RowsToDelete = 1

    --Purge RawVehicles

    WHILE @RowsToDelete > 0

    BEGIN

    DELETE TOP ( 200 )

    [dbo].[RawVehicleOption]

    FROM [dbo].[RawVehicleOption]

    INNER JOIN [dbo].[RawVehicle] AS rv ON [dbo].[RawVehicleOption].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    DELETE TOP ( 200 )

    [dbo].[RawInventoryPhoto]

    FROM [dbo].[RawInventoryPhoto] AS rip

    INNER JOIN [dbo].[RawInventoryVehicle] AS riv ON rip.[RawInventoryVehicleId] = [riv].[Id]

    INNER JOIN [dbo].[RawVehicle] AS rv ON [riv].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    DELETE TOP ( 200 )

    [dbo].[RawDealerLotToInventoryVehicle]

    FROM [dbo].[RawDealerLotToInventoryVehicle] AS rdltiv

    INNER JOIN [dbo].[RawInventoryVehicle] AS riv ON [rdltiv].[RawInventoryVehicleId] = [riv].[Id]

    INNER JOIN [dbo].[RawVehicle] AS rv ON [riv].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    DELETE TOP ( 200 )

    [dbo].[RawInventoryVehicle]

    FROM [dbo].[RawInventoryVehicle] AS riv

    INNER JOIN [dbo].[RawVehicle] AS rv ON [riv].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    DELETE TOP ( 200 )

    FROM [dbo].[RawVehicle]

    WHERE [DateCreated] <= @DeleteDate

    SET @RowsToDelete = @@ROWCOUNT

    END

    SET @DeleteDate = DATEADD(DAY, 1, @DeleteDate)

    END

  • If you are wanting to keep data for five days, you may want to revisit your date test. Looks to me that you will only keep data from today.

  • Does RawVehicle have a one to one relationship to the other tables?

    If if it is one to many, you will leave orphans in the other tables.

    One way to do this is to select 200 rows at a time into a temp table.

    Then you can join that set to all of the tables (including RawVehicle) when you do the delete.

    This way you don't have to worry about one to may relationships.

    I am not sure that top (200) will always delete records belonging to the same set of RawVehicle.Id,

    unless you specify some kind of order by or really understand how the indexes are traversed and thus the order of the records being deleted.

  • Lynn Pettis (3/11/2013)


    If you are wanting to keep data for five days, you may want to revisit your date test. Looks to me that you will only keep data from today.

    Hi Lynn, sorry that comment doesn't really mesh with my variable value now does it? The idea is that I will be able to pass in a parameter to the job based on requirements from those picky business people. 🙂

  • arnipetursson (3/11/2013)


    Does RawVehicle have a one to one relationship to the other tables?

    If if it is one to many, you will leave orphans in the other tables.

    One way to do this is to select 200 rows at a time into a temp table.

    Then you can join that set to all of the tables (including RawVehicle) when you do the delete.

    This way you don't have to worry about one to may relationships.

    I am not sure that top (200) will always delete records belonging to the same set of RawVehicle.Id,

    unless you specify some kind of order by or really understand how the indexes are traversed and thus the order of the records being deleted.

    Vehicle and InventoryVehicle are one-to-one, but the remaining three tables have one-to-many relationships (many photos per car, may options per car, and a vehicle may belong to many lots at once (yes in our case one physical vehicle can be on more than one lot at once 🙂 Fancy huh?

    Anyhow, since all tables are being joined back to Vehicle and the dateadded is always being validated using the vehicle table, I'm trying to understand how I would orphan records (albeit it would be impossible to orphan anything but a vehicle record due to FKs).

    Can you clarify a bit?

  • If 200 RawVehicle record have 400 child records in one of the other tables,

    you will delete all 200 RawVehicle, but only 200 of the 400 child records.

  • arnipetursson (3/11/2013)


    If 200 RawVehicle record have 400 child records in one of the other tables,

    you will delete all 200 RawVehicle, but only 200 of the 400 child records.

    got it, and since I am calculating rowstodelete based off of records in vehicle, then those child records will not be deleted because the loop will exit. I don't think the ORDER BY option will work for my purposes due trying to keep this as efficient as possible. Ill look into the options for the temp table. What if I looped through the delete for each table individually. Basically do 5 loops insteads of 1.

  • That would work

  • What are your thoughts on something more like this was individual loops for each table to be deleted from:

    DECLARE @DeleteDate DATETIME ,

    @RowsToDelete BIGINT

    -- Calculate the Min. Date plus 1 which will be the first range of dates to delete

    SET @DeleteDate = ( SELECT DATEADD(DAY, 1, MIN([cslpnle].[DateCreated])) AS [DeleteDate]

    FROM [dbo].[RawVehicle] (NOLOCK) AS cslpnle

    )

    -- Delete rows while data older than 5 days

    PRINT @DeleteDate

    WHILE DATEDIFF(DAY, @DeleteDate, GETUTCDATE()) > 1

    BEGIN

    SET @RowsToDelete = 1

    --Purge RawVehicles

    WHILE @RowsToDelete > 0

    BEGIN

    DELETE TOP ( 200 )

    [dbo].[RawVehicleOption]

    FROM [dbo].[RawVehicleOption]

    INNER JOIN [dbo].[RawVehicle] AS rv ON [dbo].[RawVehicleOption].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    SET @RowsToDelete = @@ROWCOUNT

    END

    SET @RowsToDelete = 1

    --Purge RawVehicles

    WHILE @RowsToDelete > 0

    BEGIN

    DELETE TOP ( 200 )

    [dbo].[RawInventoryPhoto]

    FROM [dbo].[RawInventoryPhoto] AS rip

    INNER JOIN [dbo].[RawInventoryVehicle] AS riv ON rip.[RawInventoryVehicleId] = [riv].[Id]

    INNER JOIN [dbo].[RawVehicle] AS rv ON [riv].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    SET @RowsToDelete = @@ROWCOUNT

    END

    SET @RowsToDelete = 1

    --Purge RawVehicles

    WHILE @RowsToDelete > 0

    BEGIN

    DELETE TOP ( 200 )

    [dbo].[RawDealerLotToInventoryVehicle]

    FROM [dbo].[RawDealerLotToInventoryVehicle] AS rdltiv

    INNER JOIN [dbo].[RawInventoryVehicle] AS riv ON [rdltiv].[RawInventoryVehicleId] = [riv].[Id]

    INNER JOIN [dbo].[RawVehicle] AS rv ON [riv].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    SET @RowsToDelete = @@ROWCOUNT

    END

    SET @RowsToDelete = 1

    --Purge RawVehicles

    WHILE @RowsToDelete > 0

    BEGIN

    DELETE TOP ( 200 )

    [dbo].[RawInventoryVehicle]

    FROM [dbo].[RawInventoryVehicle] AS riv

    INNER JOIN [dbo].[RawVehicle] AS rv ON [riv].[RawVehicleId] = [rv].[Id]

    WHERE [rv].[DateCreated] <= @DeleteDate

    SET @RowsToDelete = @@ROWCOUNT

    END

    SET @RowsToDelete = 1

    --Purge RawVehicles

    WHILE @RowsToDelete > 0

    BEGIN

    DELETE TOP ( 200 )

    FROM [dbo].[RawVehicle]

    WHERE [DateCreated] <= @DeleteDate

    SET @RowsToDelete = @@ROWCOUNT

    END

    SET @DeleteDate = DATEADD(DAY, 1, @DeleteDate)

    END

  • Is there some reason no one has suggested a 'on delete cascade' foreign key solution? Am I missing something about this situation? It seems to me you could put a cascade on the foreign keys, delete the parent objects, and then take the cascade off again and be done with it.

  • That would work as well.

    I normally do not take that approach for two reasons.

    1. I try to avoid changing the database objects if can.

    Especially if it is a live application.

    2. Minimize locking and smaller transactions.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply