March 27, 2008 at 10:57 am
Hi everybody
I read some posts about removing lots of rows from a table.
I am facing a similar problem. I have a star schema.
I have a main table with 5 million rows, and in order to keep performance acceptable we decided to clean our DB by removing rows older than a date parameter. But deleting a row in the main table means deleting the matching rows in 40 linked tables.
We need to delete about 1.5 million rows from the main table, so roughly 40 * 1.5 = 60 million rows in total. :w00t:
We tried it on our emergency server, which has the same configuration as the live DB. It takes far too long.
It deletes about 15,000 rows from the main table per hour.
At first, we wrote a proc that selects the IDs to delete from the main table and walks them with a cursor.
For each ID, the proc calls another proc with the ID as a parameter. In this second proc, we delete the rows in the 40 child tables.
Our DBA said that SQL Server rebuilds its indexes on every table and that this is why the statement takes so long. I asked him whether this 'automatic' rebuild could be disabled, and the only answer I got was that the delete is done on a clustered primary key, so it is not possible. :blink:
I tried to find other solutions:
1 - I created a table with all the IDs to delete and looped over it instead of over the original statement (a join of the main table and the child tables).
2 - I noticed that all the constraints are defined with cascade delete, so deleting from the main table automatically deletes from the child tables; no need to run a delete statement against each child table. But it did not speed up the massive delete.
3 - I tried disabling all the constraints to avoid checking the foreign keys (roughly as in the sketch below). It was much faster, but still not fast enough.
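To show what I mean by option 3, here is a rough sketch (the table, column, and constraint names are just placeholders, not our real schema):
-- Sketch of option 3: stop checking the FK, delete, then re-enable it
DECLARE @CutoffDate DATETIME
SET @CutoffDate = '20070101'   -- the date parameter we purge against

ALTER TABLE dbo.ChildTable01 NOCHECK CONSTRAINT FK_ChildTable01_MainTable

-- Delete the child rows, then the parent rows
DELETE c
FROM dbo.ChildTable01 c
JOIN dbo.MainTable m ON m.MainID = c.MainID
WHERE m.CreationDate < @CutoffDate

DELETE dbo.MainTable
WHERE CreationDate < @CutoffDate

-- Re-enable the FK; WITH CHECK revalidates the existing rows so the constraint stays trusted
ALTER TABLE dbo.ChildTable01 WITH CHECK CHECK CONSTRAINT FK_ChildTable01_MainTable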
Now I am looking for a way to speed up this batch.
Does anyone have a 'real DBA' solution or a way to investigate?
We are ready to delete in smaller batches (100,000 rows), but even that takes about 6 hours (and other batches run during the night).
Thanks for your answers !
March 27, 2008 at 11:49 am
SQL does not automatically rebuild indexes after large deletes; it just leaves the space marked as deleted (the data is actually still in the database until it is overwritten). Check the database to make sure the auto shrink option is not on (if it is, it can cause data movement and the indexes might then reorganize). Also, how long does it take to delete one record?
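If you want to check that quickly, something like this will do (the database name is just a placeholder, substitute your own):
-- Returns 1 if auto shrink is on, 0 if it is off
SELECT DATABASEPROPERTYEX('YourDatabase', 'IsAutoShrink') AS IsAutoShrink

-- And, if it happens to be on, turn it off
ALTER DATABASE YourDatabase SET AUTO_SHRINK OFF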
March 27, 2008 at 1:47 pm
Have you tried deleting from each of the individual child tables first? How fast did that run?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 27, 2008 at 2:04 pm
If there are indexes other than the primary key on any of the tables, you might try disabling those indexes before the delete and then rebuilding them afterwards. Books Online has an example of how to do this.
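Something along these lines (2005 syntax, made-up index and table names, and don't disable the clustered index or the whole table becomes inaccessible):
-- Disable a nonclustered index before the big delete (made-up names)
ALTER INDEX IX_ChildTable01_SomeColumn ON dbo.ChildTable01 DISABLE

-- ... run the delete here ...

-- Rebuild it afterwards
ALTER INDEX IX_ChildTable01_SomeColumn ON dbo.ChildTable01 REBUILD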
Even deleting a very large number of records shouldn't take as long as you are describing, unless there are other locks taking place on the table(s) at the same time. (Or if the hardware is simply underpowered for a database that size.)
Try running a smaller delete, maybe 100 rows, and run a trace to see what locks you might be running into. Books Online has data on how to do this.
I would expect it to be locking issues if I ran into something like this. If that's the case, possibly deleting in very small batches will help, but you'll have to resolve any locks first.
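If you're on 2005, you can also get a quick look at the locks without a full trace: run your small test delete in one window and something like this in another (the database name is just a placeholder):
-- Sessions holding or waiting on locks in the database
SELECT request_session_id,
       resource_type,
       request_mode,
       request_status
FROM   sys.dm_tran_locks
WHERE  resource_database_id = DB_ID('YourDatabase')
ORDER  BY request_session_id, resource_type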
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 28, 2008 at 4:06 am
Antares686 (3/27/2008)
SQL does not automatically rebuild indexes after large deletes; it just leaves the space marked as deleted (the data is actually still in the database until it is overwritten). Check the database to make sure the auto shrink option is not on (if it is, it can cause data movement and the indexes might then reorganize). Also, how long does it take to delete one record?
Hi,
That is what I thought about the automatic index rebuilds.
I checked, and the AutoShrink option is disabled.
I changed my SQL statement to get the duration of deleting one group of records (one ID from the main table and one row in each of the 40 child tables).
March 28, 2008 at 6:08 am
Here's a couple of facts for you...
Creating a temp table that holds 750000 rows of primary keys of rows to be deleted takes 2.1 seconds using SELECT/INTO.
Adding a primary key to that temp table takes 1.4 seconds.
A set based delete on the source table joined to that temp table that holds all the primary keys operates at a rate of 35,000 rows per second. This number includes the time it takes to update the primary key and, of course, will get smaller in the presence of additional indexes, triggers, and other things.
Cascade delete slows it down because that basically turns the set-based delete into RBAR.
Cursors cripple it. Most delete triggers will make it run twice as slow no matter which method is used.
Deletes over a second long (25-35,000 rows) should be broken down into sets of 25,000 rows to keep the transaction log and blocking from going absolutely nuts... I normally do this with a "delete crawler". To keep the process from being horribly blocked by other processes, use TABLOCKX to do the deletes... 25,000 rows will take something less than a second and, with a delay in between deletes, most apps can handle that with no problem.
Here's an example of the delete crawler that I use in such situations... the example includes a test table you can play with. As usual, the details are in the comments in the code...
--===== If the test table exists, drop it
IF OBJECT_ID('dbo.JBMTestDetail','U') IS NOT NULL
DROP TABLE dbo.JBMTestDetail
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "ConnID" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(30)
-- Column "Key1" has a range of 1 to 100,000 non-unique numbers stored as VARCHAR(30)
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "Time_Stamp" has a range of >=01/01/2005 and <01/01/2015 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
SELECT TOP 1000000
ConnID = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(30)),''), --(10 rows per connection)
Key1 = ISNULL(CAST(ABS(CHECKSUM(NEWID()))%100000+1 AS VARCHAR(20)),''), --just to test index with
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
Time_Stamp = ISNULL(CAST(RAND(CHECKSUM(NEWID()))*3652.0+38351.0 AS DATETIME),0),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTestDetail
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== Create indexes similar to Troy's
CREATE CLUSTERED INDEX IXC_JBMTestDetail_Time_Stamp_ConnID ON dbo.JBMTestDetail (Time_Stamp,ConnID)
CREATE NONCLUSTERED INDEX IX_JBMTestDetail_ConnID_Key1 ON dbo.JBMTestDetail (ConnID,Key1)
GO
--===== Setup to measure performance...
SET STATISTICS TIME ON
--===== Define the cutoff date with a time of "midnight" or, if you will,
-- define the cutoff date with no time so we only delete whole days.
DECLARE @CutoffDate DATETIME
SELECT @CutoffDate = DATEADD(dd,DATEDIFF(dd,0,GETDATE()),-94)
--===== Limit all further queries, including deletes, to 25,000 rows
-- (about 1 second worth of deletes, like I said before)
SET ROWCOUNT 25000
--===== See if any rows qualify for deletion. If even just one exists,
-- then there's work to do and @@ROWCOUNT will be > 0.
-- DO NOT PUT ANY CODE BETWEEN THIS SELECT AND THE WHILE LOOP OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP
SELECT TOP 1 1 FROM dbo.JBMTestDetail WHERE Time_Stamp < @CutoffDate
--===== If the rowcount from the above is greater than 0,
-- then delete 25,000 rows at a time until there's nothing
-- left to delete
WHILE @@ROWCOUNT > 0
BEGIN
--===== Just a "marker" to separate the loop in the output
PRINT REPLICATE('=',78)
--===== This delay gives other processes breathing room
WAITFOR DELAY '00:00:10'
--===== Do the delete. Will be limited by the SET ROWCOUNT above.
-- DO NOT PUT ANY CODE BETWEEN THIS DELETE AND THE "END" OR
-- YOU'LL MESS UP THE ROWCOUNT FOR THE WHILE LOOP.
DELETE dbo.JBMTestDetail WITH (TABLOCKX)
WHERE Time_Stamp < @CutoffDate
END
--===== Restore the ability to process more than 25,000 rows
SET ROWCOUNT 0
--===== Show how many rows were deleted in total
SELECT 1000000 - COUNT(*) FROM dbo.JBMTestDetail
--Jeff Moden
Change is inevitable... Change for the better is not.
March 31, 2008 at 2:05 am
Hi,
thanks for your reply.
I am currently trying a solution: copying the rows we want to keep into working tables, truncating the original tables, then re-inserting the rows, so that we avoid breaking "links" with stored procs and other DB objects.
My first attempts show that it is faster than directly deleting rows.
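Roughly, for each table the script does something like the following (placeholder names; the real script also handles dropping/re-creating the constraints, and SET IDENTITY_INSERT is needed around the re-insert if the table has an identity column):
-- Copy / truncate / re-insert pattern for one table (placeholder names)
DECLARE @CutoffDate DATETIME
SET @CutoffDate = '20070101'

-- 1. Copy the rows we want to KEEP into a working table
SELECT *
INTO   dbo.ChildTable01_Keep
FROM   dbo.ChildTable01
WHERE  CreationDate >= @CutoffDate

-- 2. Empty the original table (the FKs that reference it must be dropped first)
TRUNCATE TABLE dbo.ChildTable01

-- 3. Put the kept rows back
INSERT INTO dbo.ChildTable01
SELECT * FROM dbo.ChildTable01_Keep

-- 4. Clean up the working table
DROP TABLE dbo.ChildTable01_Keep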
I have to give a first script at the end of today (thanks to my manager) and I will make more attempts tomorrow.
Afterwards, I will post the results of my tests.
See you later.
March 31, 2008 at 5:40 am
Most people crab about the duplicate disk space while that happens. If you don't mind that, then yes, the method you spoke of will be faster.
The problem is, the data is not available to the users between the truncate and the completion of the insert. But, again, if that small outage is acceptable, then the method you spoke of will be faster.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 31, 2008 at 10:54 am
Also, if there are foreign keys that reference the table you are truncating, you'll have to remove those first, then rebuild them. Not a big problem, but needs to be noted.
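In other words, something like this (made-up names), and put the constraint back WITH CHECK so it stays trusted:
-- Drop the FK that references the table you want to truncate (made-up names)
ALTER TABLE dbo.ChildTable01 DROP CONSTRAINT FK_ChildTable01_MainTable

TRUNCATE TABLE dbo.MainTable

-- ... re-insert the kept rows into dbo.MainTable here ...

-- Re-create the FK; WITH CHECK validates the existing data so the constraint is trusted
ALTER TABLE dbo.ChildTable01 WITH CHECK
      ADD CONSTRAINT FK_ChildTable01_MainTable
      FOREIGN KEY (MainID) REFERENCES dbo.MainTable (MainID)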
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 2, 2008 at 10:09 am
Hi, everybody.
I'm writing down the results of my problem and the different attempts to resolve it.
First, I thank you for your answers.
So, we decided to proceed as I described in my previous post.
I had to write a SQL script that was as automated as possible.
So here is what I implemented:
- I created a table with the names of the tables I had to purge, and another one to log the duration of each step (copying data, dropping constraints, truncating, re-inserting from the working tables, re-adding constraints, ...).
- I created one function and two stored procedures:
* one that loops over the table of table names to purge and calls the second;
* one that does the actual work (copy, drop constraints, truncate, re-insert, re-add constraints). A rough sketch of this layout follows.
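Very simplified, with placeholder names (the per-table proc here is just a stub; the real one does the copy / drop constraints / truncate / re-insert / re-add constraints work and logs the durations):
-- Driver table with the tables to purge, plus a small log table (placeholder names)
CREATE TABLE dbo.PurgeTableList (TableName sysname NOT NULL PRIMARY KEY)

CREATE TABLE dbo.PurgeLog
(
    TableName sysname     NOT NULL,
    StepName  VARCHAR(50) NOT NULL,
    StartTime DATETIME    NOT NULL,
    EndTime   DATETIME    NULL
)
GO

-- Stub for the per-table proc (the real one does the heavy lifting)
CREATE PROCEDURE dbo.usp_PurgeOneTable
    @TableName  sysname,
    @CutoffDate DATETIME
AS
BEGIN
    PRINT 'Purging ' + @TableName
END
GO

-- Outer proc: loop over the list and call the per-table proc for each entry
CREATE PROCEDURE dbo.usp_PurgeAllTables
    @CutoffDate DATETIME
AS
BEGIN
    DECLARE @TableName sysname

    DECLARE cTables CURSOR LOCAL FAST_FORWARD FOR
        SELECT TableName FROM dbo.PurgeTableList

    OPEN cTables
    FETCH NEXT FROM cTables INTO @TableName

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.usp_PurgeOneTable @TableName, @CutoffDate
        FETCH NEXT FROM cTables INTO @TableName
    END

    CLOSE cTables
    DEALLOCATE cTables
END
GO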
Our DBA prepared the emergency DB server with a copy of the live prod DB (except that he did not use backup/restore but Embarcadero, creating the DB objects before mass inserts with bcp, and he forgot some tables (including the main table of our star schema) and lost some identity properties). So my script did not run all the way to the end.
But, to give some initial measurements:
I tried my script on a local computer, deleting 70,000 rows out of 300,000, across the 40 tables. It ran in less than 10 minutes.
Our live DB is 75 GB. Each table in the model has more than 5 million rows.
On some tables it takes about 2-3 minutes to run.
So, when we are able to test it under live-DB conditions, we expect a maximum duration of 2 hours.
The operations that could take a long time are re-creating the constraints on all the tables, since I had to drop them before truncating (as GSquared said).
If somebody wants me to post some of the statements I wrote, I can do it.
I know that deleting rows is fully logged while TRUNCATE TABLE is minimally logged.
But I find it amazing that copying, truncating, and re-inserting two thirds of a table is faster than deleting one third of it.
I want to thank you for your advice and information.
Have a nice day!
April 2, 2008 at 2:36 pm
It is amazing that two copies and a truncate could be so fast. Guess that's why lots of folks use it... so long as there aren't foreign keys, delete triggers, etc, etc.
Thanks for sharing what you're trying to do...
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2008 at 2:24 am
Hello,
I will ask a friend who knows Oracle quite well whether he has already faced this problem and whether an Oracle DB gives the same results.
I will post the SQL statements later today (I will comment them).
THX
April 3, 2008 at 3:52 am
Regarding FK:
- Many people forget to create supporting indexes for their foreign keys (i.e. in the child tables!), resulting in full table scans over and over again.
- Remember to create the FK index fully matching the FK definition (same column sequence); see the small example after the links below.
- For large clean-up operations, plan application downtime and restrict all access during your operation (physically or by using e.g. a table lock).
That's the only way you can guarantee the data stays consistent during your operation.
- If you drop/disable FKs, remember to re-create/re-enable them using the WITH CHECK option! Don't use WITH NOCHECK, because you will get dirty data in the long run.
Maybe these scripts can get you on track ...
- Detect invalid FK-data after using "WITH NOCHECK" : http://www.sqlservercentral.com/scripts/Miscellaneous/31808/
- Generate Create FK-indexes:
http://www.sqlservercentral.com/scripts/Indexing/61391/
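And a tiny self-contained example of the first two points (the names are only for this demo):
-- Demo parent and child tables
CREATE TABLE dbo.MainDemo
(
    MainID INT NOT NULL PRIMARY KEY
)

CREATE TABLE dbo.ChildDemo
(
    ChildID INT NOT NULL PRIMARY KEY,
    MainID  INT NOT NULL,
    CONSTRAINT FK_ChildDemo_MainDemo
        FOREIGN KEY (MainID) REFERENCES dbo.MainDemo (MainID)
)

-- Supporting index that fully matches the FK definition (same columns, same order),
-- so deletes on the parent do not have to scan the child table to verify the FK
CREATE NONCLUSTERED INDEX IX_ChildDemo_MainID ON dbo.ChildDemo (MainID)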
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
April 3, 2008 at 9:08 am
ALZDBA (4/3/2008)
Regarding FK:
- Many people forget to create supporting indexes for their foreign keys (i.e. in the child tables!), resulting in full table scans over and over again.
- Remember to create the FK index fully matching the FK definition (same column sequence).
- For large clean-up operations, plan application downtime and restrict all access during your operation (physically or by using e.g. a table lock).
That's the only way you can guarantee the data stays consistent during your operation.
- If you drop/disable FKs, remember to re-create/re-enable them using the WITH CHECK option! Don't use WITH NOCHECK, because you will get dirty data in the long run.
Maybe these scripts can get you on track ...
- Detect invalid FK-data after using "WITH NOCHECK" : http://www.sqlservercentral.com/scripts/Miscellaneous/31808/
- Generate Create FK-indexes:
Hi ALZDBA,
I don't have this problem, because of the star schema. The child tables reference the main one by an ID that is their PK field, so a clustered index is created on each child table, and an index as well.
And when I re-create my foreign keys, I use the WITH CHECK option (the default).
Thanks for the links.