August 24, 2005 at 12:55 pm
hey all...
ok, table structure...
Id int, ContactId bigint, FieldValue varchar(4000), TypeId int, CurrentValue char(1)
there are currently over 170 million rows in this table... when I load new users in, their field values are separated to rows and inserted here (it adds up pretty quickly  ).  I need to be able to go through and update CurrentValue to 'N' for older values (non-current values are then removed from the table and inserted into a history table).
).  I need to be able to go through and update CurrentValue to 'N' for older values (non-current values are then removed from the table and inserted into a history table).
My current statement is:
Update cd
set CurrentValue = 'N'
from ContactDetails cd
where Id not in (select max(id) from ContactDetails cd1 group by contactid, typeid)
this works, but I just had to stop it as it was running for over 36 hours... I need a faster way to do this.
I've tried
Update cd
set CurrentValue = 'N'
from ContactDetails cd
inner join ContactDetails cd1 on cd.ContactId = cd1.ContactId and cd.TypeId = cd1.TypeId
where cd.Id < cd1.Id
but this actually seems slower (for a test section of 450k rows, 12 seconds for the subselect and over 30 minutes for the inner join).
Anyone have any other ideas? it's driving me insane.
Thanks a ton!
Cheers
 
August 24, 2005 at 1:09 pm
Shooting in the dark here, but is it necessary to process the entire table, or when you "load new users" can you just run this update for those records that are relevant?
August 24, 2005 at 1:21 pm
I have actually tried this using a couple different methods, all slow down the insert rather significantly... inserts of 7500 users goes from 45 - 60 seconds to 15 - 20 minutes, not good when you're loading another million users (I wish it was!)
Cheers
 
August 24, 2005 at 1:23 pm
Is there a trigger on that table?
August 24, 2005 at 1:44 pm
nope, no triggers... tried that, was REALLY slow
there are a ton (7 - 8) indexes on it... but these should only speed it up.
Cheers
 
August 24, 2005 at 1:46 pm
Have you tried forcing the use of any of the Indexes?
I wasn't born stupid - I had to study.
August 24, 2005 at 1:55 pm
Have you tried doing the update in small batches (10 000 rows, wait 1 sec, next batch).
August 24, 2005 at 2:00 pm
do you have any syntax that I could see regarding this? I've never had to force the use of an index...
Cheers
 
August 24, 2005 at 2:03 pm
I wouldn't go with that route first. I'd first make sure that the plan is bad first and I doubt it's the case here.
August 24, 2005 at 2:03 pm
I'm guessing the syntax for this would be something like
declare @Id int
select @Id = min(Id) from ContactDetails
while @Id < (select max(Id) from ContactDetails)
begin
Update cd
set CurrentValue = 'N'
from ContactDetails cd
where Id not in (select max(id) from ContactDetails cd1 group by contactid, typeid)
and Id between @Id and @Id + 10000
set @Id = @Id + 10000
end
no, I haven't done this... seems like the loop would cause more overhead (although, it wouldn't take NEARLY as long to rollback stuff). Am I wrong about the loop adding overhead?
Cheers
 
August 24, 2005 at 2:13 pm
It adds a little overhead, but you get very little more locking on the tables, so your current tasks wouldn't be affected. So in essence that thing could run for days without anyone noticing it.
I have a few ideas for a better looping way, can you add a column to the tables (they all need that).
August 24, 2005 at 2:18 pm
Remi is right. Plus, it has been a long time since I had to force an index (I think it was 6.5). I will look, but it will take me a while to find it... (it is simple, but if I remember, hard to look in BOL).
I wasn't born stupid - I had to study.
August 24, 2005 at 2:19 pm
Colin - also note that indexes don't speed everything up -- they actually can slow inserts and updates, as the content in each index needs to be brought in line with changes. Removing indexes that are not used can actually speed up changes to tables.
August 24, 2005 at 2:28 pm
Remi: I CAN add columns, but that'll definately slow stuff down, no?
Farrell: if I remember, in 6.5 you had to explicitly call indexes into use, but 7.0 and 2000 do it automatically.
Merrill: I'm waiting for the 36 hour rollback to happen, then I've actually modded the proc to remove all indexes as step 1, then add them all back as a final step (I've read indexes can slow inserts/updates over the past couple days as well)
Cheers
 
August 24, 2005 at 2:37 pm
It would take time to do the add column, but it would speed up the loop process greatly >>
Add a new identity(1,1) column, INDEX IT
then something like this :
Declare @Loops as int
Declare @BatchSize as int
set @Loops = -1
set @BatchSize = 10000
While @@Rowcount > 0
begin
set @Loops = @Loops + 1
waitfor delay 00:00:01
update table set ... where NewColumn >= @Loops * @BatchSize and NewColumn < (@Loops + 1) * @BatchSize
end
I haven't tested this... try that on a small table to be sure it works. use the batch size that locks the ressource for the less time... and stills updates a good chuck of data.
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply