Extreme long UPDATE action, advice needed.

Question

Extreme long UPDATE action, advice needed.

FreeHansje

SSChampion

Points: 11751
More actions
October 17, 2008 at 6:31 am

#72082

Consider this statement:
update hl7_patienten
set woonplaats = woonplaatsnaam
--SELECT t1.patientnummer, woonplaats, woonplaatsnaam
from hl7_patienten t1
inner join ptr_init_patienten_z_woonp t2
on t1.patientnummer = t2.patientnummer
Table hl7_patienten has 1500000 records, no index on WOONPLAATS, table ptr_init_patienten_z_woonpl has 13000 records, no index on woonplaatsnaam. The related records in hl&-patienten have value NULL in woonplaats. Both tables have patientnummer as clustered primary key.
I have tried placing indices here, but it makes no difference, the update takes about an hour to run, which seems excessive to me. If I do a SELECT I have the result in a few seconds. When I look with SHOWPLAN I see the proces starts with a clustered index seek on both tables.
Any hints?
Greetz,
Hans Brouwer

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Jack Corbett SSC Guru Points: 184393 More actions · Answer 1

I don't think adding indexes to the updated column or the column used for the update will help. In fact adding an index to the updated column will likely slow down the update as it will cause the data in the index to be re-arranged. You should look at locking and blocking. What are the number of rows actually being updated?

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

It's a long shot, but is there any difference if you use the alias in the UPDATE? I know it can, under some circumstances, make a substantial difference to performance.

update t1 set woonplaats = t2.woonplaatsnaam

--SELECT t1.patientnummer, woonplaats, woonplaatsnaam

from hl7_patienten t1

inner join ptr_init_patienten_z_woonp t2

on t1.patientnummer = t2.patientnummer

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

FreeHansje (10/17/2008)
Consider this statement:
update hl7_patienten
set woonplaats = woonplaatsnaam
--SELECT t1.patientnummer, woonplaats, woonplaatsnaam
from hl7_patienten t1
inner join ptr_init_patienten_z_woonp t2
on t1.patientnummer = t2.patientnummer
Table hl7_patienten has 1500000 records, no index on WOONPLAATS, table ptr_init_patienten_z_woonpl has 13000 records, no index on woonplaatsnaam. The related records in hl&-patienten have value NULL in woonplaats. Both tables have patientnummer as clustered primary key.
I have tried placing indices here, but it makes no difference, the update takes about an hour to run, which seems excessive to me. If I do a SELECT I have the result in a few seconds. When I look with SHOWPLAN I see the proces starts with a clustered index seek on both tables.
Any hints?

How many records in hl7_patienten have a null value in woonplaats?

As this is posted in a SQL Server 7/2000 forum, I'd give this a try:

set rowcount 5000 -- modify this to increase the batch size if needed

while exists(select 1 from hl7_patienten where woonplaats is null)

begin

update hl7_patienten set

woonplaats = t2.woonplaatsnaam

--SELECT t1.patientnummer, woonplaats, woonplaatsnaam

from

hl7_patienten t1

inner join ptr_init_patienten_z_woonp t2

on (t1.patientnummer = t2.patientnummer)

where

woonplaats is null

end

set rowcount 0

😎

FreeHansje SSChampion Points: 11751 More actions · Answer 4

Tnx for answering all.

I will try the suggestions given here.

The number of records having woonplaats IS NULL is exactly the number of records in the table T2: almost 13000. I found out, that adding

AND WHERE woonplaats IS NULL

increases performance tremendously: from 60 minutes to 22 minutes! Still a ridiculous time, but definitely an improvement.

Tnx again.

Greetz,
Hans Brouwer

Garadin One Orange Chip Points: 29613 More actions · Answer 5

t1.patientnummer = t2.patientnummer

Is this a 1 to 1 relationship? It's hard to imagine that an update like this could take so long with a proper join.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 6

Heh... not difficult to imagine at all if there are a dozen indexes on the table and no maintenance has been done either on the indexes or the statistics. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 7

FreeHansje (10/18/2008)
Tnx for answering all.
I will try the suggestions given here.
The number of records having woonplaats IS NULL is exactly the number of records in the table T2: almost 13000. I found out, that adding
AND WHERE woonplaats IS NULL
increases performance tremendously: from 60 minutes to 22 minutes! Still a ridiculous time, but definitely an improvement.
Tnx again.

When you actually run that statement, how many rows does it tell you got updated after you added the criteria? And, while the update is running, is there anything else happening to the table from somewhere else?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

FreeHansje SSChampion Points: 11751 More actions · Answer 8

Again,

Tnx for answering all.

There is a 1-on-1 relation between the 2 tables. There are 2 indices, 1 of them the PrimKey on t1, only 1 PrimKey on t2. Statistics are not bad, although there is no regular maintenance on this database, since we are working in a test environment. That means no1 else is busy with this database, something I checked when running this statement.

I'm not sure how many actual updates have been done, since I get an update result for each individual update; a guestimate makes me believe there are about 13000 updates, the right/expected amount.

However, since this is a 1-time process 22 minutes is acceptable, but it vexes me WHY it takes so long. The CLUSTERD INDEX SCAN in SHOWPLAN on table t1 tells me it goes through all the 1.5 mil records in that table; I would expect a much lower number.

Greetz,
Hans Brouwer

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 9

For S & G, put an index on woonplaats. If there are 1.5 million rows, but only 13,000 rows with a null value in woonplaats, you might see a marked improvement in the update query.

😎

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 10

FreeHansje (10/20/2008)
I'm not sure how many actual updates have been done, since I get an update result for each individual update; a guestimate makes me believe there are about 13000 updates, the right/expected amount.

I just want to know what the rowcount returns as when you run you're update code... you're not suggesting that you run one update statement per record are you???

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

FreeHansje SSChampion Points: 11751 More actions · Answer 11

Something funny is going on, I have to go deeper into this.

Jeff, the statement is as given in my first post. I get about 13000 1-row-effected messages in my resultpane.

However, I'm beginning to think I have a paralellism problem here. My initial testing was on a server with 2 processors. The final run was on a server with 1 processor, and here it ran in seconds...

Strangely tho SHOWPLAN mentioned nothing about paralellism.

I have to do more research on this, tnx for answering all.

Greetz,
Hans Brouwer

Age2uN SSC Enthusiast Points: 110 More actions · Answer 12

Sounds to me as if there is a trigger on the updated table firing some other action per updated row.

This might as well slow down everything trmendously.

devloping robust and performant databaseapplications with Microsoft SQL-Server

Jeff Moden SSC Guru Points: 1003894 More actions · Answer 13

FreeHansje (10/21/2008)
Something funny is going on, I have to go deeper into this.
Jeff, the statement is as given in my first post. I get about 13000 1-row-effected messages in my resultpane.
However, I'm beginning to think I have a paralellism problem here. My initial testing was on a server with 2 processors. The final run was on a server with 1 processor, and here it ran in seconds...
Strangely tho SHOWPLAN mentioned nothing about paralellism.
I have to do more research on this, tnx for answering all.

Parallelism could be the problem... one way to find out is to use OPTION(MAXDOP 1).

The reason I keep asking about the rowcount is that I wanted to make sure that you weren't experiencing some form of runaway code like what happened to me during the not so famous IS NULL bug and I wanted to make sure you didn't have any triggers happening. Also, as Horst just mentioned, if you get more than just one row count, there's a trigger in play and it needs to be repaired for performance.

Of course, there could be about a zillion other differences between the two boxes you tested on both in disk/hardware, operating system preferences, and the SQL Server setup.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)