Extreme long UPDATE action, advice needed.

  • Consider this statement:

    update hl7_patienten

    set woonplaats = woonplaatsnaam

    --SELECT t1.patientnummer, woonplaats, woonplaatsnaam

    from hl7_patienten t1

    inner join ptr_init_patienten_z_woonp t2

    on t1.patientnummer = t2.patientnummer

    Table hl7_patienten has 1500000 records, no index on WOONPLAATS, table ptr_init_patienten_z_woonpl has 13000 records, no index on woonplaatsnaam. The related records in hl&-patienten have value NULL in woonplaats. Both tables have patientnummer as clustered primary key.

    I have tried placing indices here, but it makes no difference, the update takes about an hour to run, which seems excessive to me. If I do a SELECT I have the result in a few seconds. When I look with SHOWPLAN I see the proces starts with a clustered index seek on both tables.

    Any hints?

    Greetz,
    Hans Brouwer

  • I don't think adding indexes to the updated column or the column used for the update will help. In fact adding an index to the updated column will likely slow down the update as it will cause the data in the index to be re-arranged. You should look at locking and blocking. What are the number of rows actually being updated?

  • It's a long shot, but is there any difference if you use the alias in the UPDATE? I know it can, under some circumstances, make a substantial difference to performance.

    update t1 set woonplaats = t2.woonplaatsnaam

    --SELECT t1.patientnummer, woonplaats, woonplaatsnaam

    from hl7_patienten t1

    inner join ptr_init_patienten_z_woonp t2

    on t1.patientnummer = t2.patientnummer

    Cheers

    ChrisM

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • FreeHansje (10/17/2008)


    Consider this statement:

    update hl7_patienten

    set woonplaats = woonplaatsnaam

    --SELECT t1.patientnummer, woonplaats, woonplaatsnaam

    from hl7_patienten t1

    inner join ptr_init_patienten_z_woonp t2

    on t1.patientnummer = t2.patientnummer

    Table hl7_patienten has 1500000 records, no index on WOONPLAATS, table ptr_init_patienten_z_woonpl has 13000 records, no index on woonplaatsnaam. The related records in hl&-patienten have value NULL in woonplaats. Both tables have patientnummer as clustered primary key.

    I have tried placing indices here, but it makes no difference, the update takes about an hour to run, which seems excessive to me. If I do a SELECT I have the result in a few seconds. When I look with SHOWPLAN I see the proces starts with a clustered index seek on both tables.

    Any hints?

    How many records in hl7_patienten have a null value in woonplaats?

    As this is posted in a SQL Server 7/2000 forum, I'd give this a try:

    set rowcount 5000 -- modify this to increase the batch size if needed

    while exists(select 1 from hl7_patienten where woonplaats is null)

    begin

    update hl7_patienten set

    woonplaats = t2.woonplaatsnaam

    --SELECT t1.patientnummer, woonplaats, woonplaatsnaam

    from

    hl7_patienten t1

    inner join ptr_init_patienten_z_woonp t2

    on (t1.patientnummer = t2.patientnummer)

    where

    woonplaats is null

    end

    set rowcount 0

    😎

  • Tnx for answering all.

    I will try the suggestions given here.

    The number of records having woonplaats IS NULL is exactly the number of records in the table T2: almost 13000. I found out, that adding

    AND WHERE woonplaats IS NULL

    increases performance tremendously: from 60 minutes to 22 minutes! Still a ridiculous time, but definitely an improvement.

    Tnx again.

    Greetz,
    Hans Brouwer

  • t1.patientnummer = t2.patientnummer

    Is this a 1 to 1 relationship? It's hard to imagine that an update like this could take so long with a proper join.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Heh... not difficult to imagine at all if there are a dozen indexes on the table and no maintenance has been done either on the indexes or the statistics. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • FreeHansje (10/18/2008)


    Tnx for answering all.

    I will try the suggestions given here.

    The number of records having woonplaats IS NULL is exactly the number of records in the table T2: almost 13000. I found out, that adding

    AND WHERE woonplaats IS NULL

    increases performance tremendously: from 60 minutes to 22 minutes! Still a ridiculous time, but definitely an improvement.

    Tnx again.

    When you actually run that statement, how many rows does it tell you got updated after you added the criteria? And, while the update is running, is there anything else happening to the table from somewhere else?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Again,

    Tnx for answering all.

    There is a 1-on-1 relation between the 2 tables. There are 2 indices, 1 of them the PrimKey on t1, only 1 PrimKey on t2. Statistics are not bad, although there is no regular maintenance on this database, since we are working in a test environment. That means no1 else is busy with this database, something I checked when running this statement.

    I'm not sure how many actual updates have been done, since I get an update result for each individual update; a guestimate makes me believe there are about 13000 updates, the right/expected amount.

    However, since this is a 1-time process 22 minutes is acceptable, but it vexes me WHY it takes so long. The CLUSTERD INDEX SCAN in SHOWPLAN on table t1 tells me it goes through all the 1.5 mil records in that table; I would expect a much lower number.

    Greetz,
    Hans Brouwer

  • For S & G, put an index on woonplaats. If there are 1.5 million rows, but only 13,000 rows with a null value in woonplaats, you might see a marked improvement in the update query.

    😎

  • FreeHansje (10/20/2008)


    I'm not sure how many actual updates have been done, since I get an update result for each individual update; a guestimate makes me believe there are about 13000 updates, the right/expected amount.

    I just want to know what the rowcount returns as when you run you're update code... you're not suggesting that you run one update statement per record are you???

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Something funny is going on, I have to go deeper into this.

    Jeff, the statement is as given in my first post. I get about 13000 1-row-effected messages in my resultpane.

    However, I'm beginning to think I have a paralellism problem here. My initial testing was on a server with 2 processors. The final run was on a server with 1 processor, and here it ran in seconds...

    Strangely tho SHOWPLAN mentioned nothing about paralellism.

    I have to do more research on this, tnx for answering all.

    Greetz,
    Hans Brouwer

  • Sounds to me as if there is a trigger on the updated table firing some other action per updated row.

    This might as well slow down everything trmendously.

    devloping robust and performant databaseapplications with Microsoft SQL-Server

  • FreeHansje (10/21/2008)


    Something funny is going on, I have to go deeper into this.

    Jeff, the statement is as given in my first post. I get about 13000 1-row-effected messages in my resultpane.

    However, I'm beginning to think I have a paralellism problem here. My initial testing was on a server with 2 processors. The final run was on a server with 1 processor, and here it ran in seconds...

    Strangely tho SHOWPLAN mentioned nothing about paralellism.

    I have to do more research on this, tnx for answering all.

    Parallelism could be the problem... one way to find out is to use OPTION(MAXDOP 1).

    The reason I keep asking about the rowcount is that I wanted to make sure that you weren't experiencing some form of runaway code like what happened to me during the not so famous IS NULL bug and I wanted to make sure you didn't have any triggers happening. Also, as Horst just mentioned, if you get more than just one row count, there's a trigger in play and it needs to be repaired for performance.

    Of course, there could be about a zillion other differences between the two boxes you tested on both in disk/hardware, operating system preferences, and the SQL Server setup.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply