Data Warehouse Update to Live Table

  • We have a small data warehouse that updates hourly. I've been having some issues lately where the process that updates the table is intermittently stalling out and never finishing. I've narrowed it down to the step that deletes updated records, just before reinserting the new ones. Under normal circumstances, this part of the process takes less than a second.

    As best as I can tell, the table is locked and not allowing the delete to occur. It's puzzling, however, that the lock never seems to resolve. Although the queries against this table normally take a few seconds, the update process will sit for hours without finishing. Eventually, it starts to slow down the entire database until users start getting time outs from select statements on unrelated databases. It basically cripples the system until I stop the job, then the system will immediately resume normal operation. I am certain that this job is causing the entire database to slow down -- the correlation is direct.

    So I'm trying to come up with some way around this. Obviously, I can't have a job that randomly slows down the entire database.

    If it helps, the exact query I'm running is this:

    DELETE FROM fact.WorkOrder

    WHERE WOStatus NOT IN ('C', 'X')

    OR CreateDate >= @startdate

    OR CloseDate >= @startdate

    Which deletes all open work orders or work orders that have been touched in the last month. Immediately after this, the previously staged data is quickly pumped into the table.

    What am I missing here? What are the preferred ways to update a actively used table? Any ideas why the lock isn't being released, even after an extended period of time?

  • This is a critical issue that remains unresolved, therefore I must bump.

  • Have you looked at the Actual Execution Plan for the statement? It may also be worth examining the equivalent SELECT statement during a busy period. Run this, substituting your real PK, and post the actual plan;

    SELECT 'YourPK', WOStatus, CreateDate, CloseDate

    FROM fact.WorkOrder

    WHERE WOStatus NOT IN ('C', 'X')

    OR CreateDate >= @startdate

    OR CloseDate >= @startdate

    I don't have a good explanation for your findings; however, checking the performance of the statement would be my first port of call. The quicker it runs, the less time locks will be held.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Well, your suggestion made me quickly realize that I was missing two indexes. The field "CloseDate" was used often in queries, and was already indexed. The other two fields were not indexed.

    Despite adding these two indexes, I am still having problems. I ran the SELECT query during peak hours, and it took 8 minutes. The execution plan was real simple. It spent 0% on the Select, 6% on Parallelism, and 94% on a clustered index scan. The clustered index is an artificial key (an identity field, because the true primary key is actually a 5-field complex key of all VARCHARs) that isn't used in the query at all. So it seems like this clustered index scan is the equivalent of a full table scan. The query selected 27,000 rows out of 55 million. Obviously a full table scan of 55 million rows to find 27,000 would be monstrously inefficient.

    So it appears that I have some structural issues. I don't when the table reached 55 million rows, I don't remember it being that large. I think the table is growing faster than I expected it would, and I'm having scaling issues. I could use a few suggestions.

    I don't think a composite index would help much. I think the greater than comparisons and the OR would get in the way of it using the index efficiently. (It's worth mentioning that the date fields are actually integers, packed in YYYYMMDD format.) Obviously it's decided not to use the individual indexes for some reason.

    I'm not 100% sure where to go from there. I could possibly remove the CreateDate filter (which is principally there to prevent updating really old, open orders which are most likely erroneous). The principal things I'm looking for are open orders and orders closed since a specific date. Removing that part of the WHERE statement, however, had no noticeable impact on the query. It still continues to use the clustered index scan.

    The clustered key is based on the WorkOrderID, which largely puts the rows in order by CreateDate. Since the clustered key is never directly used in queries, perhaps I could make it a non-clustered primary key and use CloseDate as the clustered key? It might speed up some queries, too, since CloseDate is used often in queries. I won't be able to do make this kind of change until after hours, however.

    Thanks for the help and suggestion. Do you have any further ideas or thoughts?

  • Perhaps splitting it up might help

    DELETE FROM fact.WorkOrder

    WHERE WOStatus NOT IN ('C', 'X');

    DELETE FROM fact.WorkOrder

    WHERE CreateDate >= @startdate;

    DELETE FROM fact.WorkOrder

    WHERE CloseDate >= @startdate;

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Wow, talk about not seeing the forest for the trees. All three queries finished instantly, each using a index seek. I've altered the DELETE statements, and I expect it will have a marked impact on performance.

    Thank you so much for the help. So glad I could get another perspective on this.

  • ...Or not.

    The SELECT statements finish in a fraction of a second, but the DELETE statements continue to stall out.

  • 1. How many indexes on fact.WorkOrder?

    2. How many Foreign Keys on fact.WorkOrder?

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • 8 indexes including the primary key. The data space is 17.5G, and the index space is 7G.

    There are no foreign keys.

    Some additional information:

    It stalled out while there was no other activity against that database. The activity monitor shows only sleeping sessions from my open SSMS. Nothing else was running against that particular database.

    The wait type is listed as PAGEIOLATCH_EX and PAGEIOLATCH_SH. The wait begins immediately when I start running it.

    It slows down the entire server as soon as I start running it. Things like the Activity Monitor and Job Activity Monitor take noticeably longer to refresh. Users get timeouts inside our ERP system.

    It appears that the statement "DELETE FROM fact.WorkOrder WHERE WOStatus NOT IN ('C', 'X')" is completing immediately, but the statement "DELETE FROM fact.WorkOrder WHERE CreateDate >= @startdate" took 8 minutes to complete.

  • Can we see the actual execution plan for

    DELETE FROM fact.WorkOrder WHERE CreateDate >= @startdate

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Here is the actual execution plan:

    For comparison, the WOStatus DELETE query, which takes significantly less time, looks like this:

    The major difference appears to be the sorts on the index delete. Presumably the >= operator is causing this.

  • Can you try

    DELETE FROM fact.WorkOrder WHERE CreateDate >= @startdate OPTION (RECOMPILE)

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • That query took 12 seconds. That's a bit longer than I'd like it to take, but within the acceptable range. The execution plan looks like this:

    The inner-most clustered index scan has been changed to an index seek.

    Since this runs on a pretty tight loop, if there is a way to further reduce the delete time then I'd like to explore it.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply