Weird performance issue with ERP software table

  • I used to think I was quite good at tackling performance issues, but this one is not making any sense to me.

    First of all:

    - I have no control over the queries being fired at the database. Just the usual: indexes and DB settings.

    - Also, all ERP connections and SSMS sessions use the READ UNCOMMITTED isolation level to prevent locks/deadlocks.

    - I have to admit there are connections made as READ COMMITTED, but as far as I can tell they carry no traffic or queries. I profiled the database and found no evidence of any.
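
    For anyone who wants to double-check that part: a query along these lines (standard DMVs, nothing ERP-specific) lists every user session's isolation level:

    SELECT s.session_id,
           s.program_name,
           s.transaction_isolation_level,  -- 1 = READ UNCOMMITTED, 2 = READ COMMITTED
           s.last_request_end_time
    FROM sys.dm_exec_sessions AS s
    WHERE s.is_user_process = 1
    ORDER BY s.transaction_isolation_level;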

    I have a 2.3M-row table, nothing fancy, on which the ERP software has placed 32 multi-column indexes. A simple query selects from this BankTransactions table, filters on about 15 columns in its WHERE clause, and returns the top 1 row for that filter. It runs for 20 minutes minimum.

    For good measure, I'll include the actual query. Remember, I did not come up with it... it's the ERP software.

    SELECT TOP 1 ID  FROM BankTransactions 
    WHERE Type = 'W'
    AND Status IN ('P','J')
    AND TransactionType = 'T'
    AND MatchID IS NULL
    AND ProcessingDate IS NOT NULL
    AND InvoiceDate IS NULL
    AND TermPercentage = 0
    AND SupplierInvoiceNumber IS NOT NULL
    AND DebtorNumber IS NULL
    AND CreditorNumber = ' 26'
    AND EntryNumber IS NULL
    AND InvoiceNumber IS NULL
    AND (OrderNumber IN ('50185891','50185891','50185891') OR OrderNumber IS NULL)
    ORDER BY OrderNumber DESC

    20 minutes is outrageous for 2.3M rows, even if there were no indexes at all:

    • Copying the whole table without indexes: the same query returns in 4 seconds.
    • Copying the whole table including all indexes: the query returns in 2 seconds.
    • Re-indexing the original table makes no difference; updating statistics doesn't fix it either.
    • The WITH (NOLOCK) hint doesn't make a difference.
    • That makes sense, because I cannot find any blocked transactions while the query is running.
    • Creating an index on the original table covering the 5 problematic columns: the query returns in 5 seconds (see the sketch after this list).
    • However, after leaving the database to the ERP overnight... this new index is rendered useless again, even though it shows only 0.11% fragmentation.
    • Rebuilding or dropping/recreating the index doesn't help a bit after that.
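
    For reference, a sketch of the kind of index I mean. The name and the five-column list below are illustrative, not the exact definition; the idea is one narrow index over the query's more selective predicates, plus OrderNumber for the ORDER BY:

    -- Sketch only: index name and column list are illustrative.
    CREATE NONCLUSTERED INDEX IX_BankTransactions_Fix
    ON dbo.BankTransactions (CreditorNumber, Type, TransactionType, Status, OrderNumber);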

    So although I did kind of solve it, I don't understand the underlying problem, since even the table without any indexes is much faster.

    The only difference is that the copied table is not in use by ERP processes that perform the occasional update, delete or insert. So is our simple SELECT waiting on an IX-locked table while single records are locked exclusively? But then, why are there no blocked transactions?
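
    To be concrete about "no blocked transactions": a check along these lines (a sketch, not necessarily verbatim what I ran) comes back empty while the SELECT is running:

    -- Any lock wait or blocking session should surface here while the query runs.
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,
           r.command
    FROM sys.dm_exec_requests AS r
    WHERE r.session_id <> @@SPID
      AND (r.blocking_session_id <> 0 OR r.wait_type LIKE 'LCK%');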

    But then: why does an additional index on 5 columns help? The records would still be locked as deletes and updates happen.

    Could there be so many indexes that the optimizer picks the wrong one? But then again: a straight table scan returns in 4 seconds, so why doesn't it fall back to that?

    Looking at the execution plan, I am not getting any wiser. Basically it first builds an ordered list of all records that would be taken into consideration. It estimates that this list is about 15 records long, but it is actually 1,237,529 records long. It then has to do a (clustered) key lookup, which it executes 1,237,529 times. What could be the reason behind this weird execution plan? It should be clear to the optimizer that there are in fact 1.2M rows with OrderNumber = NULL, not 15.

    This led me to believe the statistics of the table are wrong. But rebuilding them doesn't help either. Is there any way that statistics can end up wrongly configured? I also found that leaving the copied table with indexes on the server overnight causes it to exhibit roughly the same issue: 25 minutes of running time and no way around it.
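
    One way to dig into that: inspect the histogram of whichever statistic leads on OrderNumber. The statistics name below is a placeholder; the first query finds the real candidates. If the statistics reflected the data, the NULL step's EQ_ROWS should be near 1.2M:

    -- Find statistics objects that lead on OrderNumber.
    SELECT s.name
    FROM sys.stats AS s
    JOIN sys.stats_columns AS sc
      ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
    JOIN sys.columns AS c
      ON c.object_id = sc.object_id AND c.column_id = sc.column_id
    WHERE s.object_id = OBJECT_ID('dbo.BankTransactions')
      AND c.name = 'OrderNumber'
      AND sc.stats_column_id = 1;

    -- Then inspect the histogram ('IX_BankTransactions_OrderNumber' is a placeholder name).
    DBCC SHOW_STATISTICS ('dbo.BankTransactions', 'IX_BankTransactions_OrderNumber') WITH HISTOGRAM;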

    Does anyone have any idea what could be going on, and how I could gather information to prove it?

  • Without seeing the execution plan, the indexes and the statistics, I only have guesses.

    Row estimates are off. That usually points to statistics. I know you've said you've updated them, but using sampling or a full scan? If you've tried one, try the other.

    It was working, but then overnight it changed. Well, what happens overnight? New data, stats updates, index rebuilds? Did you capture the plan while it was working well? Comparing when it's working right to when it's working wrong goes a long way to understanding what's going on.

    Get the before & after plans. I think that'll tell the tale. Look to the indexes used, their statistics, and the row estimates for when it's running well and when it's running poorly. SSMS has a way to compare plans. Use it.
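
    If it helps, the sampling question doesn't need guesswork. Something along these lines (standard DMVs, assuming the table name from your post) shows when each statistic was last updated and how many rows were sampled; a full-scan update then rules sampling out:

    -- Last update time and sample size for every statistic on the table.
    SELECT s.name,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.BankTransactions');

    -- Rule sampling out entirely with a full scan.
    UPDATE STATISTICS dbo.BankTransactions WITH FULLSCAN;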

    Side note, don't sweat fragmentation. That's a marginal, at best, solution for performance. Most of the time when people see improvements in performance because they defragged an index it's because either the statistics were updated or the queries were recompiled because of the statistics updates. When performance is varying like this, when the code is not, it's almost always statistics. Stick to root causes and underlying issues. Don't get sidetracked.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
