Performance Troubleshooting / Sleuthing Help

  • Greetings,

    I am a SQL database architect / developer on a rather large web-based application with a SQL Server 2005 database as the backend. The application is OLTP with Reporting included in some areas. I have very limited permissions on the SQL Server environments, with the exception of the development environment. We are having performance trouble in the production environment that began being reported after a 'release', which for us means adding and making changes to db objects - stored procedures, tables, views, etc. What is being reported is timeouts in the web interface when performing operations such as saving changes to data and creating records. As part of my troubleshooting, I have been checking ETL processes and find that several of them have been running longer, sometimes a LOT longer, beginning over a month before the release. Our DBA has been somewhat helpful, but has been unable to provide us a focused, honest troubleshooting effort, IMO (I understand he has bandwidth issues of his own). So I am asking for help in understanding what I am seeing and how to go about intelligently requesting consideration by the DBA.

    What I can see:

    I can't run any of the DVMs, but from Activity Monitor, I can see there is locking and blocking. The Wait Types reported are almost all Latch_EX and CXPacket. Usually there is one process that blocks another, which then blocks several others. In one example from an ETL process, the 'main' or 'base' blocking ProcessID is showing 9 times with the 1st showing an hour glass, then the next 8 showing a green circle with a check mark. On the row with the hour glass, the wait type is listed as CXPacket (none of the other rows show a Wait Type) and I can dbl-click the row to get the SQL code. In this case, it is a simple update - changing 'N' to 'Y' for 122K+ records.

    Code:

    UPDATEWorkstation

    SETWorkstation.TrackingInfo = 'Y'

    FROMOrders WITH (NOLOCK)

    INNER JOIN Workstation WITH (NOLOCK) ON Orders.WorkstationOID = Workstation.WorkstationOID

    INNER JOIN Carrier_Tracking WITH (NOLOCK) ON Orders.POnumber = Carrier_Tracking.POnumber

    WHEREWorkstation.TrackingInfo = 'N'

    A month and a half ago, this task ran in 17 seconds, now over 2 hours. It shows it is not being blocked, but that it is blocking another.

    My questions:

    Does the 9 rows with the same ProcessID mean this statement is being run in parallel?

    Is the CXPacket wait on the first row mean parallelism is hanging the statement?

    Would running this using the MAXDOP hint provide a troubleshooting benefit?

    Is there more information you need to provide more light to the situation?

  • Yes, it sounds like it is being executed with parallel threads, which isn't necessarily a bad thing. If you have a similar set of the data for testing, you could try eliminate parallelism as an issue. Assuming that there are appropriate indexes in place, do you know if statistics are getting updated appropriately? When we see performance degradation it often times is due to statistics being out of date or having a samll sample size or stored procedures getting a bad execution plan because of that or other reasons.

  • Few things...

    Yes, the CX packet and the multiple rows show that's running in parallel.

    No, MaxDop is not the immediate right answer, parallelism can be a good thing.

    CX packet means that some portions of the query are waiting for others. It in itself is not the problem. Check what other wait types are there, those are the real problem.

    Can you post indexes, table definitions and execution plan. Could be that we can tune the query a bit.

    Also - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the replies. The DBA has already helped us with the basic tuning stuff like indexes and statistics and I do see that 'NOLOCK' should not be used in the 'select' of the update statement, as a rule of thumb. But since this particular statement ran in 17 seconds and then ran for 2 hours before there were any changes to the data objects, what I was hoping for was more about what kinds of server settings, like parallelism for CXPACKET waits or other similar, might be the cause of the original slow down. The script sort of acts like it is being blocked, because of the delayed time, but Activity Monitor shows it is doing the blocking instead.

    Ok. So changing gears, sort of.

    The obvious things that could cause this drastic change in behavior when code change is not part of the picture:

    1. Index removed

    2. Server configuration change (which ones?)

    a. Memory misconfiguration

    b. Processor misconfiguration

    c. Cache misconfiguration

    3. Server hardware problems

    a. IO problems

    b. Bad CPU or Memory

    4. Drastic increase in database usage, for lack of a better term.

    Do any of these make sense?

  • 2, 3 probably not. 4 you'd see on activity monitor

    Statistics updated recently?

    Could also just be a data tipping point. Could be a bad plan.

    Can you post indexes and execution plan?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Could also be the UPDATE statement.

    If you have a development system, try this:

    SELECT 1;

    WHILE @@ROWCOUNT > 0

    BEGIN

    UPDATE TOP(5000) w SET

    TrackingInfo = 'Y'

    FROM

    Orders o

    INNER JOIN Workstation w

    ON o.WorkstationOID = w.WorkstationOID

    INNER JOIN Carrier_Tracking ct

    ON o.POnumber = ct.POnumber

    WHERE

    w.TrackingInfo = 'N';

    END

    It will batch the updates at 5000 records. Also, I have seen issues at times with the UPDATE statement when there are multiple joins and one of the tables in the join is being updated.

  • If you're going to try to tune the query, you really need to get a look at the execution plan. That's a first step.

    And to reinforce, NOLOCK is absolutely not a "run faster" switch. It's dangerous. You can get extra rows or you can miss rows. Since you're using it as part of a data load process, it might be rather disconcerting to find your updates aren't getting all the rows that you expect it to.

    On a guess, it's out of date statistics or statistics from a sample. Get the stats updated, possibly using WITH FULL SCAN. But that is just a guess.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply