No blocking, queries running slowly, what to check next?

  • Background: Database developer trying to fill in for an actual dba on a contract.

    Environment: Web app with very poor data access structure hitting SQL Server 2005. Web app can't be altered in any real manner because it is a vendor-supplied app with only compiled code on the client side. The business is commercial sales so there is heavy traffic working through navigation by categories and (unfortunately) heavy traffic calculating prices with every page view.

    Event: Staff reports 'web site is slow'.

    I'm running sp_WhoIsActive and sp_who2 and not seeing any wait times other than an occasional IO_Network wait. For this site there is high volume, 30+ queries hitting the db at the same time.

    Current 'solution' is to restart the SQL Service. I'm not sure what that is solving other than making a bunch of customers annoyed so they drop off the site.

    Suggestions for other things to check???

    Thanks!

  • I'd start with slow running queries;

    if you cannot change the app code, you might at least add indexes that will help.

    here's a query for the top 20 slowest queries

    SELECT TOP 20 object_schema_name(qt.object_id, qt.database_id) + N'.' + object_name(qt.object_id, qt.database_id) AS 'SPName', qt.query_text , qs.total_worker_time AS 'TotalWorkerTime',

    qs.total_worker_time/qs.execution_count AS 'AvgWorkerTime',

    qs.execution_count AS 'ExecutionCount',

    ISNULL(qs.execution_count/DATEDIFF(Second, qs.creation_time, GetDate()), 0) AS 'CallsPerSecond',

    ISNULL(qs.total_elapsed_time/qs.execution_count, 0) AS 'AvgElapsedTime',

    qs.max_logical_reads, qs.max_logical_writes,

    DATEDIFF(Minute, qs.creation_time, GetDate()) AS 'AgeInCache'

    FROM sys.dm_exec_query_stats AS qs

    cross apply msdb.MS_PerfDashboard.fn_QueryTextFromHandle(sql_handle, statement_start_offset, statement_end_offset) as qt

    WHERE qt.database_id = db_id()

    ORDER BY qs.total_worker_time DESC

    here's queries taking longer than 15 seconds:

    select

    fn.*,

    st.*

    from sys.dm_exec_query_stats st

    cross apply sys.dm_exec_sql_text(st.[sql_handle]) fn

    where st.max_elapsed_time >= (1000 * 15) --15 seconds

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Restarting the SQL service is probably speeding things up by getting rid of poorly built execution plans from the cache. That might speed it up, for a while, but it's a really poor solution (as you already mentioned).

    I suggest that your best bet for the server is to download a free copy of Confio Ignite, set it up to monitor that server, and take a look at what it tells you are the worst queries on it. It will summarize the data in a very easy-to-read graph, and make it very clear what's slowing things down.

    That's assuming you'll be able to actually modify anything in terms of database objects, queries, and so on. Can you, or is all of that embedded in the code that you can't modify?

    You can get Ignite free from here: http://www.ignitefree.com/

    Disclosure: I'm not an employee, blah blah blah, of Confio, and have no vested interest in you getting it. I'm just a DBA who finds it really, really useful for seeing what's slowing down a server. Especially a server that I'm not familiar with yet. Quest Foglight/Spotlight/Performance Monitor will do more, give you more data, and cost more, than Ignite. Go with that if you want to dig into the data to that extent. I suggest Ignite only because it's free, and easier for a non-DBA to use.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • hi ,

    1. Check for missing indexes

    2. Do you running rebuild index\reorg ,update statistics on weekend?

    3.Try to find out which queries taking more cpu time from which application ?

    4. Check for query optimization

    5. Check the mdf files and ldf files in the same location

    6. Run perfmon and check memory bottleneck and cpu bottleneck

    7. Run a trace and find which queries are running during the site slowness

    8 .Ask application team to check all app servers in load balancer or not ?

    9.check for I\O operations [disk read and write per sec]

    Thanks

    Lavanya

  • Lavanyasri (8/7/2012)


    5. Check the mdf files and ldf files in the same location

    Bad idea, MDF and LDF files should be on physically seperate drives.

  • Thanks. You are correct ..I know this antony.

    Just i am asking him to check both the files in the same drive or different drive??

    The mdf and ldf should be in different drives. Temp db should be in different drive .

  • Lavanyasri (8/7/2012)


    Thanks. You are correct ..I know this antony.

    Just i am asking him to check both the files in the same drive or different drive??

    The mdf and ldf should be in different drives. Temp db should be in different drive .

    Yep, thats right. Just the way it was worded sounded like you wanted the OP to have the MDF and LDF on the same drive.

  • OK, well where to start with the replies... Maybe this check list...

    Lavanyasri (8/7/2012)


    hi ,

    1. Check for missing indexes

    2. Do you running rebuild index\reorg ,update statistics on weekend?

    3.Try to find out which queries taking more cpu time from which application ?

    4. Check for query optimization

    5. Check the mdf files and ldf files in the same location

    6. Run perfmon and check memory bottleneck and cpu bottleneck

    7. Run a trace and find which queries are running during the site slowness

    8 .Ask application team to check all app servers in load balancer or not ?

    9.check for I\O operations [disk read and write per sec]

    1. Done, and done. 2 cycles of adding indexes

    2. Yes, weekly.

    3. There are several apps that are doing ugly things. Some I can kill to relieve the problem momentarily and others that are less easy to whack. When possible I take down some of them.

    4. The worst of the queries are not directly fixable with just code changes. Partly due to the way the app is structured and partly due to lack of process--e.g. de-normalized, pre-aggregated data rather than walking the full business logic for every query. What they are currently doing is a rewrite of the app, but that will take many months.

    5. Week 1 was: a) back up system db's (never done prior to that) b) move logs to their own drive c) move temp db to its own drive and split it to multiple files, and d) set up weekly maintenance on indexes.

    6, 7 & 9. A private consulting company had just done a review of db performance. From what they left from their traces & analysis the main culprit seems to point to the application blocking itself.

    8. Interesting question. I have not asked that.

  • Interesting. One of my caveats here is that the sql server is at a co-lo. Not sure if I'd be allowed to install Ignite on that server but it could be worth a shot.

  • GSquared (8/6/2012)


    Restarting the SQL service is probably speeding things up by getting rid of poorly built execution plans from the cache. That might speed it up, for a while, but it's a really poor solution (as you already mentioned).

    Absolutely right. Instead of restarting you could try dbcc freeproccache (also ugly). If that "solves" the issue you could try updating statistics (outdated statistics could cause bad plans).

    When that doesn't help it may be caused by "parameter sniffing" (google on that to find more info). If you can't change the (vendor) software there is no real solution for this problem.

  • 6, 7 & 9. A private consulting company had just done a review of db performance. From what they left from their traces & analysis the main culprit seems to point to the application blocking itself.

    That very bad way of programming, contact the vendor to fix it immediately. if it not resolved permanently it will occur again even if you improve the hardwares...

    Regards
    Durai Nagarajan

  • I'd my Development team having issues with performance after restoring a db from production with one particular procedure .

    They were all thinking it might be index issues .After analysing, capturing execution plan and stuff I looked at the fragmentation level and did Rebuild/update stats.

    Issue was resolved!

    Hope the same thing will work for you.

  • Next time it's hung I will try dbcc freeproccache just to see if that resolves the problem. Seems like an easy way to see if that is the issue.

    The relationship with the vendor for the web app has been severed so no quick fixes will be happening on the app. The team that is working on the replacement for the web app is using all Linq To SQL and insisting on all direct queries on the tables. I was just reading yesterday that this means that all the calls to the server will run as dynamic sql so the bad cache (query plan) may become a theme in the future, even if it isn't one now!

  • Oh, and apparently I have no manners this morning. Thanks all for the suggestions!

  • Fly Girl (8/9/2012)


    Next time it's hung I will try dbcc freeproccache just to see if that resolves the problem. Seems like an easy way to see if that is the issue.

    The relationship with the vendor for the web app has been severed so no quick fixes will be happening on the app. The team that is working on the replacement for the web app is using all Linq To SQL and insisting on all direct queries on the tables. I was just reading yesterday that this means that all the calls to the server will run as dynamic sql so the bad cache (query plan) may become a theme in the future, even if it isn't one now!

    Done right, Linq can be just fine. Done wrong, it'll be a performance (and debugging) nightmare like you wouldn't believe.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply