April 28, 2005 at 4:04 am
Hi Looking for suggestions on what to monitor here for a strange annoying problem
Basic setup, Clients authenticate via a Appserver and then talk direct to the database via app front end.
SQL Server 2K Enterprise clustered server. There's a screen the client uses called My Activities which normally responds
within 5 seconds. On Tuesday it started talking beween 1 and 3 minutes. The query behind this screen runs in QA in seconds while
this problem is occouring. There are no long running queries, no blocking, usual amount of locking i.e. seconds at the most.
CPU is ok, memory is ok, disk usage is ok. All other screens and activities against the database are fine except for this one.
after about hers it suddenly disappears with no changes to database made. We're discussing index's, batch jobs, checking application server
etc but not actually made any changes.
This morning the problem is spotted 7:15 am with the users due on for 8am. Same scenario, everything works fine except this one query
via the app, via QA no problem. Managed to failover to the other node before users are in and suddenly everything is ok again.
I'm confused, any ideas any one , what to look for
I'm fairly sure it's not ur normal, long running query, blocked process, Server high resource usage
or shoddy index but I'm damned if I know what it is ??
Thanx in advance
April 28, 2005 at 6:32 am
It's never easy with this type of problem .. however, you might want to look at some of these:-
Disk counters, especially response time and queues
Processes, I've found things such as anti-virus, netbackup, disk defrag ( all sorts of things scheduled by platforms dept which they don't always tell you about )
Check sysprocesses looking for compile locks on stored procs, you didn't actually state if the sql was a proc or a query. Procs that can accept params that vary wildly may create plans that give poor performance.
Auto shrink or auto close on your databases(s)
Auto update stats kicking in ( although I've never found this has given me a problem )
You might want to get a demo of diagnostic manager as this will allow you to track processes in real time which can be very very useful and has helped me identify issues previously.
Processes using select into #temp tables can cause big problems.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
April 28, 2005 at 11:46 am
Yeah, ok All db files are on a SAN and nothing out of the ordinary happening there, same with process etc I would expect the same query run via QA as via the app front end to experience the same problems. All other screens queries etc run ok. Not 100% rule anything out but I would expect overall performance degradation if we had a bottle neck. It's a query running from the client’s front end that talks direct to the DB, doesn't go through the App server at all. I don't use auto shrink and always keep the database files minimum 20% bigger than the actual data requirements.
This particular table that the “from clause” is running against is approx 2.7 million rows taking up 2.7GB of space.
Now interesting you mentioned auto stats, I didn't want to lead anyone down this route without them 1st mentioning it. Apparently according to the non dba's here longer than moi the last time they had a problem with this table was 2 years ago and it appeared to be something to do with system generated index's/stats. I don't have the exact details as the previous DBA's have all run away but it did occur to me that perhaps for some reason stats were not being correctly generated and the optimizer was taking a not so smart route and then at some point realising the error of its ways and changing back. but I'm not sure how and why. Both occasions it's been 1st thing in the morning and although there are not batch jobs run against the db I've since discovered that when the app process's certain items , letters and emails, it updates this table via MQ and a certain proportion of these do occur overnight. Auto statistics is turned on in the DB. If it occurs again tomorrow morning I'm tempted to update stats and flush the buffers to see if it has any effect. (if it does happen and we catch it like today then we'll end up doing a failover before the business gets in anyway so might as well test it out ) Other than delving back into my books and reading up on stats etc at the moment I'm still lost.  But hey it refreshes my memory on the whole index thing stats etc and at least I can say I've investigated this avenue rather than fingers crossed and hope it goes away
 But hey it refreshes my memory on the whole index thing stats etc and at least I can say I've investigated this avenue rather than fingers crossed and hope it goes away 
cheers
April 28, 2005 at 12:06 pm
On the back of that I should mention that there is another query run by the clients that uses two less fields that runs in 3 seconds or less while the offending one takes a minute or more, a quick look shows four of the indexes , and also a _W index/stat which I believe to be system generated ??
,reference these fields shows they’ve been updated on
S_EVT_ACT_X6_X ---- 22/04/2005 04:07:30---22/04/2005 04:07:30
S_EVT_ACT_M6 ----- 28/04/2005 04:04:25 -----13/04/2005 04:05:49
S_EVT_ACT_M4 -------- 16/04/2005 04:05:14 ----16/04/2005 04:05:14
S_EVT_ACT_F4 --------- 28/04/2005 04:03:53 ----15/04/2005 04:05:07
_WA_SYS_EVT_STAT_CD_5150D53D --- 28/04/2005 10:08:07 --- 02/04/2005 12:33:36
First figure being current today and second figure from a restore of last nights backup.
Now when allowing SQL Server to AutoUpdate statistics, the larger the table, the less likely that the statistics are current at any given moment. The AutoUpdate threshold is based on the number of rows modified since the stats were last updated, and it works out to approx 20% of the total rowcount. If I have a table with 2.7 million rows with current stats, the stats won't be updated again until approximately 540,000 more rows are inserted/updated/deleted.
But looking at my stats_dates doesn’t explain why I had problems Tuesday and Thursday but not Wednesday or indeed all day each time.
Maybe I should shift this to a Performance forum ?? if an admin would care to advise
Ta
April 28, 2005 at 2:11 pm
How many processors your server has? It could be query optimizer generating different execution plans due to statistcs changes that results in large number records inserted/updated/deleted.
April 29, 2005 at 6:54 am
8 Processors and set to no parallelism , ie use one processor
April 29, 2005 at 10:03 am
Sounds like an inefficient query plan (probably a table scan) got cached. This usually happens when the query is first run (after the server is restarted) with atypical (e.g. NULL) search conditions. Try forcing a recompile (using typical values) the next time this happens.
--Jonathan
May 1, 2005 at 11:30 am
Just a thought. When you failover SQL server, the services gets re-started. With your system coming back up after a failover, its possible that there might be some open transaction which might be holding resources.
May 1, 2005 at 3:30 pm
Hi, yeah I thought that also , but it happened 3 times and I'm not sure where a rouge process would popup from 3 times in a week when there's no previous occourance. After a bit of research I came to the conclusion that there must be a cached version of the query and that for some reason the overnight updated to the table then made the query ineffecient when the index got fragmented beyond 75%.( 2.7 million rows so autoupdate stats prob not kicking in ) I tested this friday by clearing the cache and getting response time from 2 mins to 10 secs. I then updated the stats on that table and again cleared buffers and response time was 3 secs. Fri Night the scheduled reindex kicked in and Sat morning all worked fine 1st time round.
So looks like we need to raise our automatic reindex thresholds up a few percentage. live and Learn 
thanx for all the advice so far
Viewing 9 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply