July 15, 2013 at 7:52 am
Hi,
We have a sql 2005 prod server,where one of the DB became very slow past 5 days.
i have observer that blocking happening @ regular intervials from past 3 days.
and also i have checked fragmentation and some the non cluster indexs and heap ,avg_fragmentation_in_percent
is having more than 40% .
Can you please help us hw to troubleshoot this issue.
Many THanks.
July 15, 2013 at 9:04 am
The source of the problem could be on the SQL server side ( bad queries, stale statistics or fragmented indexes, unexpected data growth, etc.)
or on the OS/hardware side (disk/IO issues, power saving features were turned on on your server, etc.)
Do you do regular index and statistics maintenance?
During periods of slowness and blocking is CPU maxed out or do you have disk queue build up or both?
Are you seeing unusual growth in databases (user or tempdb, data or log files)?
July 15, 2013 at 9:36 am
Performance issues can be caused by so many factors from hardware issues, inefficient code, out of date statistics, bad execution plans the list could go on and on.
You need to start with the basics:
1. Using performance monitor create a counter log to capture key hardware metrics, such as CPU utlisation, reads, writes, processor queue length, disk queue lengths etc.
For example:
Memory Counters
i.Memory Available Mbytes, at least 200MB should be free.
ii.Memory Pages/sec below 20 preferably below 15
iii.SQL Server: Buffer Manager: Buffer Cache Hit ratio should exceeed 90% preferably 99%
iv.SQL Server: Buffer Manager: Page Life Expectancy, should exceed 300, the higher the value the better.
Processor Counters
i.Processor: %Processor Time: Should Average Below 75% preferably below 50%
ii.System: Processor Queue Length: Should average below 2 per processor, For example is a 2 processor machine it should remain below 4.
I\O Bottlenecks
i.Physical Disk - % Disk Time should average below 50%
ii.Physical Disk – Avg Disk Queue Length. Should average below 2 per disk. For example for an array of 5 disks this figure should be below 10.
iii.Physical Disk – Avg. Disk Reads/sec. Should be below 85% of the IOPS value of the disk
iv.Physical Disk – Avg. Disk Writes/sec. Should be below 85% of the IOPS value of the disk
2. Create a server side trace to capture the calls made to the database and write them to a file, as I have no way of knowing how you connect to or use your database I would start with the events RPC_completed.
Once you have your trace you can use the code simliar to below to find the top consuming queries by reads, or number of executions
-- variables
declare @trace nvarchar(1000), @filter nvarchar(1000)
select @trace ='E:\Traces\output_20120125_1826\sp_trace.trc'
-- read raw data
select *
into #trace
from ::fn_trace_gettable(@trace, DEFAULT)
where TextData is not null
and CONVERT(varchar(100), textdata) <> 'exec sp_reset_connection'
-- extract according to filter
select CASE WHEN TextData LIKE '%exec %' THEN CONVERT(varchar(1000), TextData) ELSE NULL END AS ProcName, HostName, cpu, reads, starttime, CONVERT(varchar(1000), TextData) AS TextData, Duration
into #procs
from #trace
where textdata like case when @filter is not null then '%' + @filter + '%' else textdata end
-- Deletes
DELETE #procs WHERE textData LIKE '%msdb.dbo.%'
DELETE #procs WHERE TextData = ' if @@trancount > 0 commit tran'
DELETE #procs WHERE TextData = 'begin tran'
-- reformat
update #procs
SET procname = SUBSTRING(procname, CHARINDEX('exec dbo.', ProcName), 1000)
WHERE CHARINDEX('exec dbo.', ProcName) > 0
update #procs
set procname = REPLACE(ProcName, 'exec dbo.', '')
update #procs
set procname = LEFT(procname, charindex('@', ProcName)-1)
where CHARINDEX('@', procname) > 0
-- summary results by proc
select top 10 procname, COUNT(*) as calls, AVG(cpu) AS AvgCPU, AVG(Reads) AS AvgReads, MIN(starttime) AS [Start], MAX(starttime) AS [End], AVG(Duration) AS AvgDuration
from #procs
group by procname
order by (COUNT(*) * AVG(Reads)) DESC
-- results for top 5 sprocs
select *
from #procs
where procname in (
select top 1 procname
from #procs
group by procname
order by (COUNT(*) * AVG(Reads)) DESC
)
order by StartTime
-- cleanup
DROP TABLE #trace
DROP TABLE #procs
MCITP SQL 2005, MCSA SQL 2012
July 16, 2013 at 9:27 am
Thanks for reply,
Blocked queries that are hitting the table ,contains non cluster indexes which is have fragmentation(avg freg colum >50%)....plz clarify whether rebuild is best option for troubleshooting the issue.
advance thanks.
July 16, 2013 at 9:52 am
Best option, no. If management is screaming that 'something must be done', then you can rebuild but it probably won't have a noticable effect (unless the problem was due to stale statistics)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 16, 2013 at 10:02 am
RTaylor2208 (7/15/2013)
iii.SQL Server: Buffer Manager: Buffer Cache Hit ratio should exceeed 90% preferably 99%
Buffer cache hit ratio is a near-useless counter. By the time it shows a noticable drop, the server would have been under cripple memory pressure for some time
iv.SQL Server: Buffer Manager: Page Life Expectancy, should exceed 300, the higher the value the better.
Oh not that 300 nonsense again. 300 sec for an average page to stay in cache was borderline too low when the figure was released 8 odd years ago, at a time when 4GB of memory was a lot for a server.
These days when 64GB of memory is small and servers with over 1TB of memory are known, suggesting that it's acceptable to have the entire buffer pool displaced and the entire size of the buffer pool read off disk every 5 minutes is just wrong, there are very, very few IO subsystems that could handle 64GB being read every 5 minutes.
Buffer cache hit ratio should be within normal range of the server in question, as identified by benchmarking. If a threshold is wanted, then a bare minimum could be (300/4)*(max server memory in GB)
i.Processor: %Processor Time: Should Average Below 75% preferably below 50%
Why would I want half my processors to be doing nothing? If I have a server where the average CPU is under 50, that's a candidate for consolidation so that I can get higher hardware utilisation.
ii.Physical Disk – Avg Disk Queue Length. Should average below 2 per disk. For example for an array of 5 disks this figure should be below 10.
Not true for a number of reasons. Firstly, unless you have direct attached storage, there's too much between the machine and the disks to come up with any reasonable threshold. The second reason is that SQL is designed to intentionally queue up lots of IOs, it can and will intentionally, as part of normal operation, drive the disk queue length very high.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply