Performance issue on DB

Question

Performance issue on DB

bala2

SSCertifiable

Points: 5547
More actions
July 15, 2013 at 7:52 am

#275270

Hi,
We have a sql 2005 prod server,where one of the DB became very slow past 5 days.
i have observer that blocking happening @ regular intervials from past 3 days.
and also i have checked fragmentation and some the non cluster indexs and heap ,avg_fragmentation_in_percent
is having more than 40% .
Can you please help us hw to troubleshoot this issue.
Many THanks.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

arnipetursson SSCertifiable Points: 6557 More actions · Answer 1

The source of the problem could be on the SQL server side ( bad queries, stale statistics or fragmented indexes, unexpected data growth, etc.)

or on the OS/hardware side (disk/IO issues, power saving features were turned on on your server, etc.)

Do you do regular index and statistics maintenance?

During periods of slowness and blocking is CPU maxed out or do you have disk queue build up or both?

Are you seeing unusual growth in databases (user or tempdb, data or log files)?

RTaylor2208 SSChampion Points: 13298 More actions · Answer 2

Performance issues can be caused by so many factors from hardware issues, inefficient code, out of date statistics, bad execution plans the list could go on and on.

You need to start with the basics:

1. Using performance monitor create a counter log to capture key hardware metrics, such as CPU utlisation, reads, writes, processor queue length, disk queue lengths etc.

For example:

Memory Counters

i.Memory Available Mbytes, at least 200MB should be free.

ii.Memory Pages/sec below 20 preferably below 15

iii.SQL Server: Buffer Manager: Buffer Cache Hit ratio should exceeed 90% preferably 99%

iv.SQL Server: Buffer Manager: Page Life Expectancy, should exceed 300, the higher the value the better.

Processor Counters

i.Processor: %Processor Time: Should Average Below 75% preferably below 50%

ii.System: Processor Queue Length: Should average below 2 per processor, For example is a 2 processor machine it should remain below 4.

I\O Bottlenecks

i.Physical Disk - % Disk Time should average below 50%

ii.Physical Disk – Avg Disk Queue Length. Should average below 2 per disk. For example for an array of 5 disks this figure should be below 10.

iii.Physical Disk – Avg. Disk Reads/sec. Should be below 85% of the IOPS value of the disk

iv.Physical Disk – Avg. Disk Writes/sec. Should be below 85% of the IOPS value of the disk

2. Create a server side trace to capture the calls made to the database and write them to a file, as I have no way of knowing how you connect to or use your database I would start with the events RPC_completed.

Once you have your trace you can use the code simliar to below to find the top consuming queries by reads, or number of executions

-- variables

declare @trace nvarchar(1000), @filter nvarchar(1000)

select @trace ='E:\Traces\output_20120125_1826\sp_trace.trc'

-- read raw data

select *

into #trace

from ::fn_trace_gettable(@trace, DEFAULT)

where TextData is not null

and CONVERT(varchar(100), textdata) <> 'exec sp_reset_connection'

-- extract according to filter

select CASE WHEN TextData LIKE '%exec %' THEN CONVERT(varchar(1000), TextData) ELSE NULL END AS ProcName, HostName, cpu, reads, starttime, CONVERT(varchar(1000), TextData) AS TextData, Duration

into #procs

from #trace

where textdata like case when @filter is not null then '%' + @filter + '%' else textdata end

-- Deletes

DELETE #procs WHERE textData LIKE '%msdb.dbo.%'

DELETE #procs WHERE TextData = ' if @@trancount > 0 commit tran'

DELETE #procs WHERE TextData = 'begin tran'

-- reformat

update #procs

SET procname = SUBSTRING(procname, CHARINDEX('exec dbo.', ProcName), 1000)

WHERE CHARINDEX('exec dbo.', ProcName) > 0

update #procs

set procname = REPLACE(ProcName, 'exec dbo.', '')

update #procs

set procname = LEFT(procname, charindex('@', ProcName)-1)

where CHARINDEX('@', procname) > 0

-- summary results by proc

select top 10 procname, COUNT(*) as calls, AVG(cpu) AS AvgCPU, AVG(Reads) AS AvgReads, MIN(starttime) AS [Start], MAX(starttime) AS [End], AVG(Duration) AS AvgDuration

from #procs

group by procname

order by (COUNT(*) * AVG(Reads)) DESC

-- results for top 5 sprocs

select *

from #procs

where procname in (

select top 1 procname

from #procs

group by procname

order by (COUNT(*) * AVG(Reads)) DESC

)

order by StartTime

-- cleanup

DROP TABLE #trace

DROP TABLE #procs

MCITP SQL 2005, MCSA SQL 2012

bala2 SSCertifiable Points: 5547 More actions · Answer 3

Thanks for reply,

Blocked queries that are hitting the table ,contains non cluster indexes which is have fragmentation(avg freg colum >50%)....plz clarify whether rebuild is best option for troubleshooting the issue.

advance thanks.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

Best option, no. If management is screaming that 'something must be done', then you can rebuild but it probably won't have a noticable effect (unless the problem was due to stale statistics)

http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/

http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-2/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

RTaylor2208 (7/15/2013)
iii.SQL Server: Buffer Manager: Buffer Cache Hit ratio should exceeed 90% preferably 99%

Buffer cache hit ratio is a near-useless counter. By the time it shows a noticable drop, the server would have been under cripple memory pressure for some time

iv.SQL Server: Buffer Manager: Page Life Expectancy, should exceed 300, the higher the value the better.

Oh not that 300 nonsense again. 300 sec for an average page to stay in cache was borderline too low when the figure was released 8 odd years ago, at a time when 4GB of memory was a lot for a server.

These days when 64GB of memory is small and servers with over 1TB of memory are known, suggesting that it's acceptable to have the entire buffer pool displaced and the entire size of the buffer pool read off disk every 5 minutes is just wrong, there are very, very few IO subsystems that could handle 64GB being read every 5 minutes.

Buffer cache hit ratio should be within normal range of the server in question, as identified by benchmarking. If a threshold is wanted, then a bare minimum could be (300/4)*(max server memory in GB)

i.Processor: %Processor Time: Should Average Below 75% preferably below 50%

Why would I want half my processors to be doing nothing? If I have a server where the average CPU is under 50, that's a candidate for consolidation so that I can get higher hardware utilisation.

ii.Physical Disk – Avg Disk Queue Length. Should average below 2 per disk. For example for an array of 5 disks this figure should be below 10.

Not true for a number of reasons. Firstly, unless you have direct attached storage, there's too much between the machine and the disks to come up with any reasonable threshold. The second reason is that SQL is designed to intentionally queue up lots of IOs, it can and will intentionally, as part of normal operation, drive the disk queue length very high.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass