FlushCache messages ... the ongoing quest for information

Johan Bijnens (SSC Guru)
June 25, 2013 at 1:26 am

On some of our SQL Server 2012 instances we get frequent checkpoint (FlushCache) messages in the error log during data loads:
2013-06-25 06:14:04.84 spid16s FlushCache: cleaned up 184496 bufs with 11466 writes in 507527 ms (avoided 125257 new dirty bufs) for db 7:0
2013-06-25 06:14:04.84 spid16s average throughput: 2.84 MB/sec, I/O saturation: 21329, context switches 48504
2013-06-25 06:14:04.84 spid16s last target outstanding: 2400, avgWriteLatency 990
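One way to make sense of these lines is to recompute a figure from the others. The sketch below assumes each "buf" is one 8 KB page (an assumption, not stated in the log) and that the "average throughput" is simply the flushed pages divided by the elapsed time; the helper names `parse_flushcache` and `throughput_mb_per_sec` are hypothetical.

```python
import re

# Hypothetical helper: pattern for the first line of a FlushCache entry, as posted above.
FLUSHCACHE_RE = re.compile(
    r"FlushCache: cleaned up (?P<bufs>\d+) bufs with (?P<writes>\d+) writes "
    r"in (?P<elapsed_ms>\d+) ms \(avoided (?P<avoided>\d+) new dirty bufs\) "
    r"for db (?P<db_id>\d+)"
)

PAGE_BYTES = 8192  # assumption: each "buf" is one 8 KB page


def parse_flushcache(line):
    """Return the numeric fields of a FlushCache line, or None if it does not match."""
    match = FLUSHCACHE_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def throughput_mb_per_sec(bufs, elapsed_ms):
    """Pages flushed, converted to MB, divided by the elapsed seconds."""
    megabytes = bufs * PAGE_BYTES / (1024 * 1024)
    return megabytes / (elapsed_ms / 1000)


line = ("2013-06-25 06:14:04.84 spid16s FlushCache: cleaned up 184496 bufs "
        "with 11466 writes in 507527 ms (avoided 125257 new dirty bufs) for db 7:0")
fields = parse_flushcache(line)
print(round(throughput_mb_per_sec(fields["bufs"], fields["elapsed_ms"]), 2))  # 2.84
```

The recomputed value matches the logged 2.84 MB/sec, which is consistent with the 8 KB-per-buf reading: about 1,441 MB written in roughly 507 seconds.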
According to the CSS blog post http://blogs.msdn.com/b/psssql/archive/2012/06/01/how-it-works-when-is-the-flushcache-message-added-to-sql-server-error-log.aspx
these messages are informational, but they may point to something worth investigating:
...
the message is indicating that the checkpoint process, for the indicated database, exceeded the configured recovery interval.
If this is the case you should review your I/O capabilities as well as the checkpoint and recovery interval targets.
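Following the blog post's description quoted above, a FlushCache message signals a checkpoint that ran longer than the recovery-interval target. A minimal sketch of that comparison, assuming the default `recovery interval (min)` setting of 0 behaves like a target of roughly one minute (as the SQL Server documentation describes for the default); the function names are hypothetical:

```python
DEFAULT_TARGET_MINUTES = 1  # assumption: 'recovery interval' = 0 behaves like ~1 minute


def recovery_target_ms(recovery_interval_min=0):
    """Translate the 'recovery interval (min)' setting into milliseconds."""
    minutes = recovery_interval_min if recovery_interval_min > 0 else DEFAULT_TARGET_MINUTES
    return minutes * 60_000


def exceeded_recovery_interval(checkpoint_ms, recovery_interval_min=0):
    """True when a checkpoint ran longer than the recovery-interval target."""
    return checkpoint_ms > recovery_target_ms(recovery_interval_min)


elapsed_ms = 507527  # from the FlushCache line above
print(round(elapsed_ms / 60_000, 1))          # 8.5 (minutes)
print(exceeded_recovery_interval(elapsed_ms))  # True
```

Under that assumption, the posted checkpoint took about 8.5 minutes against a target of about one minute.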
In the SSC forum thread "system FlushCache happening often", SpringTownDBA added great information on how to investigate the issue, along with some tips on avoiding dirty pages.
As I have no Windows admin rights on the instances concerned, I cannot simply launch perfmon to investigate.
Does anyone have a reference that explains the details?
What do these measures mean?
- I/O saturation
- last target outstanding
Why is there so much variation in the figures across the different FlushCache messages?
The only thing going on at load time is a single database being loaded. No other activity takes place during that time span.
I think the server raising the posted message is not configured optimally,
mainly because its average throughput is low (as low as 0.42 MB/sec in some messages)
and its avgWriteLatency is high (ranging from 65 to 990) for a dedicated RAID1 volume (DASD, 15K disks).
I think adding RAM to this instance might reduce the frequency of these messages, but if the issue is the I/O subsystem, the benefit would be small.
Could changing the NTFS block size to 64K (currently 4K) help?
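The same message also says how the flushed pages were written. A small sketch, again assuming one "buf" is one 8 KB page, that derives write rate and average write size from the posted figures (the helper name `write_profile` is hypothetical):

```python
PAGE_KB = 8  # assumption: one "buf" = one 8 KB page


def write_profile(bufs, writes, elapsed_ms):
    """Derive per-write figures from a FlushCache message."""
    seconds = elapsed_ms / 1000
    return {
        "writes_per_sec": writes / seconds,
        "pages_per_write": bufs / writes,
        "kb_per_write": bufs * PAGE_KB / writes,
    }


profile = write_profile(bufs=184496, writes=11466, elapsed_ms=507527)
for name, value in profile.items():
    print(f"{name}: {value:.1f}")
# writes_per_sec: 22.6
# pages_per_write: 16.1
# kb_per_write: 128.7
```

About 23 write requests per second of roughly 129 KB each appears to be a light load for a pair of 15K disks, so write latencies in the hundreds of milliseconds suggest the storage path itself rather than an unusual I/O pattern.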
Johan

Learn to play, play to learn !
Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me


TheSQLGuru (SSC Guru) · Answer 1

I cannot give you the details you seek, but can provide the following input/responses:

1) The server's IO is clearly not up to the task. RAID1 indicates just TWO spindles mirrored. The IO stalls and throughput you are seeing are just horrible for any production SQL Server work.

2) Adding RAM will NOT help with the disk WRITES (i.e. flushing tlog or dirty data pages to disk). You simply gotta have spindles (or SSDs) for that to happen at an acceptable rate (and an un-bottlenecked IO path). Period.

3) It is possible the load process underway is suboptimal and could be tuned. Could also be based off of the same disk(s) and possibly on the same system - all of which could be robbing the system of what little capacity it has from a CPU/RAM/IO perspective.

4) 64K NTFS cluster format size is preferred, but I cannot see it adding in nearly enough IO capacity to be more than a few percentage points improvement here. Now, if your volume(s) are not SECTOR ALIGNED, then I would definitely consider the strip-to-bare-metal required to get things put right. But I am virtually certain you will STILL not get what you want/need without better IOPs from your disk subsystem.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Johan Bijnens (SSC Guru) · Answer 2

Thank you for your reply.

Having my concerns confirmed certainly counts !

Apparently our hosting company works on a "least effort" basis and doesn't hesitate to charge big piles of $$ just to make an offer.

Sorry for the rant 🙁

Although this is a small server, it is our first new case with this company and procedures need to be checked.

We are certainly working through hiccups at this time.

The instance is hosted on a Win2008R2 box, so I didn't even take sector alignment into account when assessing the issues.

It is supposed to be dedicated to this SQL instance.

The loading application runs on another host.

Johan


TheSQLGuru (SSC Guru) · Answer 3

On Win2K8+, you have to create a volume that is >=4GB for it to be sector aligned on 1024K. What most people don't realize is that most servers ship with a "vendor" partition already placed at the start of the boot disk, and it is NOT THAT BIG! I am not certain, but part of me wonders whether that vendor partition leaves all other volumes created on that disk NON-aligned.

I don't think this would apply to non-boot-disks though...
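Alignment itself is just a modulo test: a partition is aligned to a stripe unit or cluster size when its starting offset is an exact multiple of that size. A minimal sketch, assuming a 64 KB unit for illustration (the real stripe unit depends on the controller) and using the well-known legacy default start of 63 sectors; on a live box the offset can be read with `wmic partition get Name, StartingOffset`, and a partition that follows a vendor partition is tested the same way with its own StartingOffset:

```python
SECTOR_BYTES = 512
LEGACY_OFFSET = 63 * SECTOR_BYTES  # 32,256 bytes: the old pre-Windows Server 2008 default start
WIN2008_OFFSET = 1024 * 1024        # 1,048,576 bytes: the 1024K start mentioned above


def is_aligned(starting_offset, unit_bytes):
    """Aligned when the partition start is an exact multiple of the unit (stripe or cluster)."""
    return starting_offset % unit_bytes == 0


for label, offset in [("legacy", LEGACY_OFFSET), ("Win2008+", WIN2008_OFFSET)]:
    print(label, offset, is_aligned(offset, 64 * 1024))
# legacy 32256 False
# Win2008+ 1048576 True
```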

I am wondering if this is a vendor who rhymes with backspace. :hehe:


Johan Bijnens (SSC Guru) · Answer 4

6 spindles:

4x136GB -> 2 RAID1 volumes -> C-drive and D-drive

2x300GB -> 1 RAID1 volume -> E-drive

I hope they updated all hardware drivers before handing it over, because that was always one of the first things we had to do after receiving a new HP server.

I would have expected better performance.

Regular query performance currently seems to meet application needs.

It's just these FlushCache messages that keep me puzzled about the future, when this server will get hammered by queries from all over the globe.

Johan


TheSQLGuru (SSC Guru) · Answer 5

ALZDBA (6/25/2013) wrote:
6 spindles:
4x136GB -> 2 RAID1 volumes -> C-drive and D-drive
2x300GB -> 1 RAID1 volume -> E-drive
I hope they upgraded all hardware drivers before they made it available because that was one of the first things we always needed to perform after having received a new HP server.
I would have expected better performance.
Regular query performance currently seems to meet application needs.
It's just these flushcache messages that keep me puzzled with regards to the future, when this server will get hammered by queries from all over the globe.

You expect a pair of 1-rotating-spindle volumes to serve up "... hammered by queries from all over the globe"?? Sorry, but that is pure delusion. :blink:

I have seen this many times: too few spindles carved up into RAID1s. You are CREATING bottlenecks this way. Once ANY ONE of those RAID1 sets gets saturated, you can literally kill the entire database application. A 2-disk RAID1 plus a 4-disk RAID10, or even a single 6-disk RAID10, is how I would have set that up, to get at least SOME measure of spindle aggregation. I have had net wins at every client where I have recommended that. I acknowledge that you may START slowing down sooner with this type of arrangement (and I would like to see something like 12 or 14+ disks to work with here), but you stay away from the "exponential breakover" point where disk performance just falls through the floor and the system becomes essentially unusable.
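The spindle-aggregation argument can be put in rough numbers. The sketch below uses the standard mirroring write penalty of 2 (every logical write lands on both copies) and an assumed figure of about 180 random IOPS per 15K disk; that per-disk number is a rule-of-thumb assumption, not a measurement from this server, and `logical_iops` is a hypothetical helper:

```python
WRITE_PENALTY = {"RAID1": 2, "RAID10": 2}  # each logical write hits both mirror copies

PER_DISK_IOPS = 180  # assumption: rough random-IO figure for one 15K rpm disk


def logical_iops(disks, read_fraction, level, per_disk_iops=PER_DISK_IOPS):
    """Estimate logical IOPS a RAID set can serve for a given read/write mix."""
    raw = disks * per_disk_iops
    cost_per_io = read_fraction + (1 - read_fraction) * WRITE_PENALTY[level]
    return raw / cost_per_io


# Write-heavy load (like a checkpoint): one 2-disk mirror vs. one 6-disk RAID10.
print(logical_iops(2, 0.0, "RAID1"))   # 180.0 for whichever mirror holds the busy file
print(logical_iops(6, 0.0, "RAID10"))  # 540.0 shared by every file on the volume
```

With three separate mirrors, the busiest one caps out at its own share; striping the same six disks lets a write burst on any single file draw on all of them.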

Best of luck with it - you are going to need it I think! 😎


Johan Bijnens (SSC Guru) · Answer 6

Exactly what I've been preaching over here 🙂

Neither configuration nor budget is my call in this case :Whistling:

But this being the first case since this new hosting company took charge, I'd better prepare the right arguments and hard evidence to get them to deliver decent support.

Johan


TheSQLGuru (SSC Guru) · Answer 7

If nothing else, make sure you collect differential file I/O stalls and differential wait stats during periods of slow (and regular?) performance, so you can say definitively "look, you morons, it is taking 974ms on average for every transaction log write, and PAGEIOLATCH_xx waits are through the roof", or whatever is appropriate at the time to show the server is not up to the task.
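"Differential" here means taking two snapshots of the cumulative counters and working with the difference. Those counters come from `sys.dm_io_virtual_file_stats` and `sys.dm_os_wait_stats`, which need the VIEW SERVER STATE permission rather than Windows admin rights. The sketch below covers only the arithmetic and assumes the two query results have already been loaded into Python; the snapshot values are made up and the helper names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Subset of cumulative columns from sys.dm_io_virtual_file_stats for one file."""
    num_of_reads: int
    io_stall_read_ms: int
    num_of_writes: int
    io_stall_write_ms: int


def avg_stalls_ms(before, after):
    """Average read/write stall per IO between two snapshots of the same file."""
    reads = after.num_of_reads - before.num_of_reads
    writes = after.num_of_writes - before.num_of_writes
    read_ms = after.io_stall_read_ms - before.io_stall_read_ms
    write_ms = after.io_stall_write_ms - before.io_stall_write_ms
    return (read_ms / reads if reads else 0.0,
            write_ms / writes if writes else 0.0)


def top_wait_deltas(before, after, top=5):
    """before/after map wait_type -> wait_time_ms (from sys.dm_os_wait_stats)."""
    deltas = {wait: ms - before.get(wait, 0) for wait, ms in after.items()}
    grown = [(wait, d) for wait, d in deltas.items() if d > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)[:top]


# Made-up snapshot values, only to show the arithmetic.
log_before = FileStats(1000, 5000, 2000, 10000)
log_after = FileStats(1100, 6000, 2500, 260000)
print(avg_stalls_ms(log_before, log_after))  # (10.0, 500.0)
print(top_wait_deltas({"PAGEIOLATCH_EX": 1000, "WRITELOG": 500},
                      {"PAGEIOLATCH_EX": 9000, "WRITELOG": 4500, "SOS_SCHEDULER_YIELD": 10}))
```

Working from deltas rather than the raw totals matters because the DMV counters accumulate from instance startup, so quiet periods would otherwise dilute the averages for the slow window.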
