SQL Server 2012 Enterprise, in an OLTP environment.
We're seeing the following in the error log several times per hour. Has anyone experienced this?
FlushCache: cleaned up 513008 bufs with 448519 writes in 75641 ms (avoided 92621 new dirty bufs) for db 10:0
average throughput: 52.99 MB/sec, I/O saturation: 104, context switches 5758
last target outstanding: 80800, avgWriteLatency 0
The system is generating these FlushCache messages and we're seeing severe performance degradation at the same time. DBCC FREEPROCCACHE isn't the answer to this issue.
FlushCache == Checkpoint. This message is logged when a checkpoint takes longer than the recovery interval. It's more of a warning than an actual error.
In your example above, SQL Server wrote about 4 GB (513,008 × 8 KB pages) of dirty pages to disk, taking 75.6 seconds in the process. Step backwards to right before the checkpoint started: you had 4 GB of dirty data in the buffer pool that hadn't been flushed to the data file(s) (of course it was already in the log file on disk). If a crash occurred right then, when SQL Server attempted to recover the database it would have to roll forward/roll back the transactions associated with that 4 GB of dirty data to bring the database current. Just writing the 4 GB of data to disk takes 75 seconds, so the total time to run recovery will very likely be at least 75 seconds. If the recovery interval is 60 seconds, then we're going to take longer to recover the database than we expected.
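The numbers in the FlushCache line check out against the arithmetic above. A quick sanity-check query (plain arithmetic, no tables needed):

```sql
-- Back-of-envelope math from the FlushCache message:
-- 513008 pages * 8 KB/page, written in 75641 ms.
SELECT
    513008 * 8 / 1024.0 / 1024.0                 AS gb_written,       -- ~3.91 GB
    75641 / 1000.0                               AS seconds_elapsed,  -- ~75.6 s
    (513008 * 8 / 1024.0) / (75641 / 1000.0)     AS mb_per_sec;       -- ~52.99, matches the logged throughput
```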
And that's the problem: You're dirtying data in the buffer pool faster than the checkpoint process can handle within your recovery interval.
Extreme Example: Consider a 2 TB database on a server with 512 GB of RAM. The logs could be on SSDs, but the data files are on a slow RAID 6 array. You update 20 GB of data and it takes 20 seconds (fast log writes to the SSDs, yay!). Now there are 20 GB of dirty pages in the buffer pool that need to be written to the slow RAID 6 array. They are likely to be mostly 8 KB random writes. If you were able to get 1000 write IOPS out of your RAID 6 array (not likely, but I'm being overly optimistic), then you'd be looking at 40+ minutes to write that data to disk. In other words, if your cluster failed over right after your "20 second" update, it would take over 40 minutes to recover.
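Same back-of-envelope approach for the extreme example, assuming one I/O per dirty 8 KB page:

```sql
-- 20 GB of dirty 8 KB pages at ~1000 random write IOPS:
SELECT
    20 * 1024 * 1024 / 8                  AS dirty_pages,         -- 2,621,440 pages
    20 * 1024 * 1024 / 8 / 1000           AS seconds_at_1k_iops,  -- ~2,621 s
    20 * 1024 * 1024 / 8 / 1000 / 60.0    AS minutes;             -- ~43.7 minutes
```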
If your recovery interval is 60 seconds (the default), are you okay with recovery taking 75 seconds? 120 seconds? 300 seconds? If so, then you can safely ignore these messages.
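To see what your instance is currently configured for (0 means the default, roughly a one-minute target), you can check the `recovery interval (min)` option; it's an advanced option, so it has to be made visible first:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval (min)';  -- run_value 0 = default (~1 minute target)
```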
If you're getting server performance issues at the same time, then keep reading. Checkpoints aren't supposed to cause a negative performance impact to running queries. If they are, then your problem is IO.
Start capturing perfmon counters related to disk IO (LogicalDisk(*) and/or PhysicalDisk(*)), plus any SQL Server counters related to disk IO (Page Reads/sec, Checkpoint Pages/sec, Log Flushes/sec, etc.). A 1-second sample interval is best; use no more than 5 seconds.
Watch how Current Disk Queue Length, Disk Reads/sec, Disk Writes/sec, Avg Disk Sec/Read, Avg Disk Sec/Write, etc. correlate with the SQL Server counters.
What are your baselines?
What happens to your current disk queue length and latency (Avg Disk Sec/Read, Avg Disk sec/write) when a checkpoint occurs?
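Alongside perfmon, you can watch write latency from inside SQL Server with `sys.dm_io_virtual_file_stats`. The stalls are cumulative since startup, so sample it before and during a checkpoint and diff the numbers:

```sql
-- Per-file write counts and cumulative write stalls since startup.
-- Re-run during a checkpoint and compare to see latency spike.
SELECT DB_NAME(vfs.database_id) AS db,
       mf.physical_name,
       vfs.num_of_writes,
       vfs.io_stall_write_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall_write_ms DESC;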
Using SAN storage? Involve your SAN admin.
To resolve this issue, you have several options:
1. Dirty fewer pages (drop extra indexes, use compression, tune queries, etc.). Fewer dirty pages means less work for each checkpoint.
2. Reduce overall IO load (add memory to reduce reads/sec, move a busy tempdb to a different drive, tune queries, etc.).
3. Increase IO write capacity (extra spindles in the SAN, add SSDs, switch from RAID 5 to RAID 10, etc.).
4. Smooth out the checkpoint's IO load (set a really high recovery interval and perform manual checkpoints on a schedule). Don't go here until you've got a really good handle on the perfmon counters above and can prove that this helps.
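A sketch of option 4, assuming a hypothetical database name. Raising the recovery interval makes automatic checkpoints back off, and the optional `checkpoint_duration` argument asks SQL Server to spread the checkpoint's writes over roughly that many seconds instead of flushing as fast as possible. The trade-off is a longer crash-recovery time, which is exactly why you need the perfmon evidence first:

```sql
-- Raise the recovery interval so automatic checkpoints back off:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval (min)', 5;  -- example value; pick based on your recovery-time tolerance
RECONFIGURE;

-- Then, in a scheduled job, run your own checkpoints:
USE YourDatabase;  -- hypothetical name
CHECKPOINT 60;     -- checkpoint_duration: target ~60 seconds of smoothed writes
```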