Disk write spikes at 15 min intervals

Question

Disk write spikes at 15 min intervals

mike.dean 43453

SSC Enthusiast

Points: 175
More actions
April 25, 2019 at 3:23 pm

#3535966

I have a VM in Azure operating as a d/b server (4 databases, largest 5 GB), I'm looking to move the d/bs to SQL Azure and ran the SQLPerfMon tool to get an idea of DTUs.
First run showed VERY peaky load, lack of memory seemed a likely problem so I switched up to a bigger server and re-ran the test. That showed very even performance, with just the expected business peaks.
I gave it a few days to make sure SQL had got as much memory as it was going to use and ran a third time. I was expecting fairly similar results again so I was startled to find the peaks were back, but now clearly happening every 15 mins. Digging slightly deeper the most obvious feature is disk writes; every 15 mins they spike up from single digits to around 900-1000 writes/sec, usually just for a single one-second interval. There's a distinct increase in processor (up from minimal to around 25%) for a few seconds around the disk writes. It happens just as much in the 12-13 hours when there's absolutely nothing running as during the day and during overnight jobs.
Logs are backed up every 5 mins, Query Store isn't enabled on any database and target recovery time is defaulted to zero on all. I don't know of anything else that might be relevant. There's nothing running on a 15-min cycle on the applications side.
Can anyone suggest what might be causing this or how I could pin down what the cause is? (I'm an apps person not a dba btw - there are only 5 people in the company).
Alternatively, can anyone advise the consequences for SQL Azure? I was hoping to move to a Standard 100 eDTU pool, at the moment the tool is telling me I should be thinking more like 200 except that the peaks are taking it over 700 which is way beyond our budget. Can I ignore such short peaks and get away with 200???

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 1

Easier question first, sure you can ignore the peaks... with the understanding that you'll see slow behavior, blocking, locking, and resource contention when you're trying to access memory & cpu beyond what the tier supports. It won't stop the service. It'll just slow way down for the people who are waiting on others to clear.

The other problem... That's harder. Assuming it's nothing that you're doing through some scheduled process (I'd always argue, this is highly likely), then it's possible that it's Microsoft doing it. I'd suggest capturing the wait statistics for the period in question. See what is causing things to slow down. That may narrow things down quickly. You know it's the disk, but are there more details? Wait stats would likely point it out. Are these just internal measures from the VM, or are you seeing them externally from the counters supplied by Azure? If internal only, look to those external counters too. If external... woof, assuming you really don't have something scheduled, talk to Microsoft maybe.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

mike.dean 43453 SSC Enthusiast Points: 175 More actions · Answer 2

I'm not sure I understand very much of that, as I say I'm not a dba. I don't think I have wait statistics on SQL 2014? I've had a look at extended events but haven't found anything that's showing anything happening yet.

Are there any specific Azure counters you'd suggest? There isn't any sort of external monitoring configured on the Azure setup at the moment.

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 3

We had similar peaks at regular intervals, can't remember the exact interval, it might have been about 15 mins.

I think it might be to do with having a clustered pair or log shipping. But I think you can just ignore it.

mike.dean 43453 SSC Enthusiast Points: 175 More actions · Answer 4

We're not log shipping; it used to run mirrored on a pair of small servers but we stopped doing that (caused a lot more problems than it ever solved) shortly before moving it to an Azure VM.

The fact that the spikes were completely absent before SQL grabbed all the memory it wanted (56% of 32GB) surely must mean it's "something to do with caching"???

mike.dean 43453 SSC Enthusiast Points: 175 More actions · Answer 5

After spending a fun morning with extended events and Process Monitor I've not been able to identify any SQL-related activity at all around these spikes so I'm going to ignore them.

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 6

Ignoring them might be valid. To check the wait statistics, run a query against sys.dm_os_wait_stats. That's in all versions of SQL Server from 2005 & up (although, Azure SQL Database also has sys.dm_db_wait_stats for database wait statistics).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning