June 11, 2021 at 4:06 pm
We have a Database that we Backup and Restore Nightly to a secondary SQL-Server to Run Jobs to Prepare Reporting data.
When we first restore the Database, the Stored Procedures that runs the first time to Insert Data into our Reporting databases take 2 hours to complete, if we rerun those same Jobs a minute after they complete they will take the correct time of about 6 to 10 Minutes.
We have the Stored Procedures in a database that is not Restored, it grabs information from the Restored Database.
We have Redgate running against the Server, and it shows 2-3 Sec Page Waits
PAGEIOLATCH_SH
Waiting on a latch to read data from disk into memory
2,663,391
To add to the Confusion, we only started having this issue when we moved to a Hyper-V hosted SQL Server. We run the exact same code each night on a Physical Server, and it does not experience the same delays.
June 11, 2021 at 7:33 pm
First thing I'd be curious about is with the wait stats for PAGEIOLATCH_SH being over 2 million, is that for that one job OR since the instance started?
Quick google on that wait type though brought me to stack overflow (https://stackoverflow.com/questions/620626/what-is-pageiolatch-sh-wait-type-in-sql-server#:~:text=PAGEIOLATCH_SH.%20Occurs%20when%20a%20task%20is%20waiting%20on,waits%20may%20indicate%20problems%20with%20the%20disk%20subsystem.) where the relevant response was:
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem.
So if PAGEIOLATCH_SH is what is causing the slowness, the problem is likely related to the disk subsystem OR something in the disk subsystem. I say "something in the disk subsystem" as if you have a SAN, it could be the network, it could be the antivirus, it could be the VM configuration.
Now, that all assumes that PAGEIOLATCH_SH is the bottleneck. As to why it is slow on first run and faster on further runs - my guess (as I can't see your system) is that it is moving from on-disk to in-memory (cache), so first run is pulling all data from disk, second and further runs are pulling data from memory. Memory is a LOT faster than disk.
Now, as for the delays, if it is pulling from a SAN disk or a NAS, it could be the initial data pull is slow because the underlying network is slow and future reads are faster because it can use the cache to get the data.
What I would be checking is how are the server resources being used on the first run and on future runs. Are you seeing the network on the server max out or the disk I/O spike?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 12, 2021 at 6:15 am
Restoring the database flushes the cache so the first time you run the proc your having to pull all the data back from disk. If your talking multiple 100s of GBs here your going to have to look at better faster disk or alternative methods of getting the data from A to B.
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply