Last time, in Part I of my page-split mini-series, we began speaking about page splits, defining what they are, how they occur, and which counter we can use to track them.
This time we will talk a little about how they show up as a resource wait, and how we can identify one of the common symptoms of excessive page-splitting.
Although we recognize a higher number of page splits per second on the system may indicate an IO bottleneck, it doesn’t give us a good indication of where the splits are coming from.
The reason why too many page splits can decrease the performance of the SQL server is due to the large number of I/O operations. One solution is to deploy a faster I/O subsystem, where page splits would be less of an issue. Of course, throwing hardware at the problem can be an expensive and temporary band-aid.
In addition, pages highly in demand during page-splitting, can cause in-memory contention, and show up as PageLatches.
During data modifications where access to the specific data page is requested, SQL Server will take a short latch on the page to add the data. However, once there is not enough space on a data page, as a new page is allocated and data is copied from the old page to the new page, and then new data is added. Taking into account whether there is a clustered index on the table, SQL Server must take and hold the latch for a longer period of time. This is where the end-user begins to see a performance slowdown. All the user is hoping to do is add new data, not realizing or understanding the internals of the data engine.
Therefore, one way to tell if excessive page splits exist on the system is to look at wait stats. High wait times for latch-wait statistics is one possible indication of too many page splits. The latches are held in EX (exclusive) mode, therefore will show up in the waiting queue as PageLatch_EX.
Contrary to many folks confusion, PageLatch_EX is NOT an I/O request. Rather, it occurs when a task is waiting on a latch for a buffer, and can cause contention of access to the in-memory pages. In other words, while page split operations are occurring, the latches will hold them in memory, and not release them until the operation is complete.
An index is made up of a set of pages (index nodes) that are organized in a type of structure that is hierarchical in nature. This is called a B-Tree. It is important to know this concept as latches are held on the page that is being split, the existing next page in the index, and the parent page in the index tree for the duration of the split. Any attempts to access any of these pages during the split are blocked, including read access. As the operation does not occur instantly, performance bottlenecks arise.
Now that we know excessive page splitting can show up as page latches, you can further explore counters such as Average Latch Wait Time (ms), Latch Waits per sec, Total Latch Wait Time and Super latches in this MSDN article on SQL Server Latches Object.
Using the SQL DMV that exposes the cumulative wait statistics, sys.dm_os_wait_stats, we can narrow down our culprit by looking for PAGELATCH_EX waits, which usually indicate heavy inserts, and with heavy inserts, page splitting can occur. With the help of Glenn Berry’s time trusted DMV Script for finding Top Wait Types on the system:
WITH Waits AS (SELECT wait_type, wait_time_ms / 1000. AS wait_time_s, 100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct, ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn FROM sys.dm_os_wait_stats WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK' ,'SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR', 'LOGMGR_QUEUE','CHECKPOINT_QUEUE' ,'REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_MANUAL_EVENT' ,'CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT' ,'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN')) SELECT W1.wait_type, CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s, CAST(W1.pct AS DECIMAL(12, 2)) AS pct, CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct HAVING SUM(W2.pct) - W1.pct < 95
If your system is not properly configured, such as setting the appropriate Fill Factor, (which we will talk about later in this series), page splits become a big headache. Therefore, you will need to continue your diagnosis.
If the top waits show up for PAGELATCH_* in the sys.dm_os_wait_stats view, you should further investigate by querying the sys.dm_os_waiting_tasks DMV. This view shows us what’s happening now on the system, and shows the queue of waiting tasks waiting on a resource. To search for a task waiting for PAGELATCH, you can run the following query:
SELECT session_id, wait_type, resource_description
WHERE wait_type LIKE 'PAGELATCH%
This will identify the associated session, and the resource_description column, that tells you the actual database_id, file_id, and page_id. Below is an image of sample output of the above query from an article on Page Latches by the SQLCat team. You can do a further deep dive of Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads, by clicking on their in-depth article. You can use their complex query to actually look up the page involved in the identified wait.
Furthermore, you can Cross Apply the sys.dm_os_waiting_tasks with other DMVs, such as dm_exec_requests, dm_exec_sessions, dm_exec_sql_text and dm_exec_query_plan to get the correlated SQL query and plan info.
select db_name(er.database_id) databasename, wt.wait_duration_ms, wt.wait_type, et.text, ep.query_plan, wt.session_id, exec_ssion.cpu_time, exec_ssion.memory_usage, exec_ssion.logical_reads, exec_ssion.total_elapsed_time, exec_ssion.program_name, exec_ssion.status from sys.dm_os_waiting_tasks wt inner join sys.dm_exec_requests er on wt.session_id = er.session_id inner join sys.dm_exec_sessions exec_ssion on exec_ssion.session_id = er.session_id cross apply sys.dm_exec_sql_text (er.sql_handle) et cross apply sys.dm_exec_query_plan (er.plan_handle) ep
Now that we took a look at how page-splits manifest themselves as a resource wait, and examined page latch contention, next time we will explore additional ways to pinpoint page splitting, the number of page splits and steps to reduce them. Stay bookmarked to the Pearl Knows Blog for the next entry in this Page-Splitting Mini-Series.
Please follow me on Twitter @Pearlknows, and check out our web-site for all our available services at http://www.pearlknows.com.
Take our HealthySQL Challenge! Are you SQL Servers healthy? How do you know for sure? Please contact us about our 15-point Health Check report, which will identify areas for improvement, and allow for best practice recommendations for your SQL Server(s). If we find NOTHING wrong with your SQL Server, the report is FREE! Contact us as firstname.lastname@example.org