SQL Server is designed to provide high reliability and data integrity. However, like any software that depends on a storage subsystem, it can run into I/O-related issues. The most critical errors we might face are 823 and 824. These errors indicate problems with disk I/O, data corruption, or hardware instability. As a database administrator, it is very important to understand what these errors mean and how to fix them.
In this article, we'll look at each of these error codes, what their error messages mean, their common root causes, and how to troubleshoot them.
First, let us understand the error codes and their root causes.
Error 823: I/O Error Detected
Error 823 occurs when SQL Server encounters a low-level I/O problem while reading data from or writing data to the disk. The failure originates at the Windows operating system or storage level, and SQL Server reports it. You might see the following text when the error occurs:
Error: 823, Severity: 24, State: 2. 2010-03-06 22:41:19.55 spid58 The operating system returned error 1117 (The request could not be performed because of an I/O device error.) to SQL Server during a read at offset 0x0000002d460000 in file 'e:\program files\Microsoft SQL Server\mssql\data\Stackoverflow2010.MDF'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe, system-level error condition that threatens database integrity and must be corrected immediately. It is recommended to complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
In simple words, when SQL Server asks the Windows operating system to read or write data and the request fails because of a disk I/O timeout, bad sectors, or a storage controller failure, SQL Server raises error 823. Here is the list of common causes of error 823.
- Hardware (disk, storage controller, or SAN) failure.
- Corrupted sectors on disk.
- Operating system or driver issues.
- A power outage that interrupts a write in progress.
Now, let us see how we can diagnose the issue.
How to Diagnose Error 823
Here is the list of ways to diagnose error 823.
- Review the SQL Server error logs and the Windows Event Viewer for detailed messages and I/O alerts.
- Regularly run the consistency check. DBCC CHECKDB reports an error if it finds database corruption.
- Use the tools and software that storage manufacturers provide to monitor and validate storage health.
- Periodically run the CHKDSK command to identify bad sectors.
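As a complement to the checks above, SQL Server records pages that hit 823 or 824 errors in the msdb.dbo.suspect_pages table. The following is a minimal sketch, assuming your login can read msdb; it lists the recorded pages with the most recent first.

```sql
-- List pages that SQL Server has flagged as suspect (823/824 and related events).
SELECT
    DB_NAME(sp.database_id) AS database_name,
    sp.file_id,
    sp.page_id,
    CASE sp.event_type
        WHEN 1 THEN '823 or 824 error (not a bad checksum or torn page)'
        WHEN 2 THEN 'Bad checksum'
        WHEN 3 THEN 'Torn page'
        WHEN 4 THEN 'Restored'
        WHEN 5 THEN 'Repaired by DBCC'
        WHEN 7 THEN 'Deallocated by DBCC'
    END AS event_description,
    sp.error_count,
    sp.last_update_date
FROM msdb.dbo.suspect_pages AS sp
ORDER BY sp.last_update_date DESC;
```

An empty result does not prove the storage is healthy; it only means no suspect page has been recorded so far.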
Recommended actions
After finding the error, you can perform the following activities to resolve the issue.
- If the database is accessible, take a backup immediately. That backup might be needed later to recover data that becomes inaccessible.
- If the database is inaccessible, you must restore the most recent backup.
- You can engage the storage administrator to check the disk health.
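If the database is still accessible, the backup recommended above could look like the following sketch. The target path is a hypothetical placeholder. CONTINUE_AFTER_ERROR asks SQL Server to keep going when it meets a damaged page, so the resulting backup may itself contain corrupt pages and should be kept alongside your earlier backups, not in place of them.

```sql
-- Hypothetical path; adjust for your environment.
BACKUP DATABASE [StackOverflow2010]
TO DISK = N'D:\Backup\StackOverflow2010_after_823.bak'
WITH COPY_ONLY,             -- do not disturb the existing differential base
     CHECKSUM,              -- verify page checksums while reading the data
     CONTINUE_AFTER_ERROR;  -- keep going past damaged pages instead of failing
```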
Now, let us understand when error number 824 occurs.
Error 824: Logical I/O Error
Error 824 occurs when the operating system reports that the I/O succeeded, but the content of the SQL Server page fails the logical consistency checks that SQL Server applies to it. Broader logical integrity checks are part of DBCC CHECKDB and DBCC CHECKTABLE. When we run them, SQL Server validates the logical relationships between tables and indexes, index keys and data rows, and page linkages and their hierarchy order.
SQL Server reports an 824 logical I/O error when it detects a checksum mismatch, a torn page, or an incorrect page ID.
SQL Server can encounter error 824 for the following reasons:
- A torn page, which can result from an incomplete write.
- Caching issues in the storage controller.
- A faulty memory module that corrupts data before it is written from memory to disk.
When a logical I/O error occurs, you might see the following text in the error log.
2022-11-02 15:46:42.90 spid51 Error: 824, Severity: 24, State: 2. 2022-11-02 15:46:42.90 spid51 SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:43686; actual 0:0). It occurred during a read of page (1:43686) in database ID 23 at offset 0x0000001554c000 in file 'H:\MSSQL16.MSSQLSERVER\MSSQL\DATA\Stackoverflow2010.mdf'. Additional messages in the SQL Server error log or operating system error log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see https://go.microsoft.com/fwlink/?linkid=2252374.
How to Diagnose Error 824
When you notice the error, you can diagnose it further in any of the following ways.
If the error message contains the page ID of the affected page, you can inspect the page content using the undocumented DBCC PAGE command. Its arguments are the database name, file ID, page number, and print option, and trace flag 3604 sends the output to the client. Here is the syntax.
DBCC TRACEON(3604);
DBCC PAGE('StackOverflow2010', 1, 2456, 3);
If the page cannot be read due to the 824 error, you can check the database integrity using the DBCC CHECKDB command. Here is the syntax.
DBCC CHECKDB ('StackOverflow2010') WITH NO_INFOMSGS, ALL_ERRORMSGS;
Check the SQL Server error logs for related messages. You can access the error logs via SQL Server Management Studio, or you can use the xp_readerrorlog stored procedure. Here is the command.
EXEC xp_readerrorlog 0, 1, N'824';
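The 824 message reports both a page ID and a byte offset within the data file. Because SQL Server data pages are 8 KB, the two can be cross-checked. The sketch below uses the offset from the sample 824 message shown earlier, padded to 8 bytes for the conversion, and then passes the resulting page number to DBCC PAGE.

```sql
-- Offset from the sample 824 message (0x0000001554c000), padded to 8 bytes.
DECLARE @offset bigint = CONVERT(bigint, 0x000000001554C000);

-- Data pages are 8192 bytes, so page number = byte offset / 8192.
SELECT @offset / 8192 AS page_number;  -- returns 43686, matching page (1:43686)

-- Inspect that page in file 1; trace flag 3604 sends DBCC output to the client.
DBCC TRACEON(3604);
DBCC PAGE('StackOverflow2010', 1, 43686, 3);
```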
Now, let us go through some of the best practices that we can implement in our system to prevent critical I/O-related errors.
Best practices for preventing I/O-related errors
We can implement the following practices to prevent I/O-related errors.
- Enable the CHECKSUM page verification option on the database. With this option, SQL Server writes a checksum into each page when it writes the page to disk and validates that checksum when it reads the page back. You can enable the option by running the following query.
ALTER DATABASE StackOverflow2010 SET PAGE_VERIFY CHECKSUM;
- Schedule database consistency check jobs to monitor database integrity. You can read the article "Automate the SQL Server DBCC CheckDB command using Maintenance Plans" to learn more about database consistency check jobs and how to schedule them.
- Always back up your databases and keep the backups on reliable storage or cloud storage. It is preferable to keep multiple copies of backups. For example, keep one copy of the SQL Server backups on a local storage server and another copy on Azure Storage.
- Always use ECC (error-correcting code) memory and RAID 10 configurations. These help protect SQL Server from memory-level and disk-level corruption.
- Always keep the drivers and firmware of the storage subsystem up to date. Outdated drivers sometimes cause I/O inconsistencies.
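Two of the practices above can be spot-checked with a short script. The following is a sketch only: the backup path is a hypothetical placeholder, and the first query simply lists databases whose page verification is not set to CHECKSUM.

```sql
-- Databases that are not using CHECKSUM page verification.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> N'CHECKSUM';

-- Back up with checksums so damaged pages are detected at backup time.
BACKUP DATABASE [StackOverflow2010]
TO DISK = N'D:\Backup\StackOverflow2010_full.bak'   -- hypothetical path
WITH CHECKSUM;

-- Confirm the backup file is readable and its checksums are valid.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backup\StackOverflow2010_full.bak'
WITH CHECKSUM;
```

RESTORE VERIFYONLY does not restore any data, so a periodic test restore to another server remains the stronger check.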
How to fix the I/O issues
If you encounter any of the above errors, the most convenient and safe way to fix the issue is to restore the database from the latest backup. Restoring a database from native backups is straightforward. First, we restore the full backup, then the latest differential backup, and lastly all log backups taken after the differential backup. The real problem arises when your backups are corrupted, the recovery chain is broken, or there are no backups at all.
In such cases, recovering the data becomes complicated and there is a high chance of major data loss. There are tools like Stellar Repair for MS SQL Server that might help us recover the corrupt data and bring the database online.
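When a healthy backup chain is available, the restore sequence described above (full, then differential, then log backups) can be sketched as follows. All file names are hypothetical placeholders, and REPLACE overwrites the existing database, so use it only when you intend to discard the damaged copy.

```sql
-- 1. Restore the full backup and leave the database ready for further restores.
RESTORE DATABASE [StackOverflow2010]
FROM DISK = N'D:\Backup\StackOverflow2010_full.bak'
WITH NORECOVERY, REPLACE;

-- 2. Restore the latest differential backup.
RESTORE DATABASE [StackOverflow2010]
FROM DISK = N'D:\Backup\StackOverflow2010_diff.bak'
WITH NORECOVERY;

-- 3. Restore each log backup taken after the differential, in order.
RESTORE LOG [StackOverflow2010]
FROM DISK = N'D:\Backup\StackOverflow2010_log_01.trn'
WITH NORECOVERY;

-- 4. Bring the database online.
RESTORE DATABASE [StackOverflow2010] WITH RECOVERY;
```

After the restore completes, running DBCC CHECKDB on the restored database confirms that the corruption was not already present in the backups.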
Conclusion
Errors 823 and 824 are not routine errors. They are critical warnings that the storage holding your data is no longer reliable. As database administrators, it is our duty to proactively monitor storage subsystem errors and rectify them before they become a disaster. In this article, I have explained the most important I/O errors, how to identify them, and their potential root causes. I have also covered the best practices that we can implement to prevent such errors and reviewed how we can fix these issues.