One-time DB job was scheduled at night to add two columns (computed, persistent) on a large table (5 GB together with indexes, witout compression about 10GB).
It is a compressed heap with several nonclustered indexes also compressed, partitioned by month with aligned partitions.
The command in the job was just "ALTER TABLE xy ..." that adds two columns.
The transaction lasted for several hours, and failed.
After that, the database was online, but most of users users got very odd errors every time:
Error: 615, Severity: 21, State: 1
Could not find database ID 6144, name '6144'. The database may be offline. Wait a few minutes and try again.
Some users could use the db normally, but most of the users got hundreds of that message, sometimes with different number as the "name" of the database. Such numeric names of DB of course do not exist and have never existed on the server.
DB was online and DBCC CHECKDB found no errors.
But almost none of the users could use the DB, so I had to fix it somehow.
Error log for the job reported this set of errors:
TCP Provider: An existing connection was forcibly closed by the remote host.
[SQLSTATE 08S01] (Error 10054)
Communication link failure [SQLSTATE 08S01] (Error 10054)
The transaction log for database '*DBNAME*' is full.
To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
[SQLSTATE 42000] (Error 9002)
The log for database '*DBNAME*' is not available.
Check the event log for related error messages.
Resolve any errors and restart the database.
[SQLSTATE HY000] (Error 9001)
During undoing of a logged operation in database '*DBNAME*', an error occurred at log record ID (388242:11656:483).
Typically, the specific failure is logged previously as an error in the Windows Event Log service.
Restore the database or file from a backup, or repair the database.
[SQLSTATE HY000] (Error 3314)
During a job execution, a Transaction log grew from 6 to 36GB. That is still not enough to fill-up the disc space, but not far from it (several GB were free).
So, I wonder why the "Transaction log full" error ?
After "transaction log full" striked, a huge rollback was performed. But that rollback also failed.
I ended up in a corrupted DB (although DBCC CHECKDB says it is ok, it is NOT) that is unrecoverable, except restoring it from the backup.
So, I did restore it to the point in time just before that ALTER TABLE started. Losing all the data changes and inputs after that point.
Now users can use it and looks fine, but I'm afraid that it can happen again if I don't find a reason why this severe corruption occurred.
So, please help. I have to find out. Because, I can't believe it is normal that one gets a corrupted db if transaction log fills the disk (and I don't believe that disk was totally full).
The point is - a simple ALTER TABLE ADD COLUMN on 5GB table generates 30+GB of transaction log, and unrecoverable corruption of the database.
So, I wont run that command again until I find the cause of corruption.
Is it really common in SQL server if you run out of disk space that you get corrupted DB ?
Version is SQL 2008R2 RTM on a 2-node cluster.
Microsoft Certified Master: SQL Server 2008
XDetails Addin - for SQL Developers
blog.sqlxdetails.com - Transaction log myths