I wanted to post about a problem we recently experienced trying to upgrade a SQL 2005 Instance to SQL 2008, and a work-around for it. Both for the benefit of the user's on this forum, but more importantly so that this post gets out onto Google so that others won't have to go through what I did trying to fix it!
The short version:
Don't attempt to upgrade an existing SQL install to SQL 2008 with a renamed sa account. The upgrade will fail. Rename your sa account back to "sa", perform the upgrade, then you can re-rename it away from "sa" again.
If you've already tried to upgrade to SQL 2008 with a renamed sa account, it failed, and now your DB Engine won't start, see "The Workaround" below.
The long version:
What we were doing:
Upgrading a SQL 2005 Standard x64, upgrading to SQL 2008 Standard x64.
The upgrade install fails on the DB Engine with this error message:
"Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes."
The summary installer error log (C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\yyyymmdd_hhmmss\Summary_ _yyyymmddd_hhmmss.txt) showed:
[font="Courier New"]Detailed results:
Feature: Database Engine Services
Status: Failed: see logs for details
MSI status: Passed
Configuration status: Failed: see details below
Configuration error code: 0x4BDAF9BA@1306@24
Configuration error description: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
Configuration log: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20080829_104530\Detail.txt[/font]
Not very useful :-S
The detailed installer log (C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\yyyymmdd_hhmmss\Summary_ _yyyymmddd_hhmmss.txt) showed - buried a long way down:
[font="Courier New"]2008-08-29 10:54:01 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
2008-08-29 10:54:01 Slp: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
2008-08-29 10:54:01 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
2008-08-29 10:54:01 Slp: Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineConfigException: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlEngine.SqlServerServiceBase.WaitSqlServerStart(Process processSql)
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineDBStartConfig.UpgradeSQLServerSystemDatabases(EffectiveProperties properties, Boolean isConfiguringTemplateDBs, Boolean fShutdown)
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineDBStartConfig.Upgrade_ConfigRC(Version fromVersion, EffectiveProperties properties, Boolean shutdownRightAway)
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineDBStartConfig.RepairFailedUpgrade(ConfigActionTiming timing, Dictionary`2 actionData)
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineDBStartConfig.Repair(ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcb)
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlConfigBase.SqlFeatureConfigBase.Execute(ConfigActionScenario scenario, ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcbCurrent)
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.ExecuteAction(String actionId)
2008-08-29 10:54:01 Slp: at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.Execute(String actionId, TextWriter errorStream)
2008-08-29 10:54:01 Slp: Exception: Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineConfigException.
2008-08-29 10:54:01 Slp: Source: Microsoft.SqlServer.Configuration.SqlServer_ConfigExtension.
2008-08-29 10:54:01 Slp: Message: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.[/font]
Again, not very useful.
Next step was the SQL Server error log - a copy of this is placed in the Setup folder the above installer logs were taken from - e.g. (C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\yyyymmdd_hhmmss\SQLServer_ERRORLOG_yyyy-mm-ddThh.mm.ss.txt). This showed, towards the very end:
[font="Courier New"]2008-08-29 10:54:01.05 spid7s Executing msdb.dbo.sp_syspolicy_create_purge_job
2008-08-29 10:54:01.14 spid7s Error: 515, Severity: 16, State: 2.
2008-08-29 10:54:01.14 spid7s Cannot insert the value NULL into column 'owner_sid', table 'msdb.dbo.sysjobs'; column does not allow nulls. INSERT fails.
2008-08-29 10:54:01.16 spid7s Error: 912, Severity: 21, State: 2.
2008-08-29 10:54:01.16 spid7s Script level upgrade for database 'master' failed because upgrade step 'sqlagent100_msdb_upgrade.sql' encountered error 515, state 2, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2008-08-29 10:54:01.19 spid7s Error: 3417, Severity: 21, State: 3.
2008-08-29 10:54:01.19 spid7s Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
2008-08-29 10:54:01.20 spid7s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.[/font]
So now I knew vaguely what the problem was - the msdb.sp_syspolicy_create_purge_job stored proc failed, due to an attempt to insert a null SID into the owner_sid column of the msdb.sysjobs table.
A bit of digging lead me to the SQL 2008 install/upgrade script that is being executed when the error above occurred - E:\Microsoft SQL Server\MSSQL10. \MSSQL\Upgrade\sqlagent100_msdb_upgrade.sql. Another copy also lives in ...MSSQL\Install\.
Inside this script, the stored proc that failed (sp_syspolicy_create_purge_job) is created. It starts on line 45301.
Examining this stored proc, the only part of it that tries to insert anything into the sysjobs table is a call is made to msdb.dbo.sp_add_job, starting on line 45332:
[font="Courier New"]DECLARE @jobId BINARY(16);
EXEC @ReturnCode = msdb.dbo.sp_add_job
@job_id = @jobId OUTPUT;[/font]
And there was our problem.
MS have hard-coded the name of the sa account into this stored procedure call (@owner_login_name=N'sa'), but we had renamed our sa account to something else. So, when sp_add_job was called, it tries to get the SID of the user with username "sa", which returns null as no user named "sa" exists. It then tries to insert this null SID into the sysjobs table, which causes the original error, because the owner_sid column of sysjobs can't be null.
I should note that any subsequent attempt to start the DB engine then fails, as it attempts to re-run the same set of upgrade scripts, which again fail and cause the DB engine to shut-down.
The obvious work-around is to temporarily rename the sa account back to "sa", and then run a repair install. However, you can't rename sa, as you can't start the DB engine.
Next I tried hacking the install/upgrade scripts referenced above to manually insert the name of our sa user instead of "sa". This also didn't work as SQL seemed to ignore the change - obviously it is running these scripts from somewhere other than the filesystem (resource DB maybe?).
I was almost at the point of nuking the install and starting again when I had one final idea:
I created a domain user with the username "sa", and added this account to our Database Administrators group (should also work if you add this account to any group with sysadmin priviledges on your SQL server).
I then started up the DB Service and... viola, upgrade scripts executed without a hitch, and the service started! My faux "sa" user was good enough for the scripts to execute just fine.
Final step was to delete this domain user again, rename the "real" sa account back to "sa", and then run a repair install (to be safe, and also to fix a few things that got broken by the interrupted original setup).
Once that was all done, everything appears to be normal again.
I hope that helps someone out there - it took me over a day to get this fixed!