I recently ran into an issue that was difficult to resolve, and I couldn’t find the particular cause documented anywhere else on the web, so I thought I’d record it here.
The issue was with a service account connecting to SQL Server and intermittently failing to log on.
Errors reported in the Windows Application Event log were:
SSPI handshake failed with error code 0x8009030c
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.
The login attempt didn’t appear to get as far as the SQL instance, so no further information could be captured in a failed Logins trace.
This was affecting a large number of application servers using the same service account. Fortunately this was in development and test environments, so there was no production impact.
The problem was that the account was getting locked out. A service was running every half hour and using the account to connect to SQL Server, but with the wrong password. We also had a process running to unlock locked service accounts, so the account would start working again after a few minutes. That combination is why the failures were intermittent rather than permanent.
The resolution was to stop that service, as it was no longer required. We were able to identify where the failed logins were coming from via the Active Directory audit logs for the account in question.
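If you suspect something similar, one thing that might help is checking whether the failed logons arrive on a regular schedule. Below is a rough Python sketch of that idea, assuming you have already exported the timestamps of the failed logon events from your audit logs by whatever means you have. The function names, the five-minute burst gap and the sample timestamps are all made up for illustration and are not from our environment.

```python
from datetime import datetime, timedelta
from statistics import median


def burst_starts(timestamps, gap=timedelta(minutes=5)):
    """Group failures that occur close together and return the first timestamp of each group."""
    starts = []
    previous = None
    for ts in sorted(timestamps):
        # A new burst begins when the quiet period since the last failure exceeds `gap`.
        if previous is None or ts - previous > gap:
            starts.append(ts)
        previous = ts
    return starts


def typical_interval(timestamps, gap=timedelta(minutes=5)):
    """Return the median time between bursts, or None if there are fewer than two bursts."""
    starts = burst_starts(timestamps, gap)
    if len(starts) < 2:
        return None
    seconds = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
    return timedelta(seconds=median(seconds))


# Invented sample data: three bursts of failures roughly half an hour apart.
sample = [
    datetime(2020, 1, 1, 9, 0, 0),
    datetime(2020, 1, 1, 9, 0, 2),
    datetime(2020, 1, 1, 9, 0, 5),
    datetime(2020, 1, 1, 9, 30, 1),
    datetime(2020, 1, 1, 9, 30, 3),
    datetime(2020, 1, 1, 10, 0, 0),
    datetime(2020, 1, 1, 10, 0, 4),
]

print(typical_interval(sample))
```

A steady interval like this suggests a scheduled job or service rather than a person mistyping a password, which narrows down where to look.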
This was particularly difficult to troubleshoot because the error is misleading: it points at an untrusted domain rather than at a locked-out account.
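For what it’s worth, the error code itself doesn’t say much either. As far as I know, 0x8009030c is SEC_E_LOGON_DENIED in the Windows SSPI headers, which only tells you the logon was denied, not why. Here is a small Python sketch that splits the code into the standard HRESULT fields. The helper name and the one-entry lookup table are my own, purely for illustration.

```python
# Hypothetical lookup table; only the code from this post is included.
SSPI_NAMES = {0x8009030C: "SEC_E_LOGON_DENIED"}


def decode_hresult(hr):
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    return {
        "failure": bool((hr >> 31) & 0x1),  # top bit set means failure
        "facility": (hr >> 16) & 0x7FF,     # 11-bit facility; 9 is the SSPI/security facility
        "code": hr & 0xFFFF,                # low 16 bits are the facility-specific code
        "name": SSPI_NAMES.get(hr, "unknown"),
    }


info = decode_hresult(0x8009030C)
print(info)
```

So the code confirms a failed logon in the security layer, but you still need the domain-side audit logs to find out that a lockout is behind it.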
There were a couple of other sources I came across while searching for the cause of the problem. I’ve updated those with the cause we found as well.
Hopefully this post will save someone else the many hours of investigation I’ve just been through!