SSPI handshake failed error when one of several domain controllers is restarted

dturner-846477

Old Hand

Points: 328
April 21, 2020 at 7:57 pm

#3745121
The issue has occurred repeatedly on at least two different SQL Server 2016 servers (SP2 CU10, Windows Server 2016 Datacenter) when we restart one specific domain controller. We don't understand why the request doesn't simply get directed to one of the other two domain controllers located at the same site (Windows Server 2012 R2; domain functional level: Windows Server 2008 R2).
We recently migrated the third-party application from SQL Server 2012 to 2016, and moved its clients from standard SQL logins to Windows Authentication with domain user accounts. The application's client servers and SQL Servers are all in one domain. The connections use NTLM, NetBIOS is disabled, and we are not using Kerberos.
From the SQL Error log:
Error: 17806, Severity: 20, State: 14.
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. [CLIENT: 172.31.31.53]
SSPI handshake failed with error code 0x80090304, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. The Local Security Authority cannot be contacted [CLIENT: 172.31.31.53]
Error: 18452, Severity: 14, State: 1.
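When working backward through logs like the one above, it can help to pull the error code, state, and client address out of each SSPI line automatically. The following is a minimal Python sketch, not a complete tool: the function name `parse_sspi_failure` is hypothetical, and the lookup table holds only a handful of SSPI status codes from `winerror.h`, so anything else is reported as unknown.

```python
import re

# Small, non-exhaustive table of SSPI status codes (values from winerror.h).
SSPI_CODES = {
    0x80090304: "SEC_E_INTERNAL_ERROR",
    0x80090308: "SEC_E_INVALID_TOKEN",
    0x8009030C: "SEC_E_LOGON_DENIED",
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}

# Matches the message text that accompanies error 17806 in the SQL error log.
SSPI_LINE = re.compile(
    r"SSPI handshake failed with error code (0x[0-9A-Fa-f]+), state (\d+)"
    r".*?\[CLIENT: ([^\]]+)\]"
)


def parse_sspi_failure(line):
    """Return (code_name, state, client) for an SSPI failure line, else None."""
    match = SSPI_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    name = SSPI_CODES.get(code, "unknown (0x%08X)" % code)
    return name, int(match.group(2)), match.group(3).strip()
```

Applied to the SSPI line in the log above, this should return `SEC_E_INTERNAL_ERROR`, state 14, and client 172.31.31.53, which matches the "Local Security Authority cannot be contacted" text.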
- This topic was modified 5 years, 8 months ago by dturner-846477.


Site Owners SSC Guru Points: 80411 · Answer 1

Thanks for posting your issue and hopefully someone will answer soon.

This is an automated bump to increase visibility of your question.

Jeffrey Williams SSC Guru Points: 90351 · Answer 2

I would suspect this is related to the NLA service, which would not refresh the connection to the domain controller until that server or the service is restarted. Because you restarted the domain controller that server is connected to, it cannot connect to the DC to validate the login.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll


sterling3721 SSChasing Mays Points: 658 · Answer 3

I have the same issue when a DC is patched and rebooted. Since it happens monthly on a Sunday and only lasts a short time, I just ignore those errors. There are three DCs, which are supposed to provide high availability, but that does not seem to apply to SQL Server when it tries to connect to the DC that is not yet back online.

dturner-846477 Old Hand Points: 328 · Answer 4

Thank you, Jeffrey. I will look into the behavior of the NLA service. Like sterling3721's, our issue occurs during DC patching. Unfortunately the third-party application isn't always tolerant of the failed connections. We now try to schedule around the patching, then check the logs of the related SQL Servers and work backward to find any issues.
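For scripts or jobs you do control (unlike the third-party application), one common way to ride out a short authentication outage is to retry the connection with a growing delay. This is only a rough Python sketch under stated assumptions: `connect_with_retry` and its `connect` argument are hypothetical, and `ConnectionError` is a stand-in, since real database drivers raise their own exception types.

```python
import time


def connect_with_retry(connect, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    `connect` is any zero-argument callable that opens a connection.
    ConnectionError is a placeholder; swap in your driver's exception type.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)
```

With the defaults this waits 1, 2, and then 4 seconds between the four attempts, so it only covers very brief interruptions; a DC reboot would likely need a longer schedule.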

I'm curious what folks think of adding Kerberos to this scenario; from what I've read, it minimizes the number of trips to the DC. Honestly, I expected to see a lot more folks running into this issue, but I'm now guessing we're the odd ones out by not using Kerberos.

sterling3721 SSChasing Mays Points: 658 · Answer 5

For me, it happens with replication between the publisher and subscriber: the connection to the subscriber fails because of the authentication failure. Since replication handles network and other issues (including deadlocks) gracefully, the front-end application is OK; there is no impact on users other than data taking a few more minutes to be updated.

Yes, I just checked: it's NTLM. Not sure why it's not Kerberos.
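For anyone wanting to run the same check, the `auth_scheme` column of the `sys.dm_exec_connections` DMV reports the scheme per connection. Below is a small Python sketch that keeps the query as a string and tallies the results. The helper `tally_auth_schemes` is hypothetical, and it assumes you fetch the rows yourself with whatever client you normally use, as `(session_id, auth_scheme)` pairs.

```python
from collections import Counter

# Real DMV and column names; querying the DMV requires VIEW SERVER STATE.
AUTH_SCHEME_QUERY = """
SELECT session_id, auth_scheme
FROM sys.dm_exec_connections;
"""


def tally_auth_schemes(rows):
    """Count connections per auth scheme from (session_id, auth_scheme) rows."""
    return Counter(
        scheme.upper() for _session_id, scheme in rows if scheme
    )
```

A result dominated by `NTLM` rather than `KERBEROS` for domain accounts would match what is described in this thread.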

maxx Newbie Points: 1 · Answer 6

Any news on this topic?

I have had the very same situation since the beginning of May and haven't found a solution yet.