Advanced Troubleshooting Week at SQL University, Lesson 3


Advanced Troubleshooting Week at SQL University, Lesson 3

Welcome back to Advanced Troubleshooting Week at SQL University. I’m your guest professor for this course and today I will once again be talking about Advanced Troubleshooting. Nargles ... dispatched. Grindylows ... defeated. Boggarts ... well, we'll see. We'll see. With a little guidance, I think everyone will do just fine.

Boggarts are a particularly nasty breed of creature. They represent what we fear the most. For the average DBA, what we fear the most is Kerberos errors. Why do we fear kerberos errors so much? Opinions vary, but i think it can be summed up in three reasons.

  • There are no logs or DMV's or counters that we can check to determine the root cause
  • There are no logs or DMV's or counters that we can check to determine the root cause
  • There are no logs or DMV's or counters that we can check to determine the root cause

Okay, so maybe that's just one reason. I really do think it's as simple as there is no easy, well documented way to determine the the cause. There are some troubleshooting methods you can employ though. I'm going to give you some tools to help you troubleshoot and resolve kerberos errors.

And be sure to take a look at the earlier lessons:

Double-hop Scenario


Kerberos is an authentication protocol that is more secure that NTLM and more problematic. Kerberos uses the Windows SSPI (Security Support Provider Interface) to perform a series of checks to authenticate the user and to validate the security principals on each end. If anything fails at any point, kerberos authentication fails. It is recommended to use kerberos whenever possible, and anytime a connection to SQL Server is attmepted, kerberos authentication is attempted first. If kerberos fails, SQL Server falls back to Windows NTLM  (NT Challenge/Response) authentication.

The number 1 reason kerberos authentication is used in SQL Server is not for security. Most administrators configure kerberos to be able to authenticate over more than a single hop. The dreaded double-hop scenario is brought up in forums and discussion groups time and time again. The easy, less secure work-around is to use SQL Authentication so that each hop authenticates individually from point to point. Kerberos is more secure, in part, because it authenticates the same end user over every point in the authentication chain. Furthermore, in order to authenticate over multiple hops, a special privilege in Active Directoy called delegation

When you successfully establish a connection using kerberos, a kerberos ticket is generated by a KDC (key distribution center). The KDC acts as a third party negotiator that validates the identity of both security principals and issues authentication tickets and session keys enabling a secure connection between the two parties. Delegation is the privilege to relay a user's kerberos ticket for authentication across multiple hops without the need to know the user's credentials. Since the user's crednetials are not being relayed, there is no threat from a middleman attack where they intercep the authentication packet. If the authentication packet is intercepted and somehow decrypted, there are no re-usable credential information in there anyway.

Configuring Kerberos and Constrained Delegation

I've rambled on enough about what kerberos is. I want to get into actual problems and how to fix them, but there's one more area I must cover. The reason I'm going into all this information before-hand is that I feel kerberos, like boggarts, are easiest to handle when you understand what it is. One of your key tools for resolving kerberos issue is simply checking the configuration to make sure it is configured correctly. You'll need to understand how to configure kerberos and constrained delegation for setting it up and for figuring out why it doesn't work when it doesn't. In simple terms, here is what you need to know for setting up kerberos with constrained delegation:

  • All SQL Servers involved must have valid SPN's
    • SPN must be owned by the SQL Server service account
    • SPN must match the servername being used to connect to the server
      • if using the FQDN, an SPN must exist for the FQDN
      • if using ServerName without FQDN, an SPN must exist without the FQDN
      • If using an instance name not on the default port, the port must be specified
      • If using a virtual name, an SPN must exist for the virtual name
    • See Lesson 1 of this series for tips on troubleshooting invalid SPN's
  • All SQL Servers involved and their service accounts must be in the same domain
    • Kerberos cannot be configured cross domain
  • The end user's account must not be marked as sensitive, unable to be delegated
  • The relaying SQL Server's service acocunt must be granted privileges in AD to delegate credentials

When you experience kerberos failures, you typically won't realize it unless you try to perform a second hop. Generally, if kerberos fails, SLQ Server silently falls back to NTLM without raising an error or warning. NTLM does not allow credentials to be relayed, so when a double-hop is attempted, the credentials do not get relayed and the target SQL Server sees the user as anonymous. The following error will be seen in the SQL log of the relaying server if this occurs:

Error: 18456, Severity: 14, State: 11.
Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. [CLIENT: <end user's IP address>]

Troubleshooting Kerberos

If you are getting this error, start by validating the configuration. Check the SPN's and validate magnitude of the problem. You can query the DMV sys.dm_exec_connections

to determine if users are conencting via kerberos, NTLM, or a mixture of both. If some users are able to connect via kerberos while others are not, my first suspicion would be that it is an SPN issue. For example, if it is working for users connecting to the FQDN of the server and not for users connecting without the FQDN, then there is probably only a valid SPN for the server with the FQDN. If the SPN exists but is invalid, then it will result in a hard error generating the SSPI context when first attempting to login (again, see Lesson 1 for this error). If you're not getting an SSPI context error, and you think the SPN's are correct, double-check them. Make sure everything is spelled correctly and make sure the port number is specified if SQL Server is using not the default port number. Though the port number is not required when connecting to the defualt port (1433), I recommend that you always specify the port number to avoid ambiguity.

Here is a query that I like to use to check to see if users are able to connect via kerberos:

Select session_id As SPID,



From sys.dm_exec_connections

Where net_transport = 'TCP'

You can verify if one version of the servername works while another doesn't by connecting to the server in separate sessions using both the FQDN and not using it. In each connection, check sys.dm_exec_connections

for your specific conneciton to see what authentication scheme they each use. If one connects via kerberos and the other doesn't, then you know which one likely does not have a valid SPN defined. I like to use the following query to check the authentication schema of my current connection:

Select auth_scheme

From sys.dm_exec_connections

Where session_id = @@SPID

If you've validated that all SPN's are correct, and the SQL Servers involved all meet the configuration criteria defined above, then you will need to take a look at the AD permissions. Most likely this will require contacting the person or team that manages AD for your domain. If you do have permission to do this yourself, then you would need to log into a domain controller and check the computer account of all servers involved and make sure they are trusted for delegation. Then check the user account of the SQL Server service account and make sure that it has the property Account is Trusted for Delegation is checked. Also make sure none of the accounts involved have the property Account is sensitive and cannot be delegated checked. The accounts and settings can be managed using Active Directory Users and Computers which can be found under Administrative Tools or by adding the snap-in to MMC.

Tools for Troubleshooting Kerberos

If you can't find a configuration problem and the kerberos issue persists, there are some tools you can use to dig a little deeper:

Kerbtray: Kerbtray is an executable that can be used to see and purge kerberos tickets you have recieved

Useful links for Kerbtray:

 Wireshark: Wireshark is a network protocol analyzer. Wireshark has a kerberos disector that can decrypt kerberos tickets if you have sufficient privileges to create a keytab file.

More information on Wireshark's kerberos disector:

LogMan: LogMan is Windows Log Manager. You can use LogMan to start and stop traces to a log file such as tracing performance counters. LogMan can trace and log a variety of different types of information including Windows etl (event trace binary log) traces of the kerberos and NTLM protocols. When LogMan creates an etl trace of kerberos or NTLM protocols, it writes a binary file that needs to be parsed out before it is readable. Microsoft Log Parser 2.2 is able to read in an etl file and write it out to a text file.

If the command below returns an error saying that the provider is invalid, your version of the providers may be different. I have only encountered this once. You can query for a list of providers using LogMan and use the correct GUID for the providers. The list of providers is pretty long so it is easiest if you output the list to a text file. The trace should be executed from a client machine where the failure can be reproduced. Simply start the trace, reproduce the error, and stop the trace.

When you have the completed logs, install Microsoft Log Parser 2.2 if you don't already have it installed. Be sure to include the sample scripts as part of the the install. There is already a ready to go script in the sample files for parsing an etl file to a text file. Simply call that script with the script to be parsed as an argument to the script. If any errors were encountered when connecting via kerberos or NTLM, they will be in the final output files in plain text.

Query for a list of providers:

logman query providers > c:\providers.txt

Start kerberos and NTLM traces:

logman start ntlm -p {5BBB6C18-AA45-49b1-A15F-085F7ED0AA90} 0x15003 -ct perf -o c:\traces\ntlm.etl -ets
logman start kerb -p {6B510852-3583-4e2d-AFFE-A67F9F223438} 0x43 -ct perf -o c:\traces\kerb.etl -ets

Stop the traces:

logman stop ntlm -ets
logman stop kerb -ets

Parse the etl file:

The path to the script may vary for each install.

pushd "C:\Program Files (x86)\Log Parser 2.2\Samples\Scripts"
DumpTraceReqs.js c:\traces\kerb.etl > c:\traces\kerb.txt
DumpTraceReqs.js c:\traces\ntlm.etl > c:\traces\ntlm.txt

Useful links for LogMan/Log Parser:

One more for the road

It's been a really busy week, and the end of the week came before I covered all of the topics I wanted to cover. I am going to post one more topic on this subject even if it does mean that Advanced Troubleshooting Week runs over into the next week. I think there is one more topic for troubleshooting that warrants coverage this week: mistrals. Mistrals are not bad creatures. In fact they are extremely helpful. They generally go completely unnoticed until some sort of trouble arises. Mistrals are misunderstood and once you take notice of a mistral, it is often scary. But with a little understanding and loving care, you can grow to be very fond of mistrals and how helpful a mistral can be. You may even wonder how you ever got along without knowing about it before.

The mistrals I speak of are the tempDB, of course. Most of us don't really think about tempDB until it causes us performance problems. I'm going to show you how to properly care for your tempDB and how to make it feel better when it's not feeling its best.