I’ve grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It’s a fairly human look at what’s involved in sailing on a Trident missile submarine…
Debugging is a life skill. Yes, the term comes from IT (literally removing a bug from a computer believe it or not) but the techniques are used all through life.
- Why won’t my phone charge?
- Why won’t the living room light turn on?
- Why hasn’t that water boiled yet?!?
Many people consider debugging an art. And to be fair, there is a lot of art in it. Although actually, it’s just experience in knowing where you are most likely to find a problem. And of course, everyone’s experience is a bit different so everyone starts in slightly different places. There are, however, certain very simple principals that everyone should know and apply when debugging. I’m honestly surprised (and a little freaked out) when I work with people and they don’t do it this way.
Note: Yes, I said this is a life skill, and the following applies just as much to Why won’t my phone charge? as Why won’t this stored procedure run?. However, in case you hadn’t noticed, this is a SQL (mostly) blog. So I’m going to go at this from a heavily IT slant.
Simplify, simplify, simplify!
Almost everything is made up of smaller pieces. When you are running across a problem, (well, after you check the stupid things) start breaking up what you are doing into smaller pieces. I’m going to give a simplified version of an actual example of something I did recently. No code examples (sorry), just discussion, and only most of the tests so you can get the idea.
I’m running a job that runs a bat file that uses SQLCMD to run a stored procedure that references a linked server. I’m getting a double hop error.
Breaking it down
If you think about it, this is a pretty complicated process with a bunch of layers.
- A job
- Calls a bat file
- Calls SQLCMD
- Connects to a SQL Server
- And runs a stored procedure
- That queries a linked server
And some of these layers (the stored procedure for example) have layers of their own.
Peel it like an onion
Each of those layers could be the source of the problem. So starting at one direction or the other begin peeling off layers. I tend to start at the bottom and work my way up.
- Query against the linked server in SSMS (not SQLCMD). Something like
SELECT * FROM [linked server name].master.sys.databases;
If this failed, then I have a problem with the linked server (probably not set up for Kerberos)
- Run the stored procedure from SSMS (still not SQLCMD).
If this fails, open up the SP and take a look at the actual query failing. Start breaking it down.
- Use SQLCMD to run that first query.
If this fails then you have a problem with SQLCMD & Kerberos.
- Use SQLCMD to run the stored procedure.
If this fails then you probably still have a problem with SQLCMD & Kerberos but you need to open up the SP and try running the actual query that failed from SQLCMD on its own.
- Open a command shell and run the bat file manually.
If this fails, what ID is the job using to run the bat file? Where is it running the bat file from? Are full URL paths involved? etc.
You see that each of these steps tries to isolate one or more pieces of the problem. Once that piece is isolated we can begin to break things down more. Run tests etc. But with a complicated situation trying to debug it in mass just isn’t possible. Or at least it’s extremely complex and difficult.
Now all of that said and you are going to get someone who is going to see an error, hear a description of a complex problem and go Here is your solution. Experience will do that for you :).