Who the Devil Wrote This SQL Code?

The way that you format T-SQL code can affect the productivity of the people who have to subsequently maintain your work. It is never a good experience to see SQL Code, cry out “Who the devil wrote this code?”, and then realise that it was you. Grant gives some examples of bad formatting and explains why you should never check-in badly-formatted SQL code.

smalldevil

Have you ever had that moment when you open up some T-SQL code and you involuntarily shout out, “Who the devil wrote this SQL code?” It’s that moment when you spot code that’s not just visually wrong, such as in putting commas in front of column names, but wrong because of the mistakes that people are going to make because of all the difficulty in reading, understanding and editing that code. There are horrific things that people do to their T-SQL that absolutely shouldn’t happen, let’s talk about them and the problems that they cause.

Alphabet Soup

We’ll look first at three, very simplified, examples of what I call alphabet soup. It basically consists of aliasing the tables using the alphabet and the order in which they were referenced: It is a strange habit:

You are going to cause all sorts of problems if you write code like this. First off, if you really do have these three queries within the same stored procedure, how hard will it be to confuse which is table ‘a’ in each of the queries when you go back to edit them? Pretty easy.

It gets worse though. I know that none of us will ever write a query that exceeds 26 tables in a JOIN… well, except that one time… and that other time. In fact, it happens. Oh, it’s not always a good thing, but it’s a thing. How do we respond to that? I’ve seen this:

Yeah, turn on case-sensitivity and start referring to tables ‘a’ and ‘A’. What could ever go wrong there? What happens when you exceed 52 tables? Or when people do this:

Yeah, because table ‘aaa’ is going to be so easy to discern from ‘aa’ and ‘a’. At least we don’t have to worry about running out of letters, ‘aaaaaaa’ table is so clear.

This approach does nothing but cause confusion and a lack of clarity. I absolutely believe that we must use aliases, if for no other reason, I’m lazy and I want to be able to type less while still referring to object owners. Instead of alphabet soup, let’s name aliases in a meaningful fashion:

Not only is it easy to discern which table is which in each of the queries, but the consistency makes the code that much more readable. You will have fewer errors and better understanding of what the code is doing.

Object Owners

While we’re on the subject, there is no excuse for code like this:

There are a couple of problems here, and they will cause you lots of headaches. First off, which table does OrderDate belong to? Can’t tell, can you? Next, why doesn’t Application.People have an alias? Because it doesn’t need one. See, they only put aliases on tables that have columns that might repeat in the query. For example, many of the tables have a LastEditedWhen column, so it needs an owner to distinguish it. The Same thing goes for PurchaseOrderID which is in more than one table, so they supply aliases or, in some cases not use aliases but completely retype the column name.

This kind of inconsistency makes lots of retyping and re-editing necessary. As soon as you add in the LastEditedWhen column from the People table, you’ll have to either add an alias to that table or supply the full table name where that column is referenced. Either way, you’re going to be doing a lot more typing to get your code to work correctly. Let’s also note that we’re adding a little bit of work to SQL Server because it will have to go and find each of the objects that don’t have specific owners. Yes, that amount of work is vanishingly miniscule, but it’s extra work that it could be doing on your queries.

Let’s just do this:

With over 10,000 executions of each query on my system, the average for the query without object ownership was 129ms while the average for the query with object ownership was 115ms. The difference was the work required to discern object owners.

Statement Terminators

Oh, I get it. Semi-colons are syntactic sugar. Sure they are. Well, as long as you don’t count the fact that more and more statements are now requiring sugary icing in order to even function. What am I talking about? This horror:

The semi-colon is a statement terminator. Terminator, the end of something, like, I don’t know, Sarah Connor. No, I mean the end of the statement. So why is it going at the start of the statement above? Because people kept getting this error:

Rather than change their code to start using terminators, they threw it at the front of the WITH statement. Heck, I’ve seen internal documents for companies that have declared that a coding standard. Since this is only for a CTE, what the heck, we don’t really have to worry about just doing that, right? Put the semi-colon just in front of the WITH statement and we’re all good… as long as we’re not using Service Broker, because those commands require it too. Oh who cares, no one is using Service Broker. Just so long as you don’t need to ENABLE/DISABLE a trigger, ever. Oh, and use the THROW command or MERGE…

Microsoft is requiring the semi-colon in more and more statements. You’re going to hit issues if you don’t use it as a statement-terminator in all your statements. Yes, older code can still work, but as you add newer functions, you’re going to hit the issue. Eventually they’re going to require the terminator for all statements, just start writing it that way now and avoid the issue.

Single Line Queries

I see these all the time and I just wonder, how on earth people manage to get their tasks done when they’re looking at this:

Seriously? It’s a single line in SSMS. It’s a tiny query and it’s practically unreadable. No wonder people have a hard time query tuning, they don’t know what they’re looking at. It’s all one amorphous blob. Approaching coding like this will absolutely slow you down and lead to errors. You can’t help but mess up in a situation like this. For example, did you notice that there is a missing object owner up there? I put it near the end so you might be able to catch it. Imagine how much harder that would be if it was somewhere in the middle of that mess.

What should we use instead? Line breaks. I actually don’t care where, that much, but somewhere. Please. There are multiple studies on how white space enables readability and provides a mechanism for focus. A consistent set of line breaks and maybe some spaces or tabs will provide a common structure that you can easily communicate to your team, making all your code that much easier to read, making it easier to avoid errors or fix them.

Too Much White Space

On the other hand, it’s entirely possible to have too much of a good thing. It’s like the development team that was told a query hint would help this one query, this one time, and all they heard was that this query hint made queries run faster, so they put it everywhere. Others hear that white space makes things more readable, so we end up stuff like this:

At least it has a semi-colon.

I’m using a screen capture on this one so you can see how line 12 shoots out to character 119 on the far side of the screen. Thankfully this is a very small and simple query otherwise I’d be scrolling all over creation to try to read it. This suffers from the same problems as the previous example, but instead of no white space, this is ALL THE WHITE SPACE. Not only is this query hard to read, and therefore hard to understand, you can easily think you’ve moved between queries at a certain point.

Abnormal Characters

When you crack open a database and you see this, do you let slip some expletives before you regain control, causing yet another trip to HR, because it sure does happen to me:

Oh yeah, please, this is just what I need to see on a Friday afternoon when there’s a performance problem in a query. We have a mixed case of restricted characters, the space in the name of the table, [Table Name], reserved words, IDENTITY, SELECT, NOT, NULL, and, oh, just to add insult to injury, we’re using quotation marks around the object name so that the name can contain brackets.

From this point forward, we’re in hell. Some of you actually like using brackets around all the object names in T-SQL. Heck, the SSMS scripting tool does it by default (because, you guys and your poor choices force it to). However, admit it, you don’t want to look at this any more than I do:

Seriously, you want to type [p] over and over instead of just p? And you’re OK that you ALIAS to po, but then you have to reference it as [po]? The clutter is insane, but the inconsistency is worse. That’s before we get to the issues that can be caused by using the quotes around, not only reserved words, but special characters such as brackets. Now, we can’t even have a consistent use of brackets, but instead are forced to write our SELECT statement against the [Table Name] table like this:

Even if we wanted to use brackets, we can’t. This will end in tears, not counting what I’m going to do. What do we do instead? Follow the rules for Identifiers as laid out by Microsoft. Just because you can do something, doesn’t mean you should. As your mom always said, “Just because all your friends are using non-standard characters in T-SQL are you going to get shoved off that cliff by Grant too?”

Conclusion

I understand. You’re in a hurry. The niceties of coding well are not necessarily always going to be observed because… No, actually I don’t understand. There are simple and clear standards out there. Following them doesn’t add time, especially if you use a tool like SQL Prompt to help you do it. In fact, following them makes for greater clarity and understanding, which speeds things up. Avoiding errors and eliminating non-standard approaches is a performance enhancing mechanism. So, no, I don’t understand. You’re in a hurry, so do it right the first time. 

And put your commas at the end of the line like a good pagan.

References

  1. Cleaning up common T-SQL coding issues with SQL Prompt: Grant Fritchey
  2. Formatting SQL Code – Part the First: Joe Celko
  3. Formatting SQL Code – Part the Second: Joe Celko
  4. Transact-SQL Formatting Standards (Coding Styles): Robert Sheldon
  5. SQL Code Layout and Beautification: William Brewer
  6. Laying Out SQL Code: Phil Factor
  7. How Should You Lay Our Code: Tony Davis
  8. T-SQL Formatting: A Sure-Fire Way to Start a Conversation: Jonathan Allen