Blog Post

Spaced-Out Strings

,

I came across this in a code review not long ago:

code_one

Three things pop into my head when I see this sort of thing:

  • That’s hard to read
  • That’s a whole lot of parentheses
  • Whoever wrote it is unfamiliar with the SQL standard about comparing strings

It’s the last item in that list I want to talk about first, because the SQL standard specifies rules for comparing padded and unpadded strings that make that RTRIM unnecessary.

For one thing, the LEN function could care less about trailing spaces. It ignores them. Pays no mind whatsoever. Just watch:

code_two

Yes, it’s true, LEN doesn’t care a fig for spaces.

Nor, for that matter, do comparisons for equality:

code_three

This is because the SQL standard says that, when comparing strings of unequal length, the shorter must be right-padded to the same length as the longer, and then the comparison is made.

In fact, about the only time it really matters if your strings are right-padded or not is when you are concatenating:

code_four

So, for the purposes of our predicate here, we can at least dispense with that RTRIM, leaving something slightly less parenthetical:

code_five

Now we can talk about LTRIM. My first question to any developer who uses LTRIM like this is, “What makes you think you have left-padded strings in the database?”

I won’t say that left-padded strings in a column are impossible: that would be silly. But I will say that they are very unlikely, and that in twenty years of writing SQL I’ve never encountered one, and that I’ll bet next month’s beer budget that there aren’t any in whatever table you’re querying (at least, none of the accidental variety). Wherever the original input came from: ETL from a flat file, XML, a Web front end, an app on your Aunt Fiona’s iPhone: any leading spaces should have long since gone the way of the dodo bird and the dinosaur. Most if not all of the app and ETL frameworks in use these days take care of this sort of thing behind the curtain.

If you’re the obsessively paranoid type, a quick query of the column in question should serve to restore your faith in inputs everywhere:

code_six

If you do happen to find leading spaces like this, it should be brought to the attention of whoever’s responsible for getting that data imported, because something is awry somewhere.

I think we can safely dispense with LTRIM, then, leaving us with a bit less typing and a couple less parentheses:

code_seven

Yet there’s still something unsatisfying about this, at least to me: it doesn’t make the logic behind the predicate clear.

If the column’s NULL, we replace that with an empty string (‘’), which then fails the test of having a length greater than 0, and eliminates that row from our data set. Likewise, any blank values will fail the LEN test and be eliminated as well. We therefore have two conditions to satisfy to return a row:

  • The value cannot be NULL
  • The value cannot be an empty string

We can rewrite this predicate to make that logic crystal clear:

code_eight

If you’re still nagged by the lingering fear that there’s a left-padded string out there somewhere, just waiting to blow up in production and ruin your day, there’s an easy fix for that:

code_nine

None of this is going to impact performance that much, if at all, but this isn’t about performance. It’s about clarity, and about understanding how string functions such as LEN work, and how they are governed by the SQL standard. Knowledge like this won’t necessarily make your code any faster, but in the long run, you’ll be a better developer for it, and those who follow you and maintain your code will thank you.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating