This month, Mickey Stuewe (b|t) hosted T-SQL Tuesday, giving us a topic of “data modeling gone wrong“. This is a fantastic topic – there are just so many ways that data modeling can go wrong, even by experienced pros – that I wanted to chime in, even though I wasn’t able to get a post written by the deadline for that event.
Almost every company (and every programmer) has certain conventions that they use when developing programs. For the most part, they serve to streamline the process and make it easier for teams to maintain each others’ code. Hungarian notation is frequently a big part of that. While you may not know it by that name, we’ve all seen it: it’s the prefix, often in lowercase letters, that programmers attach to the beginning of the name of an object to denote its type or function. A couple of examples that I use frequently are “fn” to denote a function or “wk” to denote a variable that I use to store intermediate, working values. Useful, right?
Well, most of the time. Just like any good thing, Hungarian notation can be taken too far. Suppose you have a column in your database that stores US zip codes:
intZip int NOT NULL
Looks easy enough. A US zip code is 5 or 9 numeric digits. We can deal with the dash that a 9-digit zip code uses by masking it when it’s output. No problem.
… Except when your company decides to start doing business in Canada. Canada uses 6-character alphanumeric postal codes. The misnomer of “zip code” versus “postal code” is easy enough to ignore, but that prefix says that the value is an integer. Now your schema has a column like this:
intZip varchar(9) NOT NULL
and a query might resemble this:
SELECT name FROM customers WHERE intZip = 'M4B1B3';
Hmmm… The prefix isn’t really helpful anymore, is it? It’s actually counterproductive like this: your code states that the value is an integer (people apply meaning to the name, after all – that’s why you used it), but the datatype is a varchar. You either have to live with it or you have to refactor every piece of code that contains a zip code. Most of us will let the discrepancy go at first. It’s only one column, after all. But think about where this leads. One field might not matter so much, but what about the next time a type change is required? And the next? And the next? We change code all the time.
I worked with a system several years ago that was in this state. They had used Hungarian notation to denote the type of the column, not realizing the inevitable consequences. It was a mess. Strings had become integers; integers had become strings; strings had become dates; and so on. They had comments everywhere to remind themselves of what the actual type of the column was, and one of the first steps in every project was to confirm the types of the columns they needed to work with. Every change took them far longer than it would have if they hadn’t used the prefixes to denote the type.
Prefixes can be quite useful. They help make our code self-documenting, and they allow us to more easily distinguish similarly-named objects within a namespace. I don’t code without them. But use them the right way: only use a prefix if you’re certain that the characteristic the prefix describes will never change. And
if when it does? Take the time to do the refactoring. Your coworkers will thank you later.