Time Bomb Design - A Longer Fuse

  • To simply state "That is the problem of the developer" sets up an adversarial relationship.

    On the contrary, using "TableNameId" for the PK column is harder for the developer not easier IMO. I'm coming at this from both the DBA perspective and the developer perspective as I've done and do both. Consistency is the developer's friend. It is much easier to code against a table with a consistent naming approach then to have one naming approach for surrogate PKs and another for all other columns.

    So you use "E" for an alias so you have "E.Id" instead of E.EmployeeId". Why is the first form more difficult than say "E.Name" instead of "E.EmployeeName"? Why not prefix the table name on all columns instead of the just a surrogate primary key? I have no issue with table aliases. I use them all the time; even single letter ones. However, if the problem is that developers confuse themselves by choosing poor table aliases, that is problem with the developer not the database schema. If the developer cannot remember what "E.Name" represents because they forgot what they aliases as "E", that is a developer problem.

    I disagree. DRI is nice when it exists, but nothing beats a database diagram and notes from the designer explaining what the fields are and what the relationships are.

    If DRI does not exist there are much, much, much larger issues at hand. Having DRI is absolutely crucial to even a passable database design. While a schema diagram is also useful, that kind of documentation may or may not be current with the actual database schema.

    Having notes from the designer about what the fields mean is entirely orthogonal the issue of column naming. What you are describing is a database dictionary and I entirely agree that such a document is useful. Still, I would find an entry for "Employee, Id" to be as clear as "Employee, EmployeeId".

    Simply naming the field ID can lead to confusion in a complex query where you have 5 tables each with a column named ID. Developers are going to use aliases whether you like it or not and they're going to end up with bad queries if you've created a confusing structure.

    Again, what you are suggesting is a poorly written query where the developer uses the wrong alias (A.Id instead of B.Id). How does that improve clarity in the scenario where say "EmployeeId" exists in five or ten tables? It will be just as easy to mistakenly write A.EmployeeId instead of B.EmployeeId as it would writing A.Id vs B.Id.

    Developers should be writing code for other people to read. If they use a bunch of poorly chosen aliases which makes it harder for other developers to read their code, that is a separate, albeit important, issue. Code reviews and automated syntax checkers would help in that situation.

    The root issue seems to be that you want the database designer to account for the possibility of developers writing queries with many tables and obtuse aliases. If that is justification, I'm not sure why that justification doesn't apply to all the other columns.

    In general, IMO, this issue is a pedantic one but does go to the root of consistent rules for naming. Putting the name of the entity to which an attribute belongs in its name is redundant IMO whether that column is the PK or just another column. It seems silly to make an exception for a PK. I've managed databases and written code against both styles and I found that simply using "Id" for a surrogate PK was more flexible, simpler and easier to code against than to have the table name prefixed.

  • Stefan Krzywicki (6/8/2010)


    Why stop at the primary key column? Because ID is likely the only column that's going to be in every table. You're very unlikely to have lots of "Phone" columns or "MaritalStatus" columns. If you do, you're likely designing badly.

    Don't be so ridiculous. You know perfectly well that the correct design would be to use:

    MaritalStatus.ID, MaritalStatus.Name

    and Phone.ID, Phone.Number

    I think you are deliberately trolling the totally on the ball and hugely sensible patterns of Thomas.

    :hehe:

  • Thomas-282729 (6/8/2010)


    Again, what you are suggesting is a poorly written query where the developer uses the wrong alias (A.Id instead of B.Id). How does that improve clarity in the scenario where say "EmployeeId" exists in five or ten tables? It will be just as easy to mistakenly write A.EmployeeId instead of B.EmployeeId as it would writing A.Id vs B.Id.

    Well, in this case the difference between getting a.EmployeeId instead of b.EmployeeId is likely to be less significant than getting

    a.ID when you wanted b.ID where a.ID comes from Employee a ([employee]Id, name, stuff) and b.ID comes from ProjectStaff([project]Id, employeeid, stuff)

    Also - that bug is almost impossible to find, whereas with properly named columns its pretty straightforward.

  • Consider the difference between

    SELECT E.EmployeeID, E.LastName, E.FirstName, D.DepartmentID, D.Department,

    P.Project, P.ProjectID, PD.ProjectDetailID, PD.ProjectDetail PD.Comments

    FROM Employee E

    INNER JOIN Department D

    ON E.DepartmentID = D.DepartmentID

    INER JOIN ProjectInterim PRI

    ON PRI.EmployeeID = E.EmployeeID

    INNER JOIN Project P

    ON P.ProjectID = PRI.ProjectID

    INNER JOIN ProjectDetail PD

    ON PD.ProjectID = P.ProjectID

    [/CODE]

    AND

    [Code="sql"]

    SELECT E.ID, E.LastName, E.FirstName, D.ID, D.Department, P.Project, P.ID, PD.ID,

    PD.ProjectDetail PD.Comments

    FROM Employee E

    INNER JOIN Department D

    ON E.AssignmentID = D.ID

    INER JOIN ProjectInterim PRI

    ON PRI.ResourceID = E.ID

    INNER JOIN Project P

    ON P.ID = PRI.TaskedID

    INNER JOIN ProjectDetail PD

    ON PD.HeaderID = P.ID

    When you have multiple joins and name the foreign key something other than the name it has in the parent table, it can get very confusing very quickly. Sometimes when writing it, definitely when bugging it. And if you didn't write it in the first place and there's no extra documentation, forget it.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Stefan Krzywicki (6/9/2010)


    Consider the difference between

    SELECT E.EmployeeID, E.LastName, E.FirstName, D.DepartmentID, D.Department,

    P.Project, P.ProjectID, PD.ProjectDetailID, PD.ProjectDetail PD.Comments

    FROM Employee E

    INNER JOIN Department D

    ON E.DepartmentID = D.DepartmentID

    INER JOIN ProjectInterim PRI

    ON PRI.EmployeeID = E.EmployeeID

    INNER JOIN Project P

    ON P.ProjectID = PRI.ProjectID

    INNER JOIN ProjectDetail PD

    ON PD.ProjectID = P.ProjectID

    [/CODE]

    AND

    [Code="sql"]

    SELECT E.ID, E.LastName, E.FirstName, D.ID, D.Department, P.Project, P.ID, PD.ID,

    PD.ProjectDetail PD.Comments

    FROM Employee E

    INNER JOIN Department D

    ON E.AssignmentID = D.ID

    INER JOIN ProjectInterim PRI

    ON PRI.ResourceID = E.ID

    INNER JOIN Project P

    ON P.ID = PRI.TaskedID

    INNER JOIN ProjectDetail PD

    ON PD.HeaderID = P.ID

    When you have multiple joins and name the foreign key something other than the name it has in the parent table, it can get very confusing very quickly. Sometimes when writing it, definitely when bugging it. And if you didn't write it in the first place and there's no extra documentation, forget it.

    IMO, the second one (even though it clearly is different than the first query) is easier to read (and write) as the comparison to a primary key stands out compared to the comparison to a foreign key. I.e., it is easier to pick out the foreign key. In addition, your choice of column names in the Select clause is unrealistic. The developer will have think about and alias the column names to prevent duplicates which is another advantage IMO.

    As I read your response, there are actually two issues here. The first, is the developer themselves writing the wrong query because they chose poor aliases. IMO, there is nothing to be done about that. Even if you used the table name in the primary key, if that column name exists in many places, it will be just as easy for that developer to use the wrong alias. In the first example, the developer could have just as easily written "PD.ProjectId = PRI.ProjectId" in the last ON clause. In my experience, choosing the wrong alias is almost never the problem with queries even on systems that are using "Id" as the surrogate key name. The problem is almost always with the choice of tables, unneeded joins, accidental cross joins and such. If I were the DBA or development manager sending the second query back to the developer because it was incorrect, I'd suggest they use better aliases and pay more attention. This is no different than a developer declaring variables "a" and "b" and using the wrong one in their code.

    The second problem is debugging someone else's code where they chose poor table aliases. For that, you need a tool like Red-Gate's SQL Refactor (No, I do not work for them). I seem to remember there is a website somewhere that will do something similar. This is basically the same problem as inheriting code that used poor variable names. The solution is to refactor the variable names into better names.

  • RichB (6/9/2010)


    Thomas-282729 (6/8/2010)


    Again, what you are suggesting is a poorly written query where the developer uses the wrong alias (A.Id instead of B.Id). How does that improve clarity in the scenario where say "EmployeeId" exists in five or ten tables? It will be just as easy to mistakenly write A.EmployeeId instead of B.EmployeeId as it would writing A.Id vs B.Id.

    Well, in this case the difference between getting a.EmployeeId instead of b.EmployeeId is likely to be less significant than getting

    a.ID when you wanted b.ID where a.ID comes from Employee a ([employee]Id, name, stuff) and b.ID comes from ProjectStaff([project]Id, employeeid, stuff)

    Also - that bug is almost impossible to find, whereas with properly named columns its pretty straightforward.

    How do you figure it is easier to find? Suppose the PK is named EmployeeId. It is just as easy to write A.EmployeeId instead of B.EmployeeId in the join as it would to write A.Id vs B.Id. In fact, I would say that having the PK be named "Id" makes it easier to see which column is the PK and which column is the FK. That would make it easier to find situations where the developer accidentally joined the column to itself (e.g. A.EmployeeId = A.EmployeeId).

    Of course, the developer used a better alias, then it would be even easier to see a problem.

  • Thomas-282729 (6/9/2010)


    RichB (6/9/2010)


    Thomas-282729 (6/8/2010)


    Again, what you are suggesting is a poorly written query where the developer uses the wrong alias (A.Id instead of B.Id). How does that improve clarity in the scenario where say "EmployeeId" exists in five or ten tables? It will be just as easy to mistakenly write A.EmployeeId instead of B.EmployeeId as it would writing A.Id vs B.Id.

    Well, in this case the difference between getting a.EmployeeId instead of b.EmployeeId is likely to be less significant than getting

    a.ID when you wanted b.ID where a.ID comes from Employee a ([employee]Id, name, stuff) and b.ID comes from ProjectStaff([project]Id, employeeid, stuff)

    Also - that bug is almost impossible to find, whereas with properly named columns its pretty straightforward.

    How do you figure it is easier to find? Suppose the PK is named EmployeeId. It is just as easy to write A.EmployeeId instead of B.EmployeeId in the join as it would to write A.Id vs B.Id. In fact, I would say that having the PK be named "Id" makes it easier to see which column is the PK and which column is the FK. That would make it easier to find situations where the developer accidentally joined the column to itself (e.g. A.EmployeeId = A.EmployeeId).

    Of course, the developer used a better alias, then it would be even easier to see a problem.

    Fortunately, in almost every company I've worked for, there are designated database developers who write the stored procedures and views. On the application and reporting side, no one is writing SQL queries, and in fact they don't even have access to the tables, only exec permission on the procedures. Therefore, arguments about what column names would save the most keystrokes to enter or what column names would be the most understandable to a casual user are not taken into consideration when designing the data model. There are cases where application developers will request that result columns from a stored procedure be aliased in a certain way to make them more understandable to them personally, and I don't have so much of a problem with that, so long as the underlying tables themselves are properly normalized and conform to a standard naming convention.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • There seems to be an assumption that there is a single field acting as the PK. This isn't necessarily true.

    As to calling things ID because its easier for developers to write boiler plate code, that sounds like a violation of software engineering principles. We want to aim for loose coupling and strong cohesion.

  • David.Poole (6/10/2010)


    There seems to be an assumption that there is a single field acting as the PK. This isn't necessarily true.

    As to calling things ID because its easier for developers to write boiler plate code, that sounds like a violation of software engineering principles. We want to aim for loose coupling and strong cohesion.

    Calling things TableNameID in one special set of circumstances because developers utilize bad development practices also sounds like a violation of software engineering principles. If it makes it easier to write boiler plate code, that is a side effect as opposed to driving reason for the naming scheme.

  • Might sound like it, but only to someone who doesn't understand the language...

Viewing 10 posts - 46 through 54 (of 54 total)

You must be logged in to reply to this topic. Login to reply