A column to identify one record?

  • Hi,

    This is an is it okay to do this type question. How bad/ok of a design decision would it be to add a bit/boolean column to a table for the sole purpose of identifying a single record, e.g. one record in a max of 1000 record table would have Colx = 1 while all others would be Colx = 0.

    My reason to identify this one record is that it would be used as a foreign key as opposed to NULL.

    My reasons for that are that

    a) I translate NULL into I do not know. In many circumstances this works well but I have realised that I also use NULL as optional foreign key value which in essence represents there is no relationship. The thing is I do know there is no relationship. My thinking was, what if I had a record in the parent table which in essence represented this, e.g. 'no relationship'.

    Of course for the circumstances where it is an optional foreign key because the parent is unknown a NULL would still be a valid value, e.g. I am not trying to avoid LEFT JOINs or anything.

    b) An advantage to this as I see it is also assuming the parent table has some sort of value which is displayed in reports, administrators of the database system would be able to decide what text would be displayed when there was 'no relationship'.

    This is of course where the bit column comes in, since the system couldn't use a primary/alternate key to identify this record, the bit column would flag the specific no relationship record. There would of course be a table level rule saying that only one record could have a true value.

    I am asking here because it does seem a bit dodgy using a column to identify a single record however for me it is solving problem, well of a sort, and it does provide functionality to users of the database.

    I am specifically not providing an example because I wanted to discuss this as a valid/invalid data design.

    Thanks in advance to any and all comments

    Steve

  • Not sure I understand the question correctly. But here goes.

    First - from a design standpoint, there is nothing wrong with a column that has the value 1 for one row and the value 0 for all others. In a Staff table, the column "is_CEO" would probably look exactly like that.

    Second - it looks like you are trying to solve a problem the wrong way. And here is why:

    SteveD SQL (2/27/2016)


    a) I translate NULL into I do not know.

    A NULL does not mean I do not know. It means no data here. Or to quote the ANSI standard for SQL, Null is "A special value that is used to indicate the absence of any data value".

    It is of course possible to invent other ways to indicate the absence of any data value in a column. I have seen lots of such schemes. Some of them avoid some of the quirkiness that is associated with NULL values. But all of them introduce far more quirkiness that is not easy handled, and that would have been avoided by just using NULLs and educating the developers on the things they need to know when dealing with NULL and three-valued logic.

    In the relational model, all we store are data values. Foreign keys are just data values that happen to uniquely identify a row in some table. So if you have a table of projects with a foreign key to the departments table, then this is implemented by adding a data value (DeptCode) to the Projects table.

    Now if a project is not assigned to any department, then it is irrelevant if that is because the project is not assigned yet, because it is assigned but nobody told you, because it is a special project that is not restricted to a single department, or because of any other reason. The only thing that matters is that the answer "what department is this project assigned to" has no answer, so there is no data value that you can store in the DeptCode column of the Projects table. And that is exactly what a NULL is invented for.

    So to wind this back to your question, I think that from a relational design point of view your idea is not strictly incorrect, but it is introducing a lot of complexity to handle something that is handled much better and easier with a NULL value in the foreign key column.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Steve,

    Like Hugo I'm not sure I understand what you are trying to accomplish. So let me provide an example of what I think you want to do. I have 2 tables, Customer(CustomerID, CustomerName, CustomerTypeID) and CustomerType(CustomerTypeID, CustomerTypeName, IsDefaultFK). For some business reason there is the possibity that the CustomerType is not provided/known so you want to have an CustomerType of "Unknown" instead of allowing NULL in the Customer.CustomerTypeID column. So you have a stored procedure something like this:

    CREATE PROCEDURE CustomerInsert

    (

    @CustomerName VARCHAR(100),

    @CustomerTypeID INT = NULL

    )

    AS

    BEGIN

    SET NOCOUNT ON;

    INSERT INTO Sales.Customer

    (

    CustomerName,

    CustomerTypeID

    )

    SELECT

    @CustomerName,

    ISNULL(@CustomerTypeID, (

    SELECT

    CustomerTypeID

    FROM

    Reference.CustomerType

    WHERE

    IsDefaultFK = 1

    ));

    END;

    If this is the type of scenario you are talking about I guess I wouldn't say it is a bad design, but I'd be more likely to force the user to pick a type in the UI or just handle the NULL issue with a LEFT JOIN to CustomerType.

    So I think I'm just agreeing with Hugo as I read my example and comments.

  • Gentlemen, thanks very much for your time and input.

    Hugo, I am glad to hear that the bit column for one record is a valid design decision. As for what NULL represents, you are of course quite right and I should not think of it that it that way. I suppose my simplistic I don't know represents that a NULL is a special value that is used to indicate the absence of any data value and I don't know why there is no dataa. As you mention here

    it is irrelevant if that is because the project is not assigned yet, because it is assigned but nobody told you, because it is a special project that is not restricted to a single department, or because of any other reason.

    Going with the Dept/Projects example, I think the fact that Dept FK is NULL doesn't provide me or more importantly users of the data with enough information to make an informed decision , e.g. Does this need to be investigated?

    My question was working on the premise that the projects is a...

    project that is not restricted to a single department

    and therefore the purpose of the 'No Department' record in the department table would be to a) provide users with the information that there was a business decision made that there is no department assigned to this project and therefore nothing has to be done b) provide system admins with a way to control the values displayed in forms and reports when no department is assigned to the project. Also if there is a description/comments/notes field in the department table this could be documented in the data itself.

    While the above does work for me, does it work for others or is this just bad design?

    Thanks again for your time

  • SteveD SQL (2/27/2016)


    Hugo, I am glad to hear that the bit column for one record is a valid design decision.

    Let me correct that: it may be a correct design decision. It doesn't violate any relational rule. That does not guarantee that it is a valid decision in your case.

    As for what NULL represents, you are of course quite right and I should not think of it that it that way. I suppose my simplistic I don't know represents that a NULL is a special value that is used to indicate the absence of any data value and I don't know why there is no dataa.

    And another clarification of my former post: you may or may not know why there is no data.

    In my Staff table, when the ContractEnd data is NULL, I know perfectly well why there is no data, because that only ever happens when the employee is still contracted. So in this specific case, I know that a missing ContractEnd value is always caused by that attribute being not applicable.

    On the other hand, in the Customers table, a NULL BirthDate can happen for various reasons. Perhaps we forgot to ask when registering the customer ("unknown"). Or perhaps the customer is a legal entity and not a natural person ("not applicable"). Or the customer is sensitive about her (usually) age so refused to give that information ("no access"). And there might be other reasons too. But apparently the company I work for does not care to know why a birthdate is missing, or otherwise they would have added a column to store the reason why a birthdate is missing, which they didn't.

    Going with the Dept/Projects example, I think the fact that Dept FK is NULL doesn't provide me or more importantly users of the data with enough information to make an informed decision , e.g. Does this need to be investigated?

    And here is where my birthdate seems to differ from your department. Apparently, a missing department is just find in some cases, but can be a red flag in other cases. And you need to quickly find the projects with a "red-flag" missing department without being distracted by the projects with a "just-fine" missing department.

    In such a case, you would definitely need to store an additional column. Whether this should be a simple "RedFlag" column with data type bit depends on the context. Perhaps a better idea would be to have a column "ProjectType" for the various types of projects (giving more information than just the bit), and you could then select rows based on projecttypes that should have a department but don't. Or you could even add a CHECK constraint to prevent this from happening in the first place.

    I realize I am stretching the project/department example that I introduced myself, but I like to explain stuff by using concrete examples instead of vague generalizations. I hope you can translate this back to your system. And I hope that you understand that I am trying to tell you to remove your technical solution-oriented glasses for a while and look at the actual problem in business terms. What is the relevant information you need to store about projects? How would an average end user of your database application talk about the projects without departments that do or do not need to be investigated? Stick to their terminology and you will end up with a database design that is far more useful for that business, because future DB developers who understand the business wiill understand the data model - and DB developers who do not understand the business but who do work there will always have access to people who can explain what's happening. If you add a column "RedFlag" today and walk under a bus tomorrow, nobody will understand what this column represents. But if you name it "ProjectType", your successors can ask anyone in the business what project types they have, and they'll understand the column,

    Sorry for going off on a tangent, you might not notice 😀 but this subject is dear to me.

    Also, while I did not literally answer all your questions, I hope I have given you food for thought that enables you to answer them yourself.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Basically I agree with Hugo, but I'd express it mperhaps more simply (and perhaps more bluntly).

    Although using a column to identify a single row is certainly a valid thing within the relational model, it shouldn't be done in such a way that it makes you lose sight of what sort of entity is stored in your table. If your project points to its owning department, the things in the table pointed to are departments, not something else. But a row in that table that is identified by your extra column doesn't represent a department, it represents a state of not being a department. That, in my opinion, is a hopless violation of the relational model.

    Tou could of course interpose an extra intermediate table ("project details table", perhaps) in which one column gives the owning department and another provides something about whether there should be an owning deprtment (eg owned by single department, project complete so longer owned, project not yet assigned to a department, project abandoned and no longer owned, multiple-department project, and so on - and multiple-departments maybe could require that the relationship is correctly represented as many to many instead of many to one so the project table is referenced by the intermediate table instead of vice versa and allowing multiple-departments in that column is an improper reduncancy that can lead to undesirable anomalies) but in my view that's pretty clumsy. It would be better to have an extra column in the project table indicating the various possibilities (owned, complete, abandoned, not yet assigned, and so on) and have a check constraint that doesn't permit NULL department reference where that's inappropriate (and if it's to be represented as a many to many relationship there is still an intermediate table which does all the referencing). I gather from your posts so far that you don't wnt to represent this relationship as many to many, so an extra column in the project table is the right way to go.

    Tom

  • Thanks Hugo and Tom for your very detailed and informative responses. I think where I am going wrong in my thinking is in trying to deal with a front end issue, how to describe something, in the back end.

    Just to bring it all into perspective, my current situation is a structure where I have Department, Team and Role. This is not a structure for a specific company but will be part of a web application that could be used by any number of companies/clients, well one hopes. 🙂

    While there is a definite hierarchy, Departments - > Teams -> Roles, the relationships are of a 0-M nature. It is up to the client on how they want to set this up. It should be noted that these entities are not key to the database but at least one role is required. That being said a Role could also be a member of one or more teams and/or departments. They could be a member of a department directly, e.g. no team, or indirectly via the role's team department. The team could belong to one or more departments or none. Basically the more a clients adds to the system the more functionality they get out however I don't want to force them to compile and add data if they can't be bothered.

    So what I currently have is those three entities along with a DeptTeamRoles linking table.

    The reason I started asking about this 'no record' record was I started creating data entry forms and reports for the Organisation Hierarchy. I found myself adding values into pick list to be selected by the user and/or display in reports like (No Department) or (No Team). The problem I found was I was adding this in multiple places and I hate having to repeat values. I also like to provide the users with some control over what is displayed.

    I had been thinking about a 'NULL' replacement table to hold these value but I do like to add some Admin type reports in my systems to look for potential data issues. Therefore identifying where a department and/or team is NULL by design as opposed to error would be nice.

    Anyway, as was rightly pointed out my solution was not valid. I will have a think on the idea of 'Type' in the linking table which would say, 'this is ok' along with the NULL replacement table.

    Thanks again for your time

    Steve

  • SteveD SQL (2/29/2016)


    So what I currently have is those three entities along with a DeptTeamRoles linking table.

    And there's the root cause of your problems. This table design is a violation of 5th Normal Form, and your modifications and changes are attempts to work around the problems caused by the incorrect table design.

    Your text explains that a role can be a member of one or more teams, and a role can be a member of one or more departments, and that a team can be a member of one or more departments. That sounds like three separate facts to me, which leads to three separate relationships between the three entities. If all those relationships are many-to-many relationships, then you will end up with three relationship-tables. If one or more of them is one-to-many, the foreign key can collapse into the entity table.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks Hugo,

    Looks like a trip back to the drawing board

    Take Care,

    Steve

  • Hi Hugo,

    Sorry to bother you, hopefully one last time. I have attached simplified erd, Is this the structure you were eluding too?

    Recapping:

    - Departments can have member teams

    - Departments can have member roles

    - Teams can be member of departments

    - Teams can have member roles

    - Roles can be member of teams

    - Roles can be member of departments either directly, no team, or implied via a team

    Thanks Again

    Steve

  • At first sight, the data model looks okay. Of course I do not know the exact requirements. Also, every flavor of ER modeling has its own ways to represent optional/mandatory and one/many in relationship ends, so I didn't try to read anything in the circles and dashed lines. (I think the craw's feet meaning is quite standard in the representations that use them, though)

    Based on your written explanation in your post, all relationships are optional. So you can have teams without member roles that are not part of a department, departments with no teams and no member roles, and member roles that are not part of either a team or a department. If any of these statements is incorrect, then you may have to make some of the relationships mandatory instead of optional. (Or perhaps you need some special constraints, for instance if a department must have at least one member role or one team - in that case each relationship in itself is optional, but the combination is mandatory).

    (Note that most database engines cannot enforce a mandatory relation if it's many to many, unless it's an RDBMS that supports deferred constraint checking - which SQL Server does not, unfortunately)

    I think that the choice to make all three relationships optional is okay. But do check. Can a department have multiple teams and/or member roles? Can a team be part of multiple departments and have multiple roles? Can a member role belong to multiple teams and/or departments?

    You might also want to checkk for mutually exclusiveness between the relations - i.e. if a department has a team, can it also have direct member roles, or is is either or? Same again for all other entities and relationships.

    In a pure relational design, you should not store the implied department-role membership, since this is redundant data that can always be inferred from the dept-team and team-role relations that are stored. However, if this poses a performance problem you may need to denormalize here. If you do, then my first choice in SQL Server would be an indexed view; failing that I would either add a fourth relationship table for only the implied relationships or add them to the existing table with an additional attribute to distinguish the implied from the direct connections.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply