Database Design Follies: NULL vs. NOT NULL

  • kenambrose (3/10/2016)


    why do the same flawed arguments keep popping up?

    [[If you deal with a report and want all columns via left joins, then there will be nulls. ]]

    We don't know the difference yet between a report and a data relation/table?

    Seems like we have both made comments to this regard and that we are in agreement. Using tokens in the same column as the data to avoid nulls in a report is different - it's worse.

    Maybe it's time for me to rethink. Would it really be that much worse to deal with 10000 tables rather than 1000 tables in some databases? Might be a theory and practice sort of thing. Can my logical table be a container or view for the equivalent set of non-null logical tables?

    Are you a fan of NoSQL by chance? 🙂

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • BTW, solving 1X = 2X has X = 0 as a solution. There is no contradiction as it's also a solution of the original degenerate equation which is 0=0 that does not say anything about X.

    How abut solving for X when there is no equation for X. X is then unknown.

    X + 1 > 0 is unknown

    X * X + 1 > 0 is true

    The null logic does work in T-SQL if you understand it.

    Qty: -1 (value or token?)

    Price: 10

    Total: -10 (refund or unknown?)

    Qty: NULL (unknown)

    Price: 10

    Total: NULL (unknown)

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • that question implies that you would otherwise have ~10000 "nullable" columns in your design?

    which would be nonsense design under any stretch of the imagination, wouldn't it?

  • kenambrose (3/10/2016)


    that question implies that you would otherwise have ~10000 "nullable" columns in your design?

    which would be nonsense design under any stretch of the imagination, wouldn't it?

    If a solution doesn't scale properly it's not a good solution.

  • More nonsense....

    I give up. 🙂

  • ZZartin (3/10/2016)


    kenambrose (3/10/2016)


    that question implies that you would otherwise have ~10000 "nullable" columns in your design?

    which would be nonsense design under any stretch of the imagination, wouldn't it?

    If a solution doesn't scale properly it's not a good solution.

    I have had my imagination stretched. However, in this case, it is my imagination. The database I have in mind only has only about 500 tables with almost 10000 columns. They don't allow nulls for the most part because of the nature of the conversion from the mainframe database. Despite the design, it works. I was just thinking of what that database might look like if nulls were allowed and a non-null pattern was suddenly applied to it. Rube Goldberg would be proud.

    BTW, the conversion was required because the billion dollar project that was supposed to provide a replacement failed. Yes, with a B. (I would have done it for half that!)

    I heard a quote about change management that I liked and will alter for this context. Good database design is like the Loch Nest monster, often heard about but seldom seen.

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • Converting from a mainframe is always going to be painful 😛

    I was more responding to how to design a freh DB.

  • rstone (3/10/2016)


    In the logical design, the business logic might require handling missing data and reasons for missing data. Using tokens -2, -1 , 0, etc. is one way of handling nLvl logic via denormalization. (The value column and the reason columns have been combined into one "value" column because either the value or the reason is defined, not both, in any given record. If either a phone number or SSN is defined, but not both, then use one column for both? Okay in some cases, but not others? Using a null is worse?) The physical infrastructure provides one option of tri-level logic via nulls. You can choose to use the null for missing and add logic for the remainder of the nLvL logic, or handle it all without nulls. Adding perplexity to complexity can simplify a single query. (If the business requirements require this, then suck it up.)

    The argument that nulls are too complicated does not fly with me. The null logic is well defined and deterministic. A new developer is likely to make all kinds of mistakes. We don't disable other features (e.g., cursors, temp/local tables, scalar UDFs, NOLOCK, triggers, etc.) because of this.

    BTW, with the non-null design, either the "value" column table or the "reason" column table will have a record, not both. If you want either records with values or with reasons, but not both, then there are no nulls. If you deal with a report and want all columns via left joins, then there will be nulls. You can choose how to apply the business logic to the value or reason - even translating the value and reason into one column without nulls. If you use tokens instead of nulls, then you have hard coded business logic into the data. Applying the business logic will require knowing how it was already applied in those cases where the logic already applied no longer exactly applies.

    I don't think I agree that this is a reason/value argument. In this case the "reason" *is* a value, it's part of the domain. What you do with it is clearly business-rule related, but the physical structure for dealing with it is uniform--unlike null processing.

    In effect, UNK/TBD/n/a become part of every domain that has a relation (assuming the domain has room).

    TBD is a perfectly valid entry for an address, for example. It allows the user to save the record *but not use it* until the actual address is known. It allows creation of a "health meter" to validate records and pinpoint data that needs attention. And, most importantly, it tells you why, which Null by itself can't.

    Thus the Phone vs SSN argument is negated, because a phone number and an SSN are different domains, but an address that is to be determined is still an address.

    I don't know if I'm arguing nulls are too complex--rather I think I'm arguing nulls are needlessly complex, and except for one limited case are always better handled with flag values/tokens. For one thing you never have null outer joins. 😀

  • The problem with place holders like that is that you are using the same field for multiple purposes. TBD for example is not an addess it's a process control value that tells someone what the status of that record is and how to handle it. How would you handle it if you have an address but someone needs to mark it as invalid or needs verification? Using a place holder value in the addess field would be inappropriate because you would lose the current value.

    And trying to parse repurposed fields will easily become far more tedious than putting a few null checks in the right places.

  • ZZartin (3/11/2016)


    The problem with place holders like that is that you are using the same field for multiple purposes. TBD for example is not an addess it's a process control value that tells someone what the status of that record is and how to handle it. How would you handle it if you have an address but someone needs to mark it as invalid or needs verification? Using a place holder value in the addess field would be inappropriate because you would lose the current value.

    And trying to parse repurposed fields will easily become far more tedious than putting a few null checks in the right places.

    But in your example putting null in the field would also lose the current value.

  • BTW, solving 1X = 2X has X = 0 as a solution. There is no contradiction as it's also a solution of the original degenerate equation which is 0=0 that does not say anything about X.

    How abut solving for X when there is no equation for X. X is then unknown.

    X + 1 > 0 is unknown

    X * X + 1 > 0 is true

    The null logic does work in T-SQL if you understand it.

    Qty: -1 (value or token?)

    Price: 10

    Total: -10 (refund or unknown?)

    Qty: NULL (unknown)

    Price: 10

    Total: NULL (unknown)

    I was not trying to solve any equation. If that was the goal, then solution of X^2-X^2 = X^2 - X^2 is every X, any number, real or imaginary, because given expression is an identity. Solution "every X" is very different from X=0, by the way. I tried to show how ignoring rules of logic leads to wrong conclusions, such as 1=2, or if you insist, that the same equation of 1 variable has two solutions X=0 and "every X that exists".

    An I never said that tokens are any better than NULLs. My point is that it is not our choice whether to use NULLs or not, it is illogical thing to do - it simply does not make sense. If something does not make sense, we cannot vote sense into it.

    Examples people provide to defend use of NULLs are also wrong. Your example is talking about two very different predicates. One is "Item [Item] is in stock , quantity [Qty] , priced [Price], which gives total value of [Total=[Qty]*[Price]". When you say Qty = -1, Price = 10, Total=-10 you are saying "Item ItemID = 1 is in stock in quantity Qty = -1, prced Price = 10, which gives us total value of Total = -1*10=-10" This makes sense only if you allow Qty to be negative which makes no sense, either. Negative quantity means perhaps 'we own 1 piece of item ItemID', very different from 'Item ItemID is in stck, quantiyu..." One predicat says "I posses something", the other one says "I owe something" Two predicates, two different meanings, and we try to put them together, into the same relation, that is not very logical to me. Ditto for NULLs instead tokens.

  • roger.plowman (3/11/2016)


    rstone (3/10/2016)


    In the logical design, the business logic might require handling missing data and reasons for missing data. Using tokens -2, -1 , 0, etc. is one way of handling nLvl logic via denormalization. (The value column and the reason columns have been combined into one "value" column because either the value or the reason is defined, not both, in any given record. If either a phone number or SSN is defined, but not both, then use one column for both? Okay in some cases, but not others? Using a null is worse?) The physical infrastructure provides one option of tri-level logic via nulls. You can choose to use the null for missing and add logic for the remainder of the nLvL logic, or handle it all without nulls. Adding perplexity to complexity can simplify a single query. (If the business requirements require this, then suck it up.)

    The argument that nulls are too complicated does not fly with me. The null logic is well defined and deterministic. A new developer is likely to make all kinds of mistakes. We don't disable other features (e.g., cursors, temp/local tables, scalar UDFs, NOLOCK, triggers, etc.) because of this.

    BTW, with the non-null design, either the "value" column table or the "reason" column table will have a record, not both. If you want either records with values or with reasons, but not both, then there are no nulls. If you deal with a report and want all columns via left joins, then there will be nulls. You can choose how to apply the business logic to the value or reason - even translating the value and reason into one column without nulls. If you use tokens instead of nulls, then you have hard coded business logic into the data. Applying the business logic will require knowing how it was already applied in those cases where the logic already applied no longer exactly applies.

    I don't think I agree that this is a reason/value argument. In this case the "reason" *is* a value, it's part of the domain. What you do with it is clearly business-rule related, but the physical structure for dealing with it is uniform--unlike null processing.

    In effect, UNK/TBD/n/a become part of every domain that has a relation (assuming the domain has room).

    TBD is a perfectly valid entry for an address, for example. It allows the user to save the record *but not use it* until the actual address is known. It allows creation of a "health meter" to validate records and pinpoint data that needs attention. And, most importantly, it tells you why, which Null by itself can't.

    Thus the Phone vs SSN argument is negated, because a phone number and an SSN are different domains, but an address that is to be determined is still an address.

    I don't know if I'm arguing nulls are too complex--rather I think I'm arguing nulls are needlessly complex, and except for one limited case are always better handled with flag values/tokens. For one thing you never have null outer joins. 😀

    Defining a domain is one of those things I'm a bit unclear about. I suppose if the business logic was such that the item is not parsed and always used the same way, then it can be of the type "MyThing" that contains XML with a book, contract for a book, or the date the contract is due, etc. It's okay because the logic does not depend upon the content? (Is there an internal vs external logic consideration? For example, if a join has to differentiate between reasons and non-reasons, then that's bad? If the only place for that logic is in a select clause, then that's okay?)

    It bothers me to have a domain with values that are defined to be the same if they are not treated the same - a value or a reason the value is not present. Is it only okay because the business logic is such that the "value" is treated the same way.

    On the other hand, if a standard is defined around the use of tokens in a clear and consistent way, then is it really different? The logical design with a lookup table and reason column implemented physically without a lookup table, but both following the business requirements. (Best of both worlds: the purist can have the perfect logical design, but be horrified with the physical implementation?)

    Is the use of a token a denormalization method?

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • Zidar (3/11/2016)


    BTW, solving 1X = 2X has X = 0 as a solution. There is no contradiction as it's also a solution of the original degenerate equation which is 0=0 that does not say anything about X.

    How abut solving for X when there is no equation for X. X is then unknown.

    X + 1 > 0 is unknown

    X * X + 1 > 0 is true

    The null logic does work in T-SQL if you understand it.

    Qty: -1 (value or token?)

    Price: 10

    Total: -10 (refund or unknown?)

    Qty: NULL (unknown)

    Price: 10

    Total: NULL (unknown)

    I was not trying to solve any equation. If that was the goal, then solution of X^2-X^2 = X^2 - X^2 is every X, any number, real or imaginary, because given expression is an identity. Solution "every X" is very different from X=0, by the way. I tried to show how ignoring rules of logic leads to wrong conclusions, such as 1=2, or if you insist, that the same equation of 1 variable has two solutions X=0 and "every X that exists".

    An I never said that tokens are any better than NULLs. My point is that it is not our choice whether to use NULLs or not, it is illogical thing to do - it simply does not make sense. If something does not make sense, we cannot vote sense into it.

    Examples people provide to defend use of NULLs are also wrong. Your example is talking about two very different predicates. One is "Item [Item] is in stock , quantity [Qty] , priced [Price], which gives total value of [Total=[Qty]*[Price]". When you say Qty = -1, Price = 10, Total=-10 you are saying "Item ItemID = 1 is in stock in quantity Qty = -1, prced Price = 10, which gives us total value of Total = -1*10=-10" This makes sense only if you allow Qty to be negative which makes no sense, either. Negative quantity means perhaps 'we own 1 piece of item ItemID', very different from 'Item ItemID is in stck, quantiyu..." One predicat says "I posses something", the other one says "I owe something" Two predicates, two different meanings, and we try to put them together, into the same relation, that is not very logical to me. Ditto for NULLs instead tokens.

    I have had some experience making sense of bad designs. The -1 might have been coded to mean a return and the negative total a return refund. (100s of databases, many developers, and 40 years of time can result in a lot of variation.) If it was coded to be a return, a purchase, or unknown qty via a token.... Anyway, it was not supposed to make sense. However, in the second example, it is clear to me that the qty is unknown. And the consistent logic produced an unknown total. I'm not saying that it should be done this way.

    PS The domain for Qty might be defined to be the change to inventory, not just how many items are purchase. This domain would allow for both purchases and returns. Bad domain?

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • Stacey Decker (10/30/2014)


    I agree, why allow two values to mean the same thing? I handle this by allowing optional fields such as AddressLine2 to be NULL and then implement a constraint to prevent an empty string (AddressLine2 <> ''). Thoughts on this approach?

    Interesting that you regard NULL to be a value. That aside, for strings, I prefer the simpler approach: use empty string to represent your "special meaning" and a NULL constraint to prevent a second mechanism for the same thing.

  • Dennis Q Miller (6/1/2016)


    Stacey Decker (10/30/2014)


    I agree, why allow two values to mean the same thing? I handle this by allowing optional fields such as AddressLine2 to be NULL and then implement a constraint to prevent an empty string (AddressLine2 <> ''). Thoughts on this approach?

    Interesting that you regard NULL to be a value. That aside, for strings, I prefer the simpler approach: use empty string to represent your "special meaning" and a NULL constraint to prevent a second mechanism for the same thing.

    I disagree with the use of any "magic" values that don't mean what they actually are, such as '' to mean something else or a date of 20990101, etc.. If NULL alone can't fully differentiate the values for you, then add an indicator column (a tinyint) to explicitly identify what the special value is. For example, 1=missing; 2=n/a|doesn't apply to this row; 3=don't know; 4=know a value exists but don't know what the value is at this time; etc..

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 15 posts - 121 through 135 (of 137 total)

You must be logged in to reply to this topic. Login to reply