Keeping up to, and down with, Date

  • TomThomson - Friday, April 28, 2017 5:28 PM

    Eric M Russell - Monday, January 9, 2017 3:33 PM

    Phil,The name of the this article "Keeping up to, and down with, Date"; I get that it's a pun making reference to the mathematician, relational database pioneer, and author Christopher J. Date, and also perhaps an even more oblique reference to the play, "Keeping Down with the Joneses". If so, then that's pretty clever. 🙂

    I wondered if Date was intended to suggest CJD, but then thought that Phil would never want to get down to that level let alone keep down there.  I may of course be wriong.   But after suffering all his nonsense, things like

    "NULL - you can't have NULL, because as soon as you need one sort of NULL you need a second sort and one you have two sorts of NULL it's easy to prove you need an infinite number of sorts"
    Which was pure bullshit

    "there's no need for NULL, you can always do the same thing by having a default value for the column" 
    That's going to work beautifully for the inner join of two derived tables each constructed by outer joins between base tables, isn't it? maybe you can get around it if you can find a value you can safely exclude from the original domain (sometimes difficult), and ensure that it doesn't combine in any sort of sane manner with other values in the domain (usually impossible) or values in other domains which can be combined with operators (almost always impossible even if the usually-impossible in domain requirement isn't), and are prepared to write extra SQL to exclude cases of a match on the default and replace all the arithmetic and comparison operators with code to handle these defaults.  So that was also quite clearly pure bullshit

    "1 NF is a bad idea because because the intersection of a row and a column shouldn't be restricted to a things that the database sees as single value, it could be something seen as a list of values or as a set of values or as whatever you like"
    I guess I don't have to say it a third time?

    and seeing the wonderful idiot he made of himself in the published debate with Codd,
    I don't think I'll ever trust anything he said or wrote, and keeping down with CJD would in my view be insane.

    Yes, he wrote some quite intelligent stuff  as well, but most of it was tainted by the sort of nonsense I've referred to.  The worst of it is that people who don't understand the relational model have taken his rubbish to heart and we have extreme anti-null people and anti-atomicity people participating in this site.  Sensible people say "try to avoid NULL but if you can't don't screw up by not using it", which is very anti-CJD and very pro-Codd; and sensible people also say "we don't care if the application which gets the value from the database sees what it's got as a complex object with many components as long as the database sees it as a simple atomic object that has no structure but only attributes discoverable by the functions built in to the SQL (or QEL,or whatever our database uses for queries) language" which is very anti-Date, and very pro-Codd (who would of course have voted for "whatever" rather than for SQL).

    It's strange.  I never read Date's stuff but had heard a whole lot of people talk about his seemingly anti-Codd, anti-SQL tendencies.,  To be honest, I sucked in what other people said and thought rather poorly of Date because of it and, especially, because on his reported take on NULLs.  For example, the following, which you quoted, is one of those Date quotes that made me think that Date was drinking bong water through two straws.  I assume the last sentence is yours.

    "NULL - you can't have NULL, because as soon as you need one sort of NULL you need a second sort and one you have two sorts of NULL it's easy to prove you need an infinite number of sorts"
    Which was pure bullshit

    10 years ago, I would have shouted "I AGREE!  TOTAL BS! THERE IS NO NEED FOR MORE THAN ONE TYPE OF NULL AND YOU DO NEED NULL!"  In fact, when I first read the quote a couple of hours ago, I was on the verge of hitting "Post Reply" with a similar shout... but then, remembering that, like anyone else, I've been seriously burned by responding before finishing the read, I read further down to the following...

    "there's no need for NULL, you can always do the same thing by having a default value for the column" 
    That's going to work beautifully for the inner join of two derived tables each constructed by outer joins between base tables, isn't it? maybe you can get around it if you can find a value you can safely exclude from the original domain (sometimes difficult), and ensure that it doesn't combine in any sort of sane manner with other values in the domain (usually impossible) or values in other domains which can be combined with operators (almost always impossible even if the usually-impossible in domain requirement isn't), and are prepared to write extra SQL to exclude cases of a match on the default and replace all the arithmetic and comparison operators with code to handle these defaults.  So that was also quite clearly pure bullshit

    ... and, again, I was on the verge of a too-early submittal of a "PREACH IT BROTHER!  TOTAL HOOIE!" type of post but resisted because of some of the things you said (and I assume the words after the embedded quote are yours) above.

    My mind raced back to the dozen's, if not hundreds, of posts I've seen and rather heated threads I've sometimes even participated in.  I remember quite clearly others stating that there needs to be more than one type of NULL and me thinking "NULL MEANS UNKNOWN!  THERE"S NO NEED FOR MORE THAN ONE TYPE OF NULL"!  I've even been known to join the band-wagon of people who's mantra it is that "NULL IS NOT NOTHING"!

    Reading your post over and over and wondering why I still couldn't hit "Post Reply" with resounding agreement, I was like the Grinch pondering why the "Whos" were still celebrating and singing of Christmas after their houses had been denuded of all vestiges of Christmas.  And then PFFFT!  GAK!  (I'm not sure it wasn't actually an aneurism), I finally realized what Date is talking about because not only have I created multiple definitions of "NULL" (as in "UNKNOWN" and pretty much as you posted), I've also used it to mean "NOTHING".  And, contrary to what was stated, it cost me nothing in extra, repetitive SQL to exclude my definitions of "NULL" because I did it all in DDL in the form of CHECKs or DRI and would have had similar checks.to limit the domain of possible values anyway.  In fact, I frequently do such things to help me keep from writing extra SQL.

    One good example can be found in the form of an "EndDate" column (or whatever you prefer to call it, you get the idea).  I know lot's of people that use a NULL as the EndDate if no EndDate is known.  In other words, they use NULL appropriately because the EndDate is, in fact, UNKNOWN.  And, yet, I'm one of those that preach to my charges that you should NEVER use a NULL EndDate because it requires an OR in the criteria when trying to find all active rows.  For example, if you need to find all of the active rows (either a NULL EndDate or a FUTURE EndDate) and you've used NULL as an UNKNOWN EndDate, then your criteria would necessarily look like....

      WHERE StartDate <= GETDATE()
        AND (EndDate >= GETDATE() OR EndDate IS NULL)
    ;

    ... and we all know the performance "joy" that extra bit of SQL in the form of an "OR" brings.  Instead, I tell them to use 9999-01-01 because 1) it's easily assigned to a DATETIME column just by assigning "9999" which 2) makes it easy to remember as a standard, 3) allows computational headroom for "EndDate + 1" (use DATEADD if you insist) types of calculations, and allows us to "avoid the OR" because the previous criteria now works just as...


      WHERE StartDate <= GETDATE()
        AND EndDate >= GETDATE()
    ;

    In other words, I have defined an UNKNOWN (which is loosely described by NULL) as a type of "NULL" in the form of a far reaching date.  I do it all the time and I'm pretty sure I'm not the only one.

    Another example is that of an Address2 column in a table.  Many use (inappropriately, I may add) NULL if there is no Address2 in an address where I enforce a NOT NULL and use a blank if there isn't one because it's actually known that there is no entry for those not having a second line to their address.  To use NULL, in such a case, would be to violate the idea of the UNKNOWN not being NOTHING.

    Heh... and then there are "status" columns.  Just like EndDate, I require a NOT NULL/NOT BLANK in such columns.  If the status of the row is UNKNOWN, then a "U" is used in the column and is enforced not only by the NOT NULL constraint but also by the DRI of an FK pointing to a reference table.  I have yet again defined a different type of UNKNOWN and, therefor, a different type of "NULL"... just like DATE said.

    And, finally, I rely on the fact that, despite my support for the idea that "NULL IS NOT NOTHING", I frequently use it as "NOTHING" in formulas.  For example, the old method over-loading a variable to perform concatenation.  The variable starts out as NULL so that I can use ISNULL to avoid the inclusion of a leading comma without have to STUFF it out of existence.  I'm using the NULL to indicate that NOTHING has been done in the concatenation yet.

    While I don't agree that NULL is totally unnecessary nor do I agree that a default can be used in it's place in all instances (although your post and my recent apparent aneurism 😉 give me cause to re-examine a lot of that), I no longer think that Date is full of BS on the subject.  I think he may have been misinterpreted.

    On the subject of ...

    "1 NF is a bad idea because because the intersection of a row and a column shouldn't be restricted to a things that the database sees as single value, it could be something seen as a list of values or as a set of values or as whatever you like"
    I guess I don't have to say it a third time?

    ... anyone who has allowed any XML, JSON, or CSV to be stored in their database agrees that "a column shouldn't be restricted to a [sic] things that the database sees as a single value" because it does, in fact, contain more than one value.  People do it all of the time.  I do disagree with his definition a bit though.  The database DOES see it as a "single value".  The "single value" is the "set" of values about some thing.  Although I disagree with the idea of storing XML that needs to be shredded by the database whenever it is used, it does work out quite handily for passing settings to a GUI (or even a stored procedure although that does require shredding in the database and so I'm loath to allow it) about (for example) a client's configuration preferences.  It can be passed as a "single value" to the GUI and then the GUI has it's way with it.  The database doesn't care... it sees it as a "single value".  If you don't think so, then consider what happens when you store a JPG or other image... in the case of such bit images, you're actually storing the set of pixel colors and intensities that it takes to render the picture.  A "Set" of items stored as a "single value".

    I also did a little reading about Date before posting this.  He's definitely not anti-Codd.  In fact, he busted a hump to promote Edgar Codd and his ideas and an interview with Codd's wife do seem to confirm that.  And, he's not anti-SQL.  He's anti-non-relational and both he and Codd were trying to sell the idea of a true relational database to IBM and compromised a bit just to get IBM to move off top-dead-center on the idea  Apparently, there are a lot of reasons why SQL (Oracle, SQL Server, MySQL... pick your flavor) doesn't actually qualify as a truly relational database language and the underlying DBMSs aren't actually truly relational in nature, either..

    So, I'm not quite as joyful as when the Grinch found the true meaning of Christmas and with the caveat that I've still not read his works, I don't currently share the same animosity with the me from a couple of hours ago about Date and I actually have you to thank for that because your examples in your post caused me to think about things (especially about the possible definition of multiple "types of NULLs") a whole lot more than I have in the past.  I can't speak for anyone else but I think what Date has stated and been called BS for may just be a misunderstanding.  Of course, I could be wrong in that opinion but I'm no longer taking other's words for it.  I'm going to dig in on some of the things like the debate between he and Codd and some of his books and (much more carefully) re-form my own opinion. 😉  I may still come to the same conclusion but I've already gotten a lot out of all of this and I'm looking for more.

    Heh... and all of that is my very strange way of saying "Thanks Tom".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Factor - Sunday, January 8, 2017 1:09 PM

    @andrew..PetersonThere was a book that Microsoft gave away with SQL Server in the early days that was wonderful. It used _PUBS database for all the examples and was very clear. I think it must have come from Sybase originally. I loved it but copies were hard to come by and eventually someone took it. I'd love a copy.@kfries.I have a copy of Ken England's SQL Server 2000 performance Optimization and Tuning Handbook It is clear and sensible, and gets to the point. Sure, quite a bit has changed, but the general principles are the same. I still refer to it.

    Phil, 
    Would that book happen to have been in the boxed set for the SQL Server 4.21 release?

    The more you are prepared, the less you need it.

  • There are several good books mentioned here, so I'll jump forward in time a little.

    I know it's very outside what I do now, but Hardcore Visual Basic by Bruce McKinney taught me to never settle for the limitations imposed by a language.  If the language doesn't do what you need, figure it out, write it and use it.

    I also enjoyed the SQL Server Deep Dive books, SQL Server 2012 Internals by Kalen Delaney and Window Functions by Itzik Ben-Gan.

  • Jeff, I wouldn't for a moment claim that having a default instead of using NULL never works,  just that there some are cases where it doesn't work.  Things like using a zero-length string instead of NULL or a maximum possible date or maximum possible integer or -1 for a positive integer insted of NULL will work in a lot of scenarios, and should be used insted of NULL whenever they will work.  But there also a lot of scenatios where they won't work, and using NUKLLwill reduce code complexity (or avoid bugs caused by the complex code not getting written when it's needed).  The "you must never have a NULL" argument falls down all over the place because there are lots of cases where one can't find  value that will not crop up as a real value- what default can I sensibly use instead of NULL if have a bit column (actually I'd probably use a tinyint with a <4 check constraint and default 3, unless there was some pressing reason not to, but things get nasty if I actually need all 65536 smalint values)?  How are we going to handle the no match cases for outer joins if NULL is not available?  If you look at the Third Manifesto (latest revision I read was dated 2014 - available at http://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf) Date and Darwen are still absolutely banning NULLs but they seem to have gone back to Codd's definition of atomic (for all I know they were already back to it in 2007, when they first version of their book was published).  They still say that any proper relational language has to include declarations of tuple-variables (that makes sense, I think), but must contain no tuple-level operations. 
    But the things that put me off CJD were various papers in the 70s and 80s, and the way after he talked Codd into wanting two distinct NULLs he went totally anti-NULL.

    Tom

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply