The Scientific Method: a call to action

  • Wow, Patrick.  You totally missed the point.  Ron stated that Gail said something that she didn't actually say and that was the point of contention.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Matt Miller (4) wrote:

    Is it just me or can you no longer get to the original article?  All links to it point to the DISCUSSION, not to the ARTICLE.  Sadly the article was much more meaningful than the sniping that went on a few years back.

    Of course I could have missed how to actually get back to the article itself.

    It's at https://www.sqlservercentral.com/editorials/the-scientific-method-a-call-to-action.

    Tom

  • Jeff Moden wrote:

    Wow, Patrick.  You totally missed the point.  Ron stated that Gail said something that she didn't actually say and that was the point of contention.

    Hey man thanks, for a minute there I was afraid nobody would explain the thread to me

     

  • RonKyle wrote:

    Jeff Moden wrote:

    Why the hell not?  It IS STRONGLY related and she did NOT say to avoid the use of natural keys but to take them into consideration.  Based on your original misread of her conclusion, your suggestion seems to be a bit on the side of sour grapes.

    Not at all.  I'm using language. 

    Yes, you are indeed using language. After all, that's the only way available to you in a written conversation to express those sour grapes that Jeff so accurately mentioned.

    • This reply was modified 4 years, 7 months ago by  TomThomson.

    Tom

  • There are no sour grapes at all.  You all really need to learn that you should attack ideas and not people.  However, in the interest of clarification: Gail, you hypothesized “You should use integers for the key rather than strings. The joins will be faster.”

    You then demonstrate that they are indeed faster, something of which I had no doubt.  Then you state:

    While this may not the final nail in the coffin for natural keys, it is worth keeping in mind when choosing between natural and artificial keys for a system, especially one likely to process large numbers of rows, such as a decision support/datawarehouse system.

    Yet you said that you had no comment on natural keys.  You also didn't distinguish between an OLTP design and an OLAP design.  The second one absolutely should use surrogate keys in all cases for exactly the reasons you point out.  But it doesn't follow that the same conclusion for an OLTP design.  All I pointed out is that I thought your conclusion went beyond what you showed in your tests.  As you said you didn't, I'm confused why you mentioned natural keys at all.

  • RonKyle wrote:

    Gail, you hypothesized “You should use integers for the key rather than strings. The joins will be faster.”

    If you read the blog post, the hypothesis was "integers are easier to compare than large complex strings.". The "You should use integers for the key rather than strings" was itself a quote, introduced as "This one comes up a lot on the forums, often as advice given…"

    RonKyle wrote:

    Yet you said that you had no comment on natural keys. 

    Again, I did not say that, not in the blog post, not in this thread. I said, that I did not conclude that the results in the blog post should "all but put the nail in the coffin on integers vs naturals keys." (your quote, not mine).

    The blog absolutely had some parting comments (comments, not conclusion) on additional considerations when choosing natural or artificial key. No, I didn't spend several pages thereafter discussing OLAP/OLTP/etc, because that wasn't the point of the blog post. I didn't tell people to never use surrogate keys. I didn't tell people to always use surrogate keys. I wasn't writing a bloody book on database design.

    And thank you for reminding me why I don't miss this site at all.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Again, I did not say that, not in the blog post, not in this thread. I said, that I did not conclude that the results in the blog post should "all but put the nail in the coffin on integers vs naturals keys." (your quote, not mine).

    That was a terrible sentence from Gail. I wish she'd stop that mess, but I know she won't. It implied she didn't have a coffin getting nails in it, but I found the text.

    While this may not the final nail in the coffin for natural keys, it is worth keeping in mind when choosing between natural and artificial keys for a system, especially one likely to process large numbers of rows, such as a decision support/datawarehouse system. Test carefully with expected data volumes and expected loads if you’re considering natural keys and decide based on the result of those tests.

    I'm going on record here as objecting to this coffin, the announcement of the death of Mr. Natural Keys is premature.

    Here's the deal, there are some folks who claim value to the use of natural keys, I'm one of them, but I think the value behind the use of natural keys might not be as easily flushed out by the scientific method as presented by Gail, in fact, many of the challenges programmers face don't fit the easily measurable experiments demonstrated by the article.

    Additionally Gail makes another mistake for conflating "data type" with the natural / surrogate key classification. I'm not saying that the set of natural keys that can be held in the integer datatype is the same set that can be represented by varchars because that would be just as wrong as Gails article, however, its clear that "datatype" describes one concept, the natural / surrogate key duality describes another.

    Look at some of the data warehouse texts out there, clearly folks like the integer data type for some columns, and with data warehouses, moving lots of data fast means we should really care about data types. But even with some of these "date tables", its clear to see that while we can represent the year as an integer (gaining some advantages), that integer still reads like a year so it shares some characteristics of being a natural key. Even an integer representation of the date (YYYYMMDD) can be viewed as a natural key despite the datatype. Am I to now be left with the conclusion that Gail doesn't even KNOW what a natural key is?

    IN MY OPINION the article was hurt by the coffin paragraph, Gail fumbles on what being a natural key actually means, and she additionally doesn't expend much effort in enumerating which subdomains of our challenges can easily benefit from the scientific method and which can't. Opportunities lost, in my opinion. There are PLENTY OF SCIENTIFIC METHOD demonstrations out there that DON'T have these problems. There are also awesome discourses on the complexities of software development that AREN'T solved by counting milliseconds or for that matter, anything easily measurable. This particular article, well I regret reading and discussing it. Yet again, SSC disappoints.

    THERE. I HAVE POSTED MY CAREFULLY CONSIDERED OPINION. YOU MAY NOW COMMENCE WITH THE PERSONAL INSULTS.

     

  • . This particular article, well I regret reading and discussing it. Yet again, SSC disappoints.

    I would ask for your money back.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan Burstein wrote:

    . This particular article, well I regret reading and discussing it. Yet again, SSC disappoints.

    I would ask for your money back.

    Alright guys, Alan has a point so I'm going to be requiring some renumeration. All I need is the billing address. And fair warning, any more static from you folks and I will remove my "frequently disappointed" discount.

     

     

     

  • Fingers crossed, this adds a little light and no more heat...

    From a strictly performance oriented approach, I'm 100% with Gail. Integers are better. Period. Therefore, when performance really counts, I will absolutely default to numbers over natural keys.

    But!

    We have to have both. Yes, from a performance and management stand point, an artificial key based on a number is preferable. However, you still have to add the natural constraint to the table to ensure that you're not allowing duplicate data based on the natural key, which is completely independent of the artificial key. I've seen WAY too many systems that only have the artificial keys in place and then sit around wondering why the data is so filthy. Artificial keys only allow duplicate data where there should be none.

    And!

    Sometimes, where performance may take a back seat, or where the overhead of maintaining two sets of keys hurts more than it helps, I may use a natural key instead of both. However, frankly, that's pretty rare.

    Please, if we can, let's disagree without being disagreeable.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Wow, wow, wow! Hold your horses, people. What's wrong with you all?

    I seem to have missed this excellent article back in May 2015, but I was happy to go commenting on it when it got re-published this week - until I saw this thread. It went down the drain after the first few comments, because everybody took the opportunity to discuss everything but the message in the article.

    As an academic more than an IT person, I can only wholeheartedly agree to the initial call: Sometimes, people should test their claims. Sometimes, people should stop arguing, but merely adhere to established practices (being "best" or not). Sometimes, people should question those established practices. After an empiric observation which breaks your assumptions or beliefs is a really good time! And sometimes it is just a work-habit, testing before using.

    What I originally meant to write as a comment: The original article has the wording "Proving a hypothesis false, is an equally valid and important result as proving one true." This is particularly important, as it touches the fear for making mistakes that can be haunting. But I am trained statistician, and I want people to know that one word should be changed: You can never prove something true! What you do is: you support the hypothesis. Over and over and over. Until the day where you prove it wrong. Because that's what you do: give empirical examples of situations where the hypothesis turned out to be wrong. And that's the only thing you can do, given the Scientific Method. Until you have shown something to not work, you rely on it working. And then you observe as best you can.

  • "You can never prove something true!"

    I remember trying that one in high school geometry class. It didn't go so well 🙂

     

     

  • x wrote:

    "You can never prove something true!"

    I remember trying that one in high school geometry class. It didn't go so well 🙂

    I like that! 🙂 However, high school teachers tend to get itchy when pupils question their teaching. And mathematics (geometry included) build on blocks of "proven wisdom". It is just that... even a mathematical proof can be overturned later in history. It is done. However rare it happens. And the scientists (mathematicians included) have to keep an open mind.

    In reality, you don't get the certainty which comes with living in "math-land". Math is based on ideas and abstractions, and you can have something called an "ideal" state. In "math-land", you can have a straight line. In reality, no such thing exists, and man certainly can't make one. But man get can close, and it may even be possible to measure how close.

    In the real world, a "proof" is often to be understood as statistical significance, and that's what I was thinking about when I wrote my post. In the case at hand, we can't even "prove" anything, because most posters refer to tests and experiences, while the rest refer to textbooks and tradition. That is far from the proofs of the geometry class.

  • hjp wrote:

    You can never prove something true!

    Now there's an interesting hypothesis... I wonder how one would prove that. 😀

    I do get the spirit of that statement, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    hjp wrote:

    You can never prove something true!

    Now there's an interesting hypothesis... I wonder how one would prove that. 😀

    Along the same lines as "a car can't fly". Of course it can. You only have to drive it over a cliff-top. The point is that it is not the supposed use. Neither is a "proof of truth" or my statement (untested hypothesis 😉 ) that one can't exist - but surely we enter fast the domain of physics and philosophy, and I am not trained there 😉 There are many paradoxes and dilemmas in science history, and this surely is one.

Viewing 15 posts - 151 through 165 (of 168 total)

You must be logged in to reply to this topic. Login to reply