The Scientific Method: a call to action

  • Jeff Moden

    SSC Guru

    Points: 994667

    meilenb (5/23/2015)


    I don't require you to agree with me.

    I don't require you to be right either.

    Heh... remember that. ;-)

    meilenb (5/24/2015)


    But if you don't get the product out the door you will never have any money to fix it later. Successful people don't wait around while scientists argue the merits of their experiments.

    It's funny how people think that. If you get the product out the door and it's broken, you're going to need a whole lot more money to fix it than if you did it right the first time. The hacks on several large corporations in the last two years are fine testimony to that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBA's job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 994667

    TomThomson (5/23/2015)


    That's a great editorial, and I totally agree with it. If everyone based design and build of systems only on verified (or at least adequately tested) hypotheses, we would suffer a lot less from bugs and other unintended behaviour, and the cost of software/system development and support would be significantly reduced. It is really crazy to push out assertions without any supporting evidence, which ultimately has to be test and measurement based. Unfortunately, while many developers, DBAs, and engineers understand that, there are a lot of managers out there who either have forgotten it or never understood it and drive their staff into forgetting it.

    But the scientific method doesn't generally lead to proofs - the hypotheses are falsifiable by test (or at least they were until string theorists got hold of physics) but a failure to falsify merely means that the predictions the hypothesis leads to are good for certain experiments and only within the limits of accuracy of measurement, not that it's some sort of absolute truth. Many experiments may be needed to give a decent degree of confidence, and even after that we may hit circumstances where the hypothesis turns out not to fit - for example Newtonian mechanics survived experiment for centuries, but is now known to be false (although it's still valid for many purposes) because it doesn't work for things on the atomic or smaller scale and it doesn't work on the astronomical scale (gets the orbit of Mercury wrong, for example).

    Yes, I prefer to prove things - I'm still a mathematician at heart; and yes, I can prove some statements about performance because they may be simple statements of pure mathematics. But there are far more performance statements where all I can prove is that there's a decent probability that a particular algorithm will give better performance than another, because which performs best often depends on what data is fed to them and what environment they run in, and in some cases I can't even do that to any useful extent (in which case I'd prefer not to produce any software for that problem). But I can formulate hypotheses and test them and publish the hypotheses and the tests even though by doing that I get no proof of correctness, only either a better degree of confidence or a proof of incorrectness - which is of course what happens in real science (as opposed to in maths and logic and string theory) and is what the scientific method is (or perhaps used to be) all about.

    And sometimes I want to throw some software together in order to get some measurements in order to formulate some hypotheses - experimentation to allow hypotheses to be formulated is just as much part of the scientific method as is experiment to test, and is clearly applicable in the computing world. "I did this and got these results and can't make head or tail of them" is a perfectly reasonable thing to say in a scientific paper too, and if computing and/or software engineering are science based that sort of paper must be allowed too, not just papers describing hypotheses and the experiments carried out to check them.

    Well stated and true. That's what I meant when I said you have to re-prove experiments that have been cited rather than just take their word for it. The experiment has to be set up correctly and not just for the question at hand.

    --Jeff Moden



  • meilenb

    Old Hand

    Points: 340

    It doesn't have to be broken to get it out the door.

    That is a false choice.

  • Lynn Pettis

    SSC Guru

    Points: 442143

    meilenb (5/24/2015)


    It doesn't have to be broken to get it out the door.

    That is a false choice.

    If it doesn't meet the customer's expectations, or fails to meet the customer's needs, it's broken.

  • Gail Shaw

    SSC Guru

    Points: 1004446

    TomThomson (5/23/2015)


    But the scientific method doesn't generally lead to proofs - <large snip>

    Of course, it's a fair bit more complex than what I wrote. The full scope is well beyond an editorial.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • meilenb

    Old Hand

    Points: 340

    The definition of broken is not at issue.

    The assumption something is broken is not a valid assumption just because you disagree with a point.

    Funny how people need to suggest this in order to validate their argument.

    Very unscientific...

  • Lynn Pettis

    SSC Guru

    Points: 442143

    meilenb (5/24/2015)


    The definition of broken is not at issue.

    The assumption something is broken is not a valid assumption just because you disagree with a point.

    Funny how people need to suggest this in order to validate their argument.

    Very unscientific...

    If you fail to test your assumptions while building your application and use constructs that are not scalable or are inefficient simply to get your application out the door, then you are releasing a "broken" application. It will cost you more to fix it later than if you had bothered to do it right from the start.

  • meilenb

    Old Hand

    Points: 340

    True,

    But nobody is advocating that.

  • Jeff Moden

    SSC Guru

    Points: 994667

    meilenb (5/24/2015)


    It doesn't have to be broken to get it out the door.

    That is a false choice.

    The problem is that it frequently is broken but no one knows because they elected to ship rather than test. This is especially true when it comes to scalability and concurrency.

    --Jeff Moden



  • meilenb

    Old Hand

    Points: 340

    Again true

    And again, nobody is advocating that

  • Jeff Moden

    SSC Guru

    Points: 994667

    meilenb (5/24/2015)


    Again true

    And again, nobody is advocating that

    Not correct. It's certainly a part of what the article covered... and advocated.

    --Jeff Moden



  • jurgen.lottermoser

    Valued Member

    Points: 70

    Ideally, we'd put the same values in both columns. However, that would limit the test data set to 9999 rows, as '10000' doesn't fit into a char(4).

    Wouldn't having the same values in both columns in this case mean turning the ints into strings rather than having the same number in both of them, i.e. so that they actually contain the same binary data? For example, 10,000 is 0010011100010000, which is "'?" where the ? is character 16, a non-printing control character in ASCII.

    In which case, why would joining on ints be faster than joining on strings? Is it because, depending on the encoding of the string, it can't just be treated as a number? But if the encoding is the same then surely they can? Or is it that some values cause problems for some encodings, e.g. the first 32 values of the ASCII encoding?
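
    The byte arithmetic in the question can be checked with a short sketch. It is written in Python purely for illustration and assumes the 16-bit big-endian layout used in the post; a SQL Server int actually occupies 4 bytes, so treat the layout as an assumption rather than a description of the engine.

```python
# Illustration only: compare the raw bytes of the integer 10000 with the
# bytes of the string '10000'. The 16-bit big-endian layout mirrors the
# example in the post and is an assumption, not SQL Server's storage format.

value = 10000

bits = format(value, "016b")              # 16-bit binary representation
raw = value.to_bytes(2, byteorder="big")  # the same 16 bits as two bytes
text = str(value).encode("ascii")         # the digits stored as characters

print(bits)                 # 0010011100010000
print(list(raw))            # [39, 16]: 39 is the apostrophe, 16 is a control character
print(list(text))           # [49, 48, 48, 48, 48]
print(len(raw), len(text))  # 2 5: five characters do not fit in a char(4)
```

    The two representations hold different bytes, which is why "the same value" and "the same binary data" are separate things to test.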

  • Gail Shaw

    SSC Guru

    Points: 1004446

    jurgen.lottermoser (5/24/2015)


    Wouldn't having the same values in both columns in this case mean turning the ints into strings rather than having the same number in both of them?

    You could certainly test that way, though you'll have to be careful to keep the column sizes the same. Let me know what differences in performance you see when tested that way. 🙂

    To be honest, I wasn't trying to keep the values exactly the same. I wanted to keep the size the same, and the easiest way to populate the two tables was to use the same data generation and just CAST the int to string. Maybe it was a flawed methodology. Feel free to re-test your way and either refute or confirm my results (either is good).
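
    For anyone who wants to re-test, the shape of such an experiment can be sketched as follows. This is a minimal sketch in Python using SQLite from the standard library as a stand-in, so its timings say nothing about SQL Server; the table names, column names, and row count are hypothetical. It shows only the methodology: generate the keys once, store them once as integers and once CAST to strings, run the same join against both, and confirm the results match before comparing times.

```python
import sqlite3
import time

ROWS = 9999  # hypothetical size; 9999 is the largest value that fits in 4 characters

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One pair of tables keyed on integers, one pair keyed on strings,
# with an index on the right-hand side of each join.
cur.executescript("""
    CREATE TABLE int_a (id INTEGER NOT NULL);
    CREATE TABLE int_b (id INTEGER NOT NULL);
    CREATE TABLE str_a (id TEXT NOT NULL);
    CREATE TABLE str_b (id TEXT NOT NULL);
    CREATE INDEX ix_int_b ON int_b (id);
    CREATE INDEX ix_str_b ON str_b (id);
""")

# Same data generation for both pairs: integers first, then CAST to text.
keys = [(n,) for n in range(1, ROWS + 1)]
cur.executemany("INSERT INTO int_a VALUES (?)", keys)
cur.executemany("INSERT INTO int_b VALUES (?)", keys)
cur.execute("INSERT INTO str_a SELECT CAST(id AS TEXT) FROM int_a")
cur.execute("INSERT INTO str_b SELECT CAST(id AS TEXT) FROM int_b")


def timed_join(left, right):
    """Return (matching row count, elapsed seconds) for an equality join."""
    start = time.perf_counter()
    count = cur.execute(
        f"SELECT COUNT(*) FROM {left} AS a JOIN {right} AS b ON a.id = b.id"
    ).fetchone()[0]
    return count, time.perf_counter() - start


int_count, int_secs = timed_join("int_a", "int_b")
str_count, str_secs = timed_join("str_a", "str_b")

# Both joins must return the same rows, or the timing comparison is meaningless.
assert int_count == str_count == ROWS
print(f"int join: {int_secs:.4f}s, string join: {str_secs:.4f}s")
```

    A single run proves little; repeat the measurement several times and vary the row count before drawing any conclusion.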

    Gail Shaw
  • meilenb

    Old Hand

    Points: 340

    No one is advocating shipping without testing so your comment on that matter is not correct.

  • Lynn Pettis

    SSC Guru

    Points: 442143

    meilenb (5/24/2015)


    No one is advocating shipping without testing so your comment on that matter is not correct.

    From your statement here (emphasis mine):

    ROI = (Gain from Investment - Cost of Investment) / Cost of Investment

    Yes, I hear the "to be" replies (It costs less to catch a flaw up front than to fix it later, etc.). But if you don't get the product out the door you will never have any money to fix it later. Successful people don't wait around while scientists argue the merits of their experiments.
