The Scientific Method: a call to action

  • Jeff Moden (5/24/2015)


    Well stated and true.

    Only well stated if you ignore the really ghastly typo ("text" should have been "test") that my proof-reading missed - I didn't notice it until you quoted it.

    GilaMonster (5/24/2015)


    Of course, it's a fair bit more complex than what I wrote. The full scope is well beyond an editorial.

    I agree, an editorial can't cover the full scope, and shouldnn't be expected to. My comment was not intended as criticism of your editorial, as I rate it as one of the best editorials I've seen here - so I'm sorry if it gave a diferent impression. I just wanted to remind people that for software successful testing (like a scientist's experiments) only indicates correctness within a scope determined by the content of the tests, so they should be careful even of assertions that are backed by tests and check that the tests actually cover what they need. I know from bitter experience (and I'm sure that you do too) that a lot of people forget that and go ahead without making the check.

    Tom

  • meilenb (5/24/2015)


    No one is advocating shipping without testing so your comment on that matter is not correct.

    Then what is this about? Seems to be contradictory to what you've been saying.

    But if you don't get the product out the door you will never have any money to fix it later.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Comments posted to this topic are about the item The Scientific Method: a call to action

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TomThomson (5/24/2015)


    My comment was not intended as criticism of your editorial, as I rate it as one of the best editorials I've seen here - so I'm sorry if it gave a diferent impression.

    No, not at all. I understood it as you intended it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Nice one Gail and good point Jeff about the experiment needing to be reproducible.

    Reproducible implies that the conditions of the experiment should have been meticulously recorded. One point I have picked up from the Big Data world is that a fact established today might not be a fact in a months time. This is purely down to the rapid change in versions, or in other words the conditions of the experiment.

    The pre-oracle MySql site had an article stating that MySql outperformed SQL Server in many cases. What wasn't stated was the tests carried out or any of the items that would make them repeatable. I know that in certain circumstances MySql was faster but that was in the SQL 7 era and when comparing both on minimum spec hardware.

    Similarly I evaluated Infobright as a potential column store for datamarts. Reproducing AdventureworksDW in Infobright showed that community edition was significantly faster than 2008R2 and that enterprise Infobright was at least 4x faster than community edition. If I took this finding as an absolute that Infobright out performs SQL Server I would be authoring a myth that would be hotly contested for ages. I reran the Experiment with SQL 2012 column store indexes and for the same queries on the version of Infobright available at the time SQL2012 dramatically exceeded all prior performance benchmarks.

    Another point to watch for when evaluating tools is what the recommended spec is for that tool. Running identical experiments on identical kit is not enough If that kit is below recommended spec for one of the participants.

  • meilenb (5/23/2015)


    I'm in -- except I wouldn't want everybody doing these tests to prove something that has already been proved.

    I would be more inclined to have a mix of best practices and scientific practices. If someone needs to prove an int is faster than a string in a join (in your organization) because there is a debate, then prove it. Then make it a best practice so that you don't have to keep proving it over and over.

    Some people really hate best practices, but maybe that speaks more for the poor quality of the art than the downsides of best practices. Not everything is performance but even then, if we did think performance were the only criteria, would we have to include loops and cursor options in our solution options or could we look at vetted experience of respected folks and take their word? I know I now try a set oriented solution based on what I've read. Is that a bad thing? Should I also include a cursor based example because best practices suck?

    Performance as a metric is pretty easy to measure, correctness less so, and subjective areas like maintainability and extendability would require periods of experimental measurement that could be hard to budget, but (except for the best practice haters) we can instead learn from our predecessors on what has been found to work. I'm happy to say for the sake of the thread that these predecessors very well may have also used the scientific method, its just that some metrics of success don't appear instantly.

    Also, if you are debating join performance of a string vs. an int, maybe your just having the wrong conversation altogether. Maybe the approach ought to be to ask "why do you want a string vs. an int for your key?". The answer might be "because I don't have to join to "Customer" to get CustomerName" when querying "Orders" for a specific use case.

    If this is the answer, then the root of the concern is not necessarily string vs. int. The question might be "to join or not to join".

    So the complexion of the concern might be something really different than how the individual is attempting to express his or her concern.

    I would be suspicious selecting data types based on join performance, so I'd probably like to have the other conversation too.

  • TomThomson (5/24/2015)


    Jeff Moden (5/24/2015)


    Well stated and true.

    Only well stated if you ignore the really ghastly typo ("text" should have been "test") that my proof-reading missed - I didn't notice it until you quoted it.

    GilaMonster (5/24/2015)


    Of course, it's a fair bit more complex than what I wrote. The full scope is well beyond an editorial.

    I agree, an editorial can't cover the full scope, and shouldnn't be expected to. My comment was not intended as criticism of your editorial, as I rate it as one of the best editorials I've seen here - so I'm sorry if it gave a diferent impression. I just wanted to remind people that for software successful testing (like a scientist's experiments) only indicates correctness within a scope determined by the content of the tests, so they should be careful even of assertions that are backed by tests and check that the tests actually cover what they need. I know from bitter experience (and I'm sure that you do too) that a lot of people forget that and go ahead without making the check.

    It's a common typo that I make myself. One thing that that working in a worldwide community forum has also taught me is to not be a spelling or "correct Ingrish" zealot, either. Outright trolling and arrogance notwithstanding, it's normally better to understand the intent than the actual words used.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It would be nice to have this scientific method to all the blab about security as well. Like "I shouldn't give the sysadmin password to the developers because of security risk". How do you quantify this security risk? Using what research? Usually DBA use "good practice" and other B.S. to cover their own ignorance but then are very keen to ask others to give "proof" of what they say.

  • Kyrilluk (5/26/2015)


    How do you quantify this security risk? Using what research?

    That's an area called "Risk Management". You quantify it by looking at three aspects: threat, vulnerability and asset (at its simplest). It can be done in a fully quantitative way, and there's a lot of research in the area if you want to read up on it.

    Giving the developer sysadmin access to the dev server may be fine. The asset is not a business-critical one, the data on there is probably not sensitive so the threat of damage or exposure is small. If the system is the main production system running a hospital, full of sensitive healthcare data, then you probably want the bare minimum of people to have elevated access because the threat is so much higher and the asset far more critical. You probably want auditing and other forms of protection as well on there, etc.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Risk Management tends to focus on the people who have a lot of assets to manage. I rarely see an assessment with the cost of current risk avoidance methods, or a broad assessment of the risk.

    The scientific method, which I like, also tends to focus on small testable answers rather than gut reactions. As big data becomes more important, so with the method.

    412-977-3526 call/text

  • Kyrilluk (5/26/2015)


    It would be nice to have this scientific method to all the blab about security as well. Like "I shouldn't give the sysadmin password to the developers because of security risk". How do you quantify this security risk? Using what research? Usually DBA use "good practice" and other B.S. to cover their own ignorance but then are very keen to ask others to give "proof" of what they say.

    To be sure, I don't ever blindly follow "best practice". Either I already have a proof or will have one shortly. I also practice everything I preach especially when it comes to final work going to production.

    I also give no one the "sysadmin" (SA) password. I don't even know what it is and I'm the only DBA. If needed, the folks in NetOps can dig it out of the password safe and that it follows the correct password restrictions/profile.

    Shifting gears and unlike many shops, I don't allow write privs on production for the developers. In sharp contrast, I do give them "sysadmin" privs on the Dev boxes with the understanding that 1) they are not to do anything that a DBA would normally be required for like granting privs or doing backups/restores or changing system configs and 2) nothing get's moved to staging without following the correct protocols setup in our standards. I still maintain the final say so on any code.

    Shifting to high gear, I rather enjoy the relationship that I've setup with the Developers. I sit right in the middle of them and we work together as a well oiled machine. There are no fences between us and management loves it and supports the standards we've worked out because they've seen the incredible improvements in code, resulting sharp decrease in rework (almost zero defects) and we've easily passed multiple audits from multiple factions.

    A whole lot of this can be directly attributed to us using the Scientific Method not only for code methods, but for process and people methods, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I am surprised that nobody has mentioned the obvious next requirement which is an accessible and searchable repository where people can research and find previously performed proofs to allow people to build on the work of others rather than continually inventing the wheel.

    Will SQLServerCentral step up?

  • Thanks, good points.

    See also this PDF on understanding the difference between a hypothesis and a prediction:

    http://datanuggets.org/wp-content/uploads/2014/01/Strode_NABT-2011-Hypothesis-Presentation.pdf

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • paul.s.lach (5/26/2015)


    I am surprised that nobody has mentioned the obvious next requirement which is an accessible and searchable repository where people can research and find previously performed proofs to allow people to build on the work of others rather than continually inventing the wheel.

    Will SQLServerCentral step up?

    The "Stairway" series is a part of that. As for articles on different subjects, who would judge what is best? Lot's of people have their favorites and use the "Brief Case" function to "memorize" those.

    At work, we have a Wiki and we have a "function repository" (most new ones are iTVFs, for sure).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/26/2015)

    At work, we have a Wiki and we have a "function repository" (most new ones are iTVFs, for sure).

    Which wiki do you use?

    412-977-3526 call/text

Viewing 15 posts - 31 through 45 (of 168 total)

You must be logged in to reply to this topic. Login to reply