Normal Data

  • Comments posted to this topic are about the item Normal Data

  • Boy, is that ever true about feedback. I've taught computer classes around the country for most of the last 25 years. As I was reading your column, I could picture every negative feedback form I've received.

    I must have several thousand positive ones, even a couple I've framed. However, when I close my eyes, I can clearly see the negs.

  • One of my favorite stories about decisions based on data!

    Wald applied his statistical skills in World War II to the problem of bomber losses to enemy fire. A study had been made of the damage to returning aircraft and it had been proposed that armor be added to those areas that showed the most damage. Wald's unique insight was that the holes from flak and bullets on the bombers that did return represented the areas where they were able to take damage. The data showed that there were similar patches on each returning bomber where there was no damage from enemy fire, leading Wald to conclude that these patches were the weak spots that led to the loss of a plane if hit, and that must be reinforced. This is still considered today seminal work in the then-fledgling discipline of operational research.

    http://en.wikipedia.org/wiki/Abraham_Wald
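
    A toy simulation may help show why Wald's reasoning works. It assumes, purely for illustration, four aircraft sections, three uniformly random hits per plane, and that any engine hit brings the plane down; none of these numbers come from Wald's actual study.

```python
import random
from collections import Counter

# Illustrative assumptions only: four sections, and an engine hit is always fatal.
SECTIONS = ["fuselage", "wings", "tail", "engine"]
FATAL = {"engine"}


def simulate(n_planes=1000, hits_per_plane=3, seed=42):
    """Count hits per section over all planes and over returning planes only."""
    rng = random.Random(seed)
    all_hits = Counter()
    returned_hits = Counter()
    returned = 0
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        if not FATAL.intersection(hits):  # plane survived, so we get to inspect it
            returned += 1
            returned_hits.update(hits)
    return all_hits, returned_hits, returned


all_hits, returned_hits, returned = simulate()
for s in SECTIONS:
    print(f"{s:9} all={all_hits[s]:4} returned={returned_hits[s]:4}")
```

    Hits land evenly on every section, yet the returning sample shows no engine holes at all. Armoring where the returning planes are damaged would protect exactly the places that were already survivable.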

  • They're somehow assigning more weight to the negative rating than to all the others. There's perhaps some value in doing so, since there might be a legitimate complaint, but often we need to discard this one form as an outlier.

    If an attendee gives a negative rating (one or two on a scale of five), then they should provide related comments. For example, if the speaker didn't present well, then how exactly? Was it the volume of his voice, the quality of the slides, or perhaps an inappropriate opening joke?
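
    The "low score requires a comment" rule could be enforced when the form is submitted. A minimal sketch, assuming a hypothetical `Feedback` record on a 1-5 scale; the last lines illustrate the weighting point, since a single low score moves the mean but not the median.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Feedback:
    # Hypothetical feedback-form record; rating is on a 1-5 scale.
    rating: int
    comment: str = ""


def validate(fb: Feedback) -> list[str]:
    """Return validation errors; an empty list means the form can be submitted."""
    if not 1 <= fb.rating <= 5:
        return ["rating must be between 1 and 5"]
    if fb.rating <= 2 and not fb.comment.strip():
        return ["ratings of 1 or 2 need a comment saying what went wrong"]
    return []


# One low score among high ones pulls the mean down; the median does not move.
ratings = [5, 5, 4, 5, 1]
print(mean(ratings), median(ratings))  # mean is 4, median is 5
```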

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • ... there might be a legitimate complaint, but often we need to discard this one form as an outlier.

    Chances are it seemed legitimate to the person who wrote the complaint.

    As an IT manager, I found I dismiss 'outlier complaints' - from my staff and from my customers - at my own risk. There is often wisdom to be had from such complaints.

    I'd rather have the one complainer than a thousand yes-men.

  • I love looking at data, seeing if I can find patterns, or hunting down the odd duck, and then relating it to the business.

  • Our ETL process ingests data from over a hundred external clients, and what counts as "normal" is a constant debate. A pre-production load step called AutoCertification runs record count, cardinality, and standard deviation queries before the load proceeds. If a dataset falls outside a configured threshold, it's held back in staging, flagged, and an alert appears in the ETL monitoring dashboard. When the data analyst clears an alert, the standard, min, and max values can be edited at that point, defining a "new normal" for that specific data source going forward.
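
    A minimal sketch of a threshold check in the spirit of AutoCertification, assuming rows arrive as dictionaries and that each source has configured low/high limits for row count, cardinality of a key column, and standard deviation of a numeric column. `Threshold`, `SourceProfile`, and `certify` are hypothetical names, not the poster's actual implementation.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Threshold:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass
class SourceProfile:
    # Hypothetical per-source limits; field names are illustrative only.
    row_count: Threshold
    cardinality: Threshold
    measure_stdev: Threshold


def certify(rows, key, measure, profile):
    """Return names of failed checks; an empty list means the batch may load."""
    values = [r[measure] for r in rows]
    metrics = {
        "row_count": len(rows),
        "cardinality": len({r[key] for r in rows}),
        "measure_stdev": pstdev(values) if values else 0.0,
    }
    return [name for name, value in metrics.items()
            if not getattr(profile, name).contains(value)]


profile = SourceProfile(Threshold(3, 1_000), Threshold(2, 50), Threshold(0, 100))
batch = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 12.0},
    {"customer": "a", "amount": 11.0},
]
print(certify(batch, "customer", "amount", profile))  # []

odd_batch = batch + [{"customer": "c", "amount": 5_000.0}]
print(certify(odd_batch, "customer", "amount", profile))  # ['measure_stdev']

# The analyst reviews the alert and accepts it: widen the limit ("new normal").
profile.measure_stdev = Threshold(0, 5_000)
print(certify(odd_batch, "customer", "amount", profile))  # []
```

    Returning the list of failed checks, rather than a single pass/fail flag, gives the dashboard something specific to show the analyst.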


  • I think this is an important post for every DBA to read. I have been a DBA for almost 30 years (starting back in the ancient days on the mainframe). DBAs are notorious for thinking that averages are as reliable as moral codes chiseled in stone. One distribution the article didn't mention is the bimodal distribution. Basically, your distribution diagram looks like a two-humped camel. If you have data like this, an average is as meaningful as the meeting minutes from the local Liars' Club.
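
    To make the two-humped camel concrete, here is a small invented dataset of query durations: fast cache hits and slow disk reads. The mean lands in a gap where no real query lives.

```python
from collections import Counter
from statistics import mean, median

# Invented query durations in ms: cache hits near 5 ms, disk reads near 200 ms.
durations = [4, 5, 5, 6, 5, 4, 6, 198, 200, 202, 199, 201]

avg = mean(durations)
print(f"mean = {avg:.2f} ms, median = {median(durations)} ms")

# Bucket into 50 ms bins to expose the two humps.
bins = Counter(d // 50 * 50 for d in durations)
for start in range(0, 250, 50):
    print(f"{start:3}-{start + 49:3} ms: {'#' * bins[start]}")

# No single query actually took anything close to the "average" time.
gap = min(abs(d - avg) for d in durations)
print(f"closest real observation is {gap:.2f} ms from the mean")
```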

    We should all be grateful that the DBMS vendors have made the Optimizer aware of the statistics metadata stored in the system tables. For example, you might think that an index will fix a performance problem when searching on a column that has 100 unique values. The problem is that three of those values make up 95 percent of the data. Statistics help with this kind of non-uniform distribution. If you are searching for one of the three common values, the Optimizer won't use the index (assuming your statistics are current). On the other hand, if you are searching for one of the rarely populated values, the index gives you an enormous amount of filtering/selectivity.
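
    The skew problem above can be sketched in a few lines. The row counts are invented to match the example (100 distinct values, 3 of them holding 95% of the rows), and the 2% seek/scan cutoff is a made-up stand-in for a real cost-based optimizer, which weighs far more than selectivity.

```python
# Hypothetical row counts per distinct value of an indexed column:
# 3 hot values hold 95% of the rows, 97 rare values share the remaining 5%.
freq = {"A": 400_000, "B": 300_000, "C": 221_500}
freq.update({f"v{i}": 500 for i in range(97)})
TOTAL = sum(freq.values())  # 970,000 rows over 100 distinct values

SEEK_CUTOFF = 0.02  # invented stand-in for the optimizer's cost model


def uniform_estimate(value: str) -> float:
    """Without statistics: assume every distinct value is equally common."""
    return 1 / len(freq)


def histogram_estimate(value: str) -> float:
    """With current statistics: use the recorded frequency of this value."""
    return freq.get(value, 0) / TOTAL


def choose_plan(selectivity: float) -> str:
    return "index seek" if selectivity < SEEK_CUTOFF else "scan"


for value in ("A", "v42"):
    for name, estimate in (("uniform", uniform_estimate), ("histogram", histogram_estimate)):
        sel = estimate(value)
        print(f"{value:>4} {name:9} selectivity={sel:.4%} -> {choose_plan(sel)}")
```

    Under the uniform assumption every value looks like a 1% match and the index always seems worthwhile; with per-value frequencies, the hot value "A" (about 41% of rows) switches to a scan while the rare "v42" keeps the seek.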

    The rule here is to know your data, reorg/rebuild your indexes regularly, and run statistics religiously.

  • GoofyGuy (1/28/2015)


    ... there might be a legitimate complaint, but often we need to discard this one form as an outlier.

    Chances are it seemed legitimate to the person who wrote the complaint.

    As an IT manager, I found I dismiss 'outlier complaints' - from my staff and from my customers - at my own risk. There is often wisdom to be had from such complaints.

    I'd rather have the one complainer than a thousand yes-men.

    Often, not always.

    The complainers do sometimes point out things that need to be handled, but in my experience across hundreds of talks, they usually don't. Often they complain about the agenda they didn't read, or the lighting in the room, which isn't helpful to me and often can't be changed. Or about the audio, when they sat at the front (loud) or the back (soft).

    I read every comment and judge each one. Often they're discarded because they don't make sense.

  • patrickmcginnis59 10839 (1/28/2015)


    One of my favorite stories about decisions based on data!

    Wald applied his statistical skills in World War II to the problem of bomber losses to enemy fire. A study had been made of the damage to returning aircraft and it had been proposed that armor be added to those areas that showed the most damage. Wald's unique insight was that the holes from flak and bullets on the bombers that did return represented the areas where they were able to take damage. The data showed that there were similar patches on each returning bomber where there was no damage from enemy fire, leading Wald to conclude that these patches were the weak spots that led to the loss of a plane if hit, and that must be reinforced. This is still considered today seminal work in the then-fledgling discipline of operational research.

    http://en.wikipedia.org/wiki/Abraham_Wald

    I loved this! Thanks for including it. Now, if I could only learn to code more like this.

    Sigerson

    "No pressure, no diamonds." - Thomas Carlyle

  • Steve Jones - SSC Editor (1/29/2015)


    GoofyGuy (1/28/2015)


    ... there might be a legitimate complaint, but often we need to discard this one form as an outlier.

    Chances are it seemed legitimate to the person who wrote the complaint.

    As an IT manager, I found I dismiss 'outlier complaints' - from my staff and from my customers - at my own risk. There is often wisdom to be had from such complaints.

    I'd rather have the one complainer than a thousand yes-men.

    Often, not always.

    The complainers do sometimes point out things that need to be handled, but in my experience across hundreds of talks, they usually don't. Often they complain about the agenda they didn't read, or the lighting in the room, which isn't helpful to me and often can't be changed. Or about the audio, when they sat at the front (loud) or the back (soft).

    I read every comment and judge each one. Often they're discarded because they don't make sense.

    My remarks were directed more toward employees and customers than toward audience members. Unlike you, I don't get out much.

  • I agree with you Steve, but sometimes I wonder. Where I worked we had 18 years' worth of data, which I've maintained for years and have long argued should be analyzed to learn what it tells us. Yes, some of this I could have done myself, but the data is more about human behavior, something that I, as a .NET developer and occidental DBA, didn't feel qualified to analyze. After all, I'm neither a clinician nor a research psychologist. My suggestions and requests fell on deaf ears.

    Oh well.

    Rod

  • Doctor Who 2 (1/31/2015)


    I agree with you Steve, but sometimes I wonder. Where I worked we had 18 years' worth of data, which I've maintained for years and have long argued should be analyzed to learn what it tells us. Yes, some of this I could have done myself, but the data is more about human behavior, something that I, as a .NET developer and occidental DBA, didn't feel qualified to analyze. After all, I'm neither a clinician nor a research psychologist. My suggestions and requests fell on deaf ears.

    Oh well.

    We Occidental DBAs are perfectly qualified to analyze data.

    Did you intend to say "accidental DBA"?


  • Doctor Who 2 (1/31/2015)


    I agree with you Steve, but sometimes I wonder. Where I worked we had 18 years' worth of data, which I've maintained for years and have long argued should be analyzed to learn what it tells us. Yes, some of this I could have done myself, but the data is more about human behavior, something that I, as a .NET developer and occidental DBA, didn't feel qualified to analyze. After all, I'm neither a clinician nor a research psychologist. My suggestions and requests fell on deaf ears.

    Oh well.

    It isn't so much about you analyzing the data as about understanding the different algorithms that might be used. I certainly haven't always understood the basis for the analysis, but I do know some stats and patterns.

    I suggest things. They tell me I'm an idiot. OK, but why? They tell me why, and I learn. I then ask them what to do, or what to try, and implement it. It doesn't work. We're both idiots, but we're following Edison. We're finding the 10,000 ways not to do things on the way to the better way.

    It takes accepting that you're not going to get it all. That you're going to make mistakes. That you're going to need to learn constantly and be humble when your suggestions aren't accepted.

    Be Edison.

  • Sigerson (1/29/2015)


    http://en.wikipedia.org/wiki/Abraham_Wald

    I loved this! Thanks for including it. Now, if I could only learn to code more like this.

    What, your databases aren't regularly exposed to anti-aircraft fire?!

    Wimp. 😉

    -----
    Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. --Samuel Johnson
