Simpsons' Paradox

  • Comments posted to this topic are about the item Simpsons' Paradox

  • Good article because it points out the very critical issue of how easily data can be misread to support one's position.

    Interesting side point, the male/female numbers are in themselves a bit suspicious, almost as if they had been adjusted to be equal. A number of developmental problems (like autism) strike males significantly more than females... the fact that the expenditures are so close could suggest that someone is trying to avoid charges of discrimination.

    ...

    -- FORTRAN manual for Xerox Computers --

  • This was a fascinating article. It took me a bit to figure out why the age-based results could be so even yet the overall results were not. But now I understand. Wow, statistics really can lie, or not, depending on how the data is presented.
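
For anyone else puzzling over how per-cohort results can match while the overall figures differ, here is a minimal sketch in Python. The numbers are made up for illustration and are not the article's data: groups "A" and "B" get identical mean expenditures within each age cohort, but the cohorts are different sizes in each group, so the overall means diverge.

```python
# Hypothetical data, NOT taken from the article:
# (group, age_cohort) -> (number of people, mean expenditure per person)
cells = {
    ("A", "young"): (80, 1000),
    ("A", "old"): (20, 5000),
    ("B", "young"): (20, 1000),
    ("B", "old"): (80, 5000),
}


def overall_mean(group):
    """Mean expenditure for a group, weighting each cohort by its head count."""
    rows = [(n, mean) for (g, _), (n, mean) in cells.items() if g == group]
    total_people = sum(n for n, _ in rows)
    total_spend = sum(n * mean for n, mean in rows)
    return total_spend / total_people


for group in ("A", "B"):
    # Within each cohort the two groups have the same mean,
    # yet the overall means differ because the cohort mix differs.
    print(group, overall_mean(group))
```

The only thing that differs between the two groups is the cohort mix, which is the variable that disappears when you look at the overall average alone.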

  • Simpson's Paradox as per this article shows the danger of looking for data that fits your agenda. You can take almost any set of data, twist and contort it to your liking, and present it as fact. Leave a variable out as shown and there is instant bias. If it doesn't fit your needs you can introduce another variable outside of the set to produce a favorable outcome. Present your view with the "facts" and let the controversy begin.

    If care isn't taken, this can potentially cause a company to fail or, on a much larger scale, cause national false news. (Shameful use of this year's new buzz phrase.)

  • Interesting article indeed.

    As other commentators have noted, statistical results can be manipulated in many ways to fit the desired outcome.

    I'd like to add that the 'input' side of the equation can be massaged and manipulated so as to obfuscate the sometimes blatant manipulations of the output side... please refer to the Podesta email on WikiLeaks, where there is an open acknowledgement of systematic 'oversampling'.

    https://wikileaks.org/podesta-emails/emailid/26551

    So, yes, you can add/remove a few variables to influence the results, OR you can just contour the samples and polls to collect the data in such a way that the source data set already reflects the result you want - the direction you wish to lead people.

    Mark
    Just a cog in the wheel.

  • Thank you to everyone for your comments. My personal take on this is that it is the nature of data. It is not exactly about 'manipulation' as much as it is about understanding that perspectives on data change with how you choose to view it. A balanced data analyst will dare to present multiple views and add the clause that the truth depends on perspective. It can be used for manipulative reasons without a doubt, and it often is. But it is really not about statistics lying or manipulation - it is about understanding the complexities of data analysis and making sure we look at the data from many angles before arriving at one solid conclusion.

  • starunit (1/10/2017)


    Interesting article indeed.

    As other commentators have noted, statistical results can be manipulated in many ways to fit the desired outcome.

    I'd like to add that the 'input' side of the equation can be massaged and manipulated so as to obfuscate the sometimes blatant manipulations of the output side... please refer to the Podesta email on WikiLeaks, where there is an open acknowledgement of systematic 'oversampling'.

    https://wikileaks.org/podesta-emails/emailid/26551

    So, yes, you can add/remove a few variables to influence the results, OR you can just contour the samples and polls to collect the data in such a way that the source data set already reflects the result you want - the direction you wish to lead people.

    I'd never heard of oversampling or this controversy until you mentioned it, but a simple google search suggests this is a ridiculous and debunked conspiracy theory.

    http://www.pewresearch.org/fact-tank/2016/10/25/oversampling-is-used-to-study-small-groups-not-bias-poll-results/

    http://www.politifact.com/truth-o-meter/statements/2016/oct/25/donald-trump/trump-absurd-claims-podesta-rigged-polls/

    Regardless, your point is absolutely correct and well-taken. There are abundant opportunities to confuse and obfuscate in statistical analysis. And in our everyday communications, clearly. 😉

  • Very interesting. Thanks for posting. I hear this all the time..."Just the facts please, the facts don't lie." But, in reality, like you've demonstrated, while the facts aren't a lie, they can indeed produce many perceived truths. We see this in the USA every day from the media.

  • That has to do with the nature of the media to take sides, as well as the need for people to hear one specific click-baity perspective instead of the whole thing. In general it is about nuance and patience for more than one version of the truth. As data people we must strive for that, even if it opposes what we believe in politically and may not be what we want to hear.

  • Diligentdba 46159 (1/10/2017)


    That has to do with the nature of the media to take sides, as well as the need for people to hear one specific click-baity perspective instead of the whole thing. In general it is about nuance and patience for more than one version of the truth. As data people we must strive for that, even if it opposes what we believe in politically and may not be what we want to hear.

    I absolutely agree but I have to admit, I find myself hearing something I don't agree with and given the track record of the media, I automatically assume, "BAH, they probably didn't include X or Y or even Z, so I don't believe it!"

  • I absolutely agree but I have to admit, I find myself hearing something I don't agree with and given the track record of the media, I automatically assume, "BAH, they probably didn't include X or Y or even Z, so I don't believe it!"

    In the few situations where I've dealt with reporters, I have been astonished at how wrong they get the story.

    ...

    -- FORTRAN manual for Xerox Computers --

  • jay-h (1/10/2017)


    I absolutely agree but I have to admit, I find myself hearing something I don't agree with and given the track record of the media, I automatically assume, "BAH, they probably didn't include X or Y or even Z, so I don't believe it!"

    In the few situations where I've dealt with reporters, I have been astonished at how wrong they get the story.

    I've had these experiences too! Now couple those same people with the reporting of statistical analysis and you've got misrepresented data that gets the masses worked up into a frenzy...but hey, it's good for ratings! :hehe:

  • This is a really good article because people are not aware of problems with statistics. The classic book on this is "How to Lie with Statistics" by Darrell Huff. It is old and has stayed in print since the nineteen fifties. Another good book is Stephen K. Campbell's "Flaws and Fallacies in Statistical Thinking", which is along the same lines. A quick introduction to the computational side is "How to Think About Statistics (Revised Edition)" by John L. Phillips Jr.

    My only complaint with the article is that you did not post a table. A table by definition has to have a key, and a properly designed table should not have redundancies in it. A well-designed table would follow ISO 11179 rules, industry standards, and other things. Do you really need 50-character-long strings? Why do you believe there is a generic "id" in RDBMS? Why are you using floating-point for money (that is actually illegal)? Why do you have both age and age cohort in the same table? We do not store computed data, but if you wanted to do it, then you would have needed a check constraint to make sure that the two values are actually redundant instead of conflicting. Which ethnicity code did you use? Etc. I know you just dashed this out in a hurry, but it just bothers me to see such bad SQL coding.

    CREATE TABLE Paradox_Data
    (sample_id CHAR(5) NOT NULL PRIMARY KEY,
     birth_date DATE NOT NULL,
     -- sex codes as in ISO/IEC 5218: 0 = not known, 1 = male, 2 = female, 9 = not applicable
     sex_code CHAR(1) NOT NULL
       CHECK (sex_code IN ('0', '1', '2', '9')),
     expenditure_amt DECIMAL(12,2) NOT NULL,
     ethnicity_code CHAR(5) NOT NULL);

    Again, a good article, and it might be worth doing a follow-up on some other statistical paradoxes.

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • Sir, I am so honored that you would comment on my post. Thank you. Your books have helped me a great deal throughout my career. Your comment is wholly valid, and I will make sure the table has a key.

  • thisisfutile (1/10/2017)


    Very interesting. Thanks for posting. I hear this all the time..."Just the facts please, the facts don't lie." But, in reality, like you've demonstrated, while the facts aren't a lie, they can indeed produce many perceived truths. We see this in the USA every day from the media.

    It's true that facts don't lie, but statistics aren't facts. They can help in the search for facts, and that's all they should be used for; they should never be treated as facts themselves.
