• When the information is second- or third-hand, there is usually no way to know its quality. You can take three or four political surveys, for example, but the questions were asked in different ways, in the context of different surrounding questions, with different sample selections, and on different days when different headlines were in the news. All of those characteristics are significant, but the user of the data has no control over them.

    Medical data is another example. Treatment providers need to sort patient conditions and treatments into a large set of fixed categories, none of which may accurately reflect the full reality. Further distortion occurs because choosing the wrong checkbox can make it difficult to collect from insurance, so physicians tend to follow certain patterns (I'm not talking about fraud here, but when more than one choice can legitimately apply, there is a tendency to choose the most pragmatic one).

    ...

    -- FORTRAN manual for Xerox Computers --