Lossy data and incorrect data

Phil Factor

SSC-Insane

Points: 20244
August 13, 2013 at 9:36 pm

#297701

Comments posted to this topic are about the item Lossy data and incorrect data
Best wishes,
Phil Factor

Jason Wolfkill SSCrazy Eights Points: 9832 · Answer 1

This highlights one of the downsides of the digital revolution. Digital capture and reproduction of the physical phenomena of sounds and images always lose some data. With CDs, the sampling rate is high enough that most listeners never notice that the waveform data from points in between samples is missing, but the groove on a vinyl record is still a more complete representation of the waveform that existed at the time of recording. Same with copy machines - a machine that scans, digitizes, and then prints the image loses some data, but the resolution is usually high enough that most people don't notice a difference, while a photostatic or electrophotographic copy machine produces an image that is much more true to the original.

The conveniences of digital processes make them preferable to the analog processes of the past. The technology has advanced to the degree that digital capture and reproduction processes are adequate for any purpose except those that require the utmost fidelity to the original. Xerox's press release confirms that the problems noted in the article Phil linked are related to bugs in the software that compresses the captured images rather than any inherent limitation of digital capture and reproduction.

Somewhat OT, I'm old enough to remember using a high-speed Kodak electrophotographic copier with a fancy finishing unit attached. The whole thing was about 20 feet long, and, like all electrophotographic copiers, it had to capture the image of the original for every copy made, so it had a mechanism that ran the original over and over the platen - it worked so fast that the flash lamp seemed almost like a strobe light.

Jason Wolfkill

Tobar SSCarpal Tunnel Points: 4876 · Answer 2

Oh to be a lawyer working on this case. You will most likely be assured a job for life. It will take years just to get everyone to understand the problem. [ I was going to put a smiley in here, but the consequences of the possible alteration to "financial records or medical doses" take some of the humor shine off for me. ]

<><
Livin' down on the cube farm. Left, left, then a right.

shoestringdba SSCertifiable Points: 6226 · Answer 3

wolfkillj (8/14/2013)
Somewhat OT, I'm old enough to remember using a high-speed Kodak electrophotographic copier with a fancy finishing unit attached. The whole thing was about 20 feet long, and, like all electrophotographic copiers, it had to capture the image of the original for every copy made, so it had a mechanism that ran the original over and over the platen - it worked so fast that the flash lamp seemed almost like a strobe light.

Ok, do I really have to mention ditto machines? 😀 Youngsters...

____________
Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.

Tobar SSCarpal Tunnel Points: 4876 · Answer 4

And don't forget Mimeograph!! Which I learned is much older technology. Ah, the Ditto aroma, a strong memory still. http://www.chattanoogan.com/2006/7/27/89955/Remembering-the-Ditto-and-Mimeograph.aspx

<><
Livin' down on the cube farm. Left, left, then a right.

Jason Wolfkill SSCrazy Eights Points: 9832 · Answer 5

lshanahan (8/14/2013)
wolfkillj (8/14/2013)
Somewhat OT, I'm old enough to remember using a high-speed Kodak electrophotographic copier with a fancy finishing unit attached. The whole thing was about 20 feet long, and, like all electrophotographic copiers, it had to capture the image of the original for every copy made, so it had a mechanism that ran the original over and over the platen - it worked so fast that the flash lamp seemed almost like a strobe light.
Ok, do I really have to mention ditto machines? 😀 Youngsters...

Well, I'm old enough to have seen and extensively used copies made on mimeograph and ditto machines, but never really used the machines themselves. I remember how it could be so much harder to read the last copy to come off the ditto machine than the first due to the depletion of ink supply on the "ditto master". And who can forget that ditto machine smell?

Jason Wolfkill

Tobar SSCarpal Tunnel Points: 4876 · Answer 6

wolfkillj (8/14/2013)
Well, I'm old enough to have seen and extensively used copies made on mimeograph and ditto machines, but never really used the machines themselves.

Old enough to have used the machines my first year of teaching.

I remember how it could be so much harder to read the last copy to come off the ditto machine than the first due to the depletion of ink supply on the "ditto master".

Happy, after the fact, that my last name is early in the alphabet. At least when we were seated in name ascending rank arrangement. 😛

<><
Livin' down on the cube farm. Left, left, then a right.

chrisn-585491 SSCoach Points: 16006 · Answer 7

Database people have a natural suspicion of the quality of data until they’ve proven to their satisfaction that it is OK.

I have a motto:

"All data is bad. Some data is just worse than other data..."

Make no assumptions about any data at any time. Always validate and test!

Tobar SSCarpal Tunnel Points: 4876 · Answer 8

chrisn-585491 (8/15/2013)
"All data is bad. Some data is just worse than other data..."

True, true.

<><
Livin' down on the cube farm. Left, left, then a right.

marlon.seton SSCrazy Points: 2623 · Answer 9

Tobar (8/14/2013)
wolfkillj (8/14/2013)
Well, I'm old enough to have seen and extensively used copies made on mimeograph and ditto machines, but never really used the machines themselves.
Old enough to have used the machines my first year of teaching.

Used the Banda / Gestetner (what we call the Ditto in the UK) machine my year of teacher training ('79-'80) and first couple of years of teaching. Photocopying would have cost about 30 - 50p (45 - 75c) a sheet. By the time I quit teaching in '86, schools had photocopiers in general use.

I remember how it could be so much harder to read the last copy to come off the ditto machine than the first due to the depletion of ink supply on the "ditto master".
Happy, after the fact, that my last name is early in the alphabet. At least when we were seated in name ascending rank arrangement. 😛

You had to sit in name ascending rank arrangement? I cannot imagine something so militaristic.

Gary Varga SSC Guru Points: 82166 · Answer 10

This is quite horrifying. Projects I have been involved in that had some component of document management, scanning and then destroying the originals, did not come across this issue, yet I would be exceptionally surprised if all of them were unaffected.

Fortunately for me I was never working on this part of the systems...phew!!!

Gaz

-- Stop your grinnin' and drop your linen...they're everywhere!!!

Miles Neale SSChampion Points: 13147 · Answer 11

Thanks Phil for taking the time and penning this piece. When I first started reading I was wondering how it would tie in and boom there it is. This has some very interesting ramifications in Public Disclosure cases and retention of the Copy of Record for legal purposes.

M...

Not all gray hairs are Dinosaurs!

Tobar SSCarpal Tunnel Points: 4876 · Answer 12

marlon.seton (8/21/2013)

You had to sit in name ascending rank arrangement? I cannot imagine something so militaristic.

I imagine it was strictly so that the teachers could learn the names more easily. This was mostly in grades 1 to 6, I think, but I seem to remember a couple of times in secondary school.

<><
Livin' down on the cube farm. Left, left, then a right.

TomThomson SSC Guru Points: 104773 · Answer 13

Anyone who did what is suggested in the last sentence of the editorial ought to be given a good thrashing with a clue stick. (My comments here are relevant to nothing other than that last sentence.)

Numerics are far better compressed by OCR than by image compression, and the rule should be to use OCR to get the numeric (and alphabetic) components BEFORE any compression is done (since areas of the image can then usually be thrown away without loss of anything useful).

Note the positions and sizes of the alphanumeric chunks in the image, replace them with neutral background, compress the revised image, then store the positions, sizes, and text along with it. This gives better compression than compressing the image with the text/numeric data in it, and better accuracy too (to the limits of the OCR plus whatever checking of the OCR is done), in all cases where the background of the text/numeric data is not important, which means in just about every case where the text/numeric content has any legal implication.

Of course, "in just about every case" doesn't mean always; there are cases where this technique is not good enough. Even then, wherever the accuracy of the numeric/text data matters, there is something better than compressing the image and using OCR on the result: compress the image with the original text/numeric zones included, and also keep a record of the text/numeric data and its positions in the image. That costs only a small compression penalty in exchange for an enormous improvement in the accuracy of the alphanumeric data compared with compressing before applying OCR.
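As a rough illustration of the first approach described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `TextRegion` structure, the `blank_regions` and `archive` helpers, the grayscale page held as a list of rows of 0-255 values, and the use of `zlib` as a lossless stand-in for whatever image codec a real scanner would use. The OCR step itself is not shown; the regions are assumed to have come from an OCR pass run on the uncompressed scan.

```python
import json
import zlib
from dataclasses import dataclass, asdict


@dataclass
class TextRegion:
    """One OCR hit: bounding box plus recognised text (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int
    text: str


def blank_regions(pixels, regions, background=255):
    """Return a copy of the page with every text region painted over."""
    out = [row[:] for row in pixels]  # copy so the original scan is untouched
    for r in regions:
        for yy in range(r.y, r.y + r.height):
            for xx in range(r.x, r.x + r.width):
                out[yy][xx] = background
    return out


def archive(pixels, regions):
    """Blank the text zones, compress what is left, keep the text as a sidecar."""
    blanked = blank_regions(pixels, regions)
    raw = bytes(value for row in blanked for value in row)
    return {
        # zlib is only a stand-in here for a real image compressor
        "image": zlib.compress(raw),
        # positions, sizes and text are stored alongside the image
        "text": json.dumps([asdict(r) for r in regions]),
    }
```

Calling `archive(page, regions)` returns the compressed background plus a JSON record of where each piece of text sat and what it said, so the figures are never reconstructed from compressed pixels. For the fallback variant in the second paragraph, the blanking step would simply be skipped and the sidecar kept anyway.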

Tom

Phil Factor SSC-Insane Points: 20244 · Answer 14

Xerox have found the bug, which affects all compression levels, and are issuing a patch which seems to work. Hundreds of thousands of devices are affected.

http://realbusinessatxerox.blogs.xerox.com/2013/08/07/update-on-scanning-issue-software-patches-to-come/#.UhYrMz_px8E

It could be more than just a Xerox problem. For the latest news see

http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning#might_this_be_more_than_a_xerox_problem

Best wishes,
Phil Factor