Jeff Moden wrote:
Do you have a short example or suggestion of when data profiling can withstand a per element tolerance of +/- 2 percent other than polls?
Quite honestly, I can't be arsed to produce a specific example. However, it's pretty much standard procedure when developing new integrations (or at least it should be) to do rough profiling in order to pick relevant design patterns. Does this file type usually contain 100, 10,000 or 1,000,000 different customer IDs? The order of magnitude will potentially affect your approach, but +/- 10 percent isn't that important.
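As a rough illustration of the principle rather than a real-world case, here is a minimal Python sketch. The function names, the thresholds and the approach labels are all made up for the purpose; the only point is that a count that is off by 10 percent still lands in the same bucket, so the design decision doesn't change.

```python
import math


def magnitude_bucket(distinct_count: int) -> int:
    """Return the exponent of the nearest power of ten for a distinct count."""
    if distinct_count < 1:
        raise ValueError("distinct_count must be at least 1")
    return round(math.log10(distinct_count))


def pick_approach(distinct_count: int) -> str:
    """Map a rough distinct count to a (hypothetical) design choice."""
    bucket = magnitude_bucket(distinct_count)
    # Thresholds and labels are illustrative assumptions, not recommendations.
    if bucket <= 2:
        return "in-memory lookup"
    if bucket <= 4:
        return "indexed staging table"
    return "partitioned bulk load"


# An estimate that is 10 percent high or low gives the same answer as the exact count.
for exact in (100, 10_000, 1_000_000):
    low, high = int(exact * 0.9), int(exact * 1.1)
    assert pick_approach(low) == pick_approach(exact) == pick_approach(high)
```

A count that sits right on the edge between two buckets could of course tip either way, but at that point either approach is probably defensible.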
I do see it more as a development tool than as an analysis tool for the data scientist, but if the performance gains are substantial enough, it could very well be a means of pinpointing areas that merit more detailed analysis.
Much like a woodworker can use both a 16-tooth circular saw and a Japanese pull saw, every tool has its place.
Just because you're right doesn't mean everybody else is wrong.