Innovation Needs Information


Steve Jones - SSC Editor

SSC Guru

Points: 734418
January 20, 2018 at 1:51 pm

#327213

Comments posted to this topic are about the item Innovation Needs Information


chrisn-585491 SSCoach Points: 16006 · Answer 1

However, there is also lots of data inside companies, especially some of the big social media and communication companies that gives them an advantage. I think that's OK, after all, these companies have innovated to build large scale enterprises and devoted resources to collecting data.

This isn't OK. I think all this data collected by social media and other giants is currently being abused. We don't have GDPR in the US, and the motivations of social media companies aren't always aligned with the overall public interest. When you can use these resources and ML to identify or isolate groups of individuals whose values or belief systems don't match yours, reward those who do, or manipulate opinion, then the collection of data becomes very problematic. This is the East German Stasi times infinity, and the problem will get much uglier in the near future with the current political divisions.

However, overall we share lots of information with others. I do know that many other communities are catching up and I really appreciate the answers I've gotten from others when trying to repair my tractor or auto.

On a lighter note, the sheer amount of good quality information available in the last 10 years is amazing. Knowledge and skill transfer through web sites and videos is extraordinary and one of the finest examples of an information society. This could help fix the broken education system if applied right.

jay-h SSCoach Points: 18816 · Answer 2

There are a number of flaws in the starry-eyed optimism in that article.

Of course there are the serious privacy threats (as addressed above). Recently Australia has been working on a project to openly publish anonymized medical data. After the first trial batch was posted, researchers showed how easy it was to de-anonymize it with a few crosschecks.
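To make the "few crosschecks" idea concrete, here is a minimal sketch of a linkage check in Python. It is not the method the researchers used; every record, field name, and value below is invented for illustration. It assumes the released data still carries quasi-identifiers (birth year, postcode, sex) that also appear in some other, named data set.

```python
# Hypothetical data: all names, fields, and values are made up for this sketch.
anonymized_records = [
    {"birth_year": 1975, "postcode": "2000", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1975, "postcode": "2000", "sex": "M", "diagnosis": "diabetes"},
    {"birth_year": 1982, "postcode": "3000", "sex": "F", "diagnosis": "migraine"},
]
public_records = [
    {"name": "Alice Example", "birth_year": 1975, "postcode": "2000", "sex": "F"},
    {"name": "Bob Example", "birth_year": 1975, "postcode": "2000", "sex": "M"},
    {"name": "Carol Example", "birth_year": 1990, "postcode": "4000", "sex": "F"},
]

# Fields assumed to be present in both data sets.
QUASI_IDENTIFIERS = ("birth_year", "postcode", "sex")


def group_by_key(records, keys):
    """Group records by the tuple of their quasi-identifier values."""
    groups = {}
    for record in records:
        key = tuple(record[k] for k in keys)
        groups.setdefault(key, []).append(record)
    return groups


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the quasi-identifiers match
    exactly one record in each data set."""
    anon_groups = group_by_key(anonymized, keys)
    public_groups = group_by_key(public, keys)
    matches = []
    for key, anon_list in anon_groups.items():
        public_list = public_groups.get(key, [])
        # A unique match on both sides re-identifies the "anonymous" row.
        if len(anon_list) == 1 and len(public_list) == 1:
            matches.append((public_list[0]["name"], anon_list[0]["diagnosis"]))
    return matches


reidentified = link_records(anonymized_records, public_records)
print(reidentified)
```

In this toy data, two of the three "anonymous" rows are unique on just three fields, which is the whole problem: removing names does little when the remaining columns single people out.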

But it goes beyond that. It assumes that data will be of consistent quality, of known origin ... not likely the case for data gathered and 'given away' by a variety of different sources (sources who have a vested interest in keeping some of their best data). Even small biasing factors in the sourcing of data can have significant effects on the statistical validity. Good research requires that all those effects be accounted for and balanced before conclusions are drawn.

There is NO magic in crazy large data. Data size has an asymptotic effect on results: after a certain point, just piling on more data does not necessarily yield more insight, and it may even introduce confounding factors.
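A rough way to see the diminishing returns, sketched in Python under simplifying assumptions: we are estimating a mean, the samples are independent with standard deviation `sigma`, and the data source has a fixed systematic `bias`. The function name and the numbers are hypothetical. The only fact relied on is the standard decomposition: mean squared error = bias² + sigma²/n.

```python
import math


def rmse_of_mean(n, sigma=1.0, bias=0.0):
    """Root-mean-square error of a sample mean from n independent samples
    with standard deviation sigma and a fixed systematic bias."""
    return math.sqrt(bias ** 2 + sigma ** 2 / n)


# Compare an unbiased source with one that has a small fixed bias (assumed 0.05).
for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    unbiased = rmse_of_mean(n)
    biased = rmse_of_mean(n, bias=0.05)
    print(f"n={n:>9,}  unbiased={unbiased:.5f}  biased={biased:.5f}")
```

Even in the ideal unbiased case, 10x the data only shrinks the error by a factor of about 3.16 (the square root of 10). With a small bias in the sourcing, the error flattens out near the bias itself, and no amount of extra data from the same source gets below it.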

What we consider machine learning (or, even less accurately, 'artificial intelligence') is essentially statistical pattern matching. It does not 'know' whether the sunrise causes the rooster to crow or the rooster causes the sunrise. It has no comprehension of creating a theory and then creating tests to confirm or disprove that theory. Machine learning cannot look at a correlation and conclude that there must be an additional factor because the result does not 'make sense' in its current interpretation.
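A small Python sketch of the rooster point, using only a plain Pearson correlation (a simplification: real ML models are more elaborate, but they fit the same kind of association). The times below are invented. The coefficient is identical whichever variable is treated as the "cause", so the number by itself carries no direction.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations, in minutes after midnight, over five days.
sunrise_minutes = [360, 355, 350, 345, 340]
crow_minutes = [358, 354, 347, 344, 338]

sun_to_rooster = pearson(sunrise_minutes, crow_minutes)
rooster_to_sun = pearson(crow_minutes, sunrise_minutes)
print(sun_to_rooster, rooster_to_sun)
```

Both calls print the same strongly positive value. Deciding which way the arrow points, or whether a third factor drives both, takes a theory and a test, which is exactly what the pattern matcher lacks.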

We look at some of the successes that statistical analysis has had, and some extrapolate that into a vast future: with 10x the data, we'll have 10x the knowledge.
It doesn't work that way.

...

-- FORTRAN manual for Xerox Computers --

Steve Jones - SSC Editor SSC Guru Points: 734418 · Answer 3

This doesn't assume that data is of a certain quality. In fact, many people that analyze data assume data is broken, which is why any analysis work, ML based or other, needs to spend more of its time cleaning and trying to work through problematic data. Only those companies that are very naive or starting out think otherwise.

There are other problems when trying to generalize too often from a large data set and apply that to individuals. Again, that's often a misapplication of data to a problem, which is either too general or ill defined.

It can be scary or problematic for any one large scale collection of data, but that's entirely separate from the potential to actually innovate new applications using the data.

Eric M Russell SSC Guru Points: 125519 · Answer 4

Innovation should solve a problem or add value to an existing solution. Sometimes engineers and marketers forget that.

This story is perhaps the most cringe-worthy of all IT security fails:

https://nakedsecurity.sophos.com/2016/09/20/maker-of-smart-vibrator-sued-for-snooping-on-customers-use/
https://thenextweb.com/gadgets/2016/08/10/hackers-can-remotely-activate-vibrator-find-often-use/

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

jay-h SSCoach Points: 18816 · Answer 5

When the information is second or third hand, there is no way to know the quality in most cases. You can take three or four political surveys, for example, but the questions were asked in different ways, in different contexts of other questions, with different sample selections and on different days when different headlines were in the news. All those characteristics are significant, but the user of the data has no control over them.

Medical data is another example. Treatment providers need to categorize patient conditions and treatments into a large range of fixed categories, none of which may accurately reflect the full reality. Further distortion occurs because choosing the wrong checkbox can mean difficulty in collecting from insurance, so physicians tend to follow certain patterns. (I'm not talking fraud here, but when more than one choice can legitimately apply, there is a tendency to choose the most pragmatic.)

Jeff Mlakar SSCrazy Points: 2879 · Answer 6

I agree about the overall willingness to share information in the SQL Server community. It is rare. I do see its evil twins in action as well: RTFM responses to "do my homework for me" questions. There are benefits to both; however, I prefer the SQL Server community approach most of the time.

Innovation is tricky - the benefit can be "better, faster, cheaper" or it can be abused. I wish data privacy were taken more seriously when organizations collect data.

Eric M Russell SSC Guru Points: 125519 · Answer 7

As database administrators and developers, it's our job to keep the innovation bandwagon headed in the right direction. We should not simply jump on the back of the bandwagon. There are a lot of smart professionals who allowed themselves to get mixed up in bad stuff because they passively accepted the organizational status quo of hoarding and abusing data. Whistleblowers in government are great, but we also need whistleblowers in corporate and startup IT shops.