Innovation Needs Information


Steve Jones
Comments posted to this topic are about the item Innovation Needs Information

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
chrisn-585491
"However, there is also lots of data inside companies, especially some of the big social media and communication companies that gives them an advantage. I think that's OK, after all, these companies have innovated to build large scale enterprises and devoted resources to collecting data."


This isn't OK. I think the data collected by social media companies and other giants is currently being abused. We don't have GDPR in the US, and the motivations of social media companies aren't always aligned with the overall public interest. When you can use these resources and ML to identify or isolate groups of individuals whose values or belief systems don't match yours, reward those whose do, or manipulate opinion, then the collection of data becomes very problematic. This is the East German Stasi times infinity, and the problem will get much uglier in the near future with the current political divisions.

"However, overall we share lots of information with others. I do know that many other communities are catching up and I really appreciate the answers I've gotten from others when trying to repair my tractor or auto."


On a lighter note, the sheer amount of good quality information available in the last 10 years is amazing. Knowledge and skill transfer through web sites and videos is extraordinary and one of the finest examples of an information society. This could help fix the broken education system if applied right.

jay-h
There are a number of flaws in the starry-eyed optimism in that article.

Of course there are the serious privacy threats (as addressed above). Recently Australia has been working on a project to open source anonymized medical data. After the first trial batch was posted, researchers showed how easy it was to de-anonymize it with a few crosschecks.
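
As a rough illustration of what such a crosscheck can look like, here is a minimal sketch of a linkage attack. Everything in it is hypothetical: the records, the field names, and the choice of quasi-identifiers are invented for the example and have nothing to do with the actual Australian release or the researchers' method.

```python
# Hypothetical toy data, invented for illustration only.
# "released" has names stripped but keeps quasi-identifiers.
released = [
    {"birth_year": 1970, "postcode": "2000", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1985, "postcode": "3000", "sex": "M", "diagnosis": "diabetes"},
    {"birth_year": 1985, "postcode": "3000", "sex": "M", "diagnosis": "fracture"},
]

# "public" stands in for any other source that carries names
# alongside the same quasi-identifiers.
public = [
    {"name": "Alice Example", "birth_year": 1970, "postcode": "2000", "sex": "F"},
    {"name": "Bob Example", "birth_year": 1985, "postcode": "3000", "sex": "M"},
]

# Assumed set of fields shared by both sources.
QUASI_IDENTIFIERS = ("birth_year", "postcode", "sex")


def key(record):
    """Build a comparable key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(released_rows, public_rows):
    """Return {name: diagnosis} for people matching exactly one released row."""
    by_key = {}
    for row in released_rows:
        by_key.setdefault(key(row), []).append(row)

    matches = {}
    for person in public_rows:
        candidates = by_key.get(key(person), [])
        # Only a unique match re-identifies someone with certainty.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


print(reidentify(released, public))
```

In this toy case one person is re-identified because her combination of fields is unique, while the other is protected only because a second released row happens to share his combination.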

But it goes beyond that. It assumes that data will be of consistent quality, of known origin ... not likely the case for data gathered and 'given away' by a variety of different sources (sources who have a vested interest in keeping some of their best data). Even small biasing factors in the sourcing of data can have significant effects on the statistical validity. Good research requires that all those effects be accounted for and balanced before conclusions are drawn.

There is NO magic in crazy large data. Data size has an asymptotic effect on results: after a certain point, just piling on more data does not necessarily produce more insights; indeed, it may introduce confounding factors.
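
One simple way to see the diminishing returns, assuming independent samples and looking only at the precision of an estimated mean: the standard error shrinks as sigma / sqrt(n), so 10x the data buys only about a 3.16x tighter estimate. The sketch below is an illustration under those assumptions; the function name and the sample sizes are made up for the example, and it says nothing about bias in how the data was sourced, which does not shrink with n at all.

```python
import math


def standard_error(sigma, n):
    """Standard error of a sample mean for n independent observations."""
    return sigma / math.sqrt(n)


# Hypothetical sample sizes, each 10x the previous one.
sizes = (100, 1_000, 10_000, 100_000)

for n in sizes:
    print(f"n={n:>7}  standard error={standard_error(1.0, n):.5f}")

# Improvement factor when going from 100 to 1,000 observations.
gain = standard_error(1.0, 100) / standard_error(1.0, 1_000)
print(f"10x the data improves precision by a factor of {gain:.2f}")
```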

What we call machine learning (or, even more incorrectly, 'artificial intelligence') is essentially statistical pattern matching. It does not 'know' whether the sunrise causes the rooster to crow or the rooster causes the sunrise. It has no comprehension of creating a theory and then creating tests to confirm or disprove that theory. Machine learning cannot look at a correlation and conclude that there must be an additional factor because the result does not 'make sense' in its current interpretation.
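
The rooster point can be shown with a small sketch: a correlation coefficient is symmetric, so the number alone cannot say which variable drives the other. The observations below are hypothetical values invented for the example (minutes after 5:00 for sunrise and for the first crow), and `pearson` is a hand-written helper, not a library call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations over five mornings.
sunrise = [10, 12, 15, 19, 24]
first_crow = [8, 11, 13, 18, 22]

# Same value in both directions: the statistic carries no causal arrow.
print(pearson(sunrise, first_crow))
print(pearson(first_crow, sunrise))
```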

We look at some of the successes that statistical analysis has had, and some extrapolate that into a vast future: with 10x the data we'll have 10x the knowledge.
It doesn't work that way.

...

-- FORTRAN manual for Xerox Computers --
Steve Jones
This doesn't assume that data is of a certain quality. In fact, many people who analyze data assume the data is broken, which is why any analysis work, ML-based or otherwise, needs to spend more of its time cleaning and working through problematic data. Only companies that are very naive or just starting out think otherwise.

There are other problems when trying to generalize too often from a large data set and apply that to individuals. Again, that's often a misapplication of data to a problem, which is either too general or ill defined.

It can be scary or problematic for any one large scale collection of data, but that's entirely separate from the potential to actually innovate new applications using the data.

Eric M Russell
Innovation should solve a problem or add value to an existing solution. Sometimes engineers and marketers forget that.

This story is perhaps the most cringe-worthy of all IT security fails:

https://nakedsecurity.sophos.com/2016/09/20/maker-of-smart-vibrator-sued-for-snooping-on-customers-use/
https://thenextweb.com/gadgets/2016/08/10/hackers-can-remotely-activate-vibrator-find-often-use/


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
jay-h
When the information is second or third hand, there is no way to know the quality in most cases. You can take three or four political surveys, for example, but the questions were asked in different ways, in different contexts of other questions, with different sample selections and on different days when different headlines were in the news. All those characteristics are significant, but the user of the data has no control over them.

Medical data is another example. Treatment providers need to categorize patient conditions and treatment into a large range of fixed categories, none of which may accurately reflect the full reality. Further distortion occurs because choosing the wrong checkbox can mean difficulty in collecting from insurance, so physicians tend to follow certain patterns (I'm not talking fraud here, but when more than one choice can legitimately apply, there is a tendency to choose the most pragmatic).

Jeff Mlakar
I agree with the overall willingness to share information in the SQL Server community. It is rare. I do see its evil twins in action as well: RTFM responses to "do my homework for me" questions. There are benefits to both; however, I prefer the SQL Server community approach most of the time.

Innovation is tricky - the benefit can be "better, faster, cheaper" or it can be abused. I wish data privacy were taken more seriously when organizations collect data.
Eric M Russell
As database administrators and developers, it's our job to keep the innovation bandwagon headed in the right direction. We should not simply jump on the back of the bandwagon. There are a lot of smart professionals who allowed themselves to get mixed up in bad stuff because they passively accepted the organizational status quo of hoarding and abusing data. Whistleblowers in government are great, but we also need whistleblowers in corporate and startup IT shops.