SQLServerCentral is supported by Red Gate Software Ltd.
 
Data Science Sanity Checks
Posted Saturday, May 11, 2013 1:54 AM


Mr or Mrs. 500

Group: General Forum Members
Last Login: Friday, September 12, 2014 2:20 AM
Points: 587, Visits: 2,527
Comments posted to this topic are about the item Data Science Sanity Checks


Best wishes,

Phil Factor
Simple Talk
Post #1451814
Posted Monday, May 13, 2013 6:20 AM


SSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 11, 2014 1:55 AM
Points: 162, Visits: 835
This was a very thought-provoking editorial. I'm still cogitating, but immediately...

I see the value of having someone whose job it is to check these things out and know the gritty details of the data. I think this is a crucial role, but I'd also encourage everyone involved in the production, storage, and use of the data to try to understand it at a fine level.
That said, I'm not sold on the idea that there can be sufficient resources to identify every trend emerging within the data and then investigate it before the rest of the business picks up on it. The other option is to gate the 'release' of data to people, which seems like an impediment to work and still couldn't guarantee success.
Post #1452069
Posted Monday, May 13, 2013 7:39 AM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 11:29 AM
Points: 158, Visits: 1,800
Remember that investment management is about risk, not return. In contemplating which opportunities to pursue, you have to understand the risks you are taking, and afterwards acknowledge whether you were right or not.
Post #1452100
Posted Monday, May 13, 2013 7:49 AM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, August 28, 2014 1:35 PM
Points: 113, Visits: 423
The big problem with this is that business is about making money, and sometimes there is money to be made even from bad data. Take the recent hack of the AP news Twitter account. Businesses that reacted quickly could short the stock market and make a bundle off this bad data, even though it was absolutely off base and a quick check would have shown that.
Post #1452102
Posted Monday, May 13, 2013 8:22 AM
SSC Journeyman

Group: General Forum Members
Last Login: Friday, September 20, 2013 2:44 PM
Points: 91, Visits: 31
Awesome subject, Phil, thanks. I relate it to driving a car: you have to keep your eyes on the road, the mirrors, and the dashboard/control panel (throw in some breakfast and a smartphone too for added difficulty), and there's your business system. There are crazy drivers (the other drivers, naturally) to deal with, road hazards, accidents, and so on. It's a constant barrage, and it requires constant vigilance in monitoring input and output to get where you need to go.
Post #1452128
Posted Monday, May 13, 2013 9:49 AM
Right there with Babe

Group: General Forum Members
Last Login: Tuesday, September 2, 2014 8:37 AM
Points: 751, Visits: 1,917
One of the key things that must be checked when confronted by an anomalous item is: is it an artifact? The way data is collected, the questions asked, and the context of its selection can sometimes cause very subtle distortions, not noticeable at small scale but visible when trying to pull signal out of a lot of noise.

At times (often) the researcher doesn't have access to the context of the original data acquisition and there is plenty of room for serious errors.


...

-- FORTRAN manual for Xerox Computers --
Post #1452183
Posted Monday, May 13, 2013 12:34 PM
Forum Newbie

Group: General Forum Members
Last Login: Monday, June 17, 2013 9:49 AM
Points: 3, Visits: 18
So the whole business should stop seeing crucial data that supports their daily decision-making, and wait for a data scientist to sanitise the data (however long it takes)?

The fact is the end users of those data are the domain experts who can tell what is rogue data and what is real trend better than anybody else, including the data scientist who has generic data knowledge but not necessarily the domain knowledge.

IMHO, we should just give the data to the business, and give them tools that highlight abnormal trends and help them do the analysis. That way you don't stop them seeing the data, but you also help them identify rogue data.
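
As a rough illustration of what such a highlighting tool might do, here is a minimal Python sketch. It is only one possible approach, not anything described in the editorial: it scores each point with a modified z-score built on the median and the median absolute deviation, so that a single wild value cannot hide itself by inflating the average. The function name `flag_outliers`, the `daily_sales` figures, and the 3.5 cut-off are all assumptions made up for the example.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper for illustration. The score is
    0.6745 * |x - median| / MAD, where MAD is the median absolute deviation.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # More than half the values are identical, so there is no spread
        # to measure against; this sketch simply flags nothing.
        return []
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up daily figures with one suspicious spike at position 7.
daily_sales = [102, 98, 101, 99, 100, 103, 97, 250, 100, 101]
suspects = flag_outliers(daily_sales)
print(suspects)
```

The tool only points at the odd value; deciding whether it is rogue data or a real trend is still left to the people who know the domain.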
Post #1452270
Posted Monday, May 13, 2013 12:42 PM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, August 28, 2014 1:35 PM
Points: 113, Visits: 423
I agree with Charles. At some point we have to trust our users.
Post #1452273
Posted Monday, May 13, 2013 1:18 PM
Right there with Babe

Group: General Forum Members
Last Login: Tuesday, September 2, 2014 8:37 AM
Points: 751, Visits: 1,917
charles.wong (5/13/2013)
So the whole business should stop seeing crucial data that supports their daily decision-making, and wait for a data scientist to sanitise the data (however long it takes)?
...


The point was caution, especially with market research and other non-dollars-and-cents determinations.

Rogue data and artifact are not the same thing. Polling organizations (at least the good ones) have learned the pitfalls of categorization. The majority of potential customers may say they prefer product B, but unless you know what A, C, and D were, or whether other options were missing from the list (the old 'have you stopped beating your wife?' conundrum), you don't know what they would actually buy. Even priming questions, that is, seemingly unrelated questions asked before the choice, have been shown to make a big difference in the answers given.

Disastrous business and political decisions have been made by not understanding the data. When dealing with large amounts of data from disparate and uncontrolled sources, the risk is higher. By all means listen to those close to the issue, but remember that everyone, including those close to the issue, can unintentionally bring in their own preferences and biases (remember 'New Coke'?).


...

-- FORTRAN manual for Xerox Computers --
Post #1452288
Posted Monday, May 13, 2013 1:29 PM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 11:29 AM
Points: 158, Visits: 1,800
charles.wong (5/13/2013)
So the whole business should stop seeing crucial data that supports their daily decision-making, and wait for a data scientist to sanitise the data (however long it takes)?

The fact is the end users of those data are the domain experts who can tell what is rogue data and what is real trend better than anybody else, including the data scientist who has generic data knowledge but not necessarily the domain knowledge.

IMHO, we should just give the data to the business, and give them tools that highlight abnormal trends and help them do the analysis. That way you don't stop them seeing the data, but you also help them identify rogue data.


That really expresses where the efforts of data managers should be expended. Thanks for saying it so well.
Post #1452293