Data Has a Dollar Value

  • Steve Jones - SSC Editor - Monday, August 20, 2018 12:44 PM

    meta data about your data. So in the ML world, you might have data that is attached to other data. Say we have all these avatar pictures here at SSC. There's a name attached to each, which can be the label. This helps a system (or person) trying to learn how to identify a person.

    You can do this for all sorts of data. You might have a set of purchases from a user. You might label different items as repeat purchases or one-off purchases. Those would be labels used later for additional analysis.

    These are really the ways we decide how to view data. Often these are attributes we have in databases, but they might also be new types of data we think about adding.
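
    A minimal sketch of the purchase-labeling idea described above, in Python. The purchase rows, the item names, and the `label_purchases` helper are all hypothetical, and "repeat" is assumed here to mean simply "bought more than once"; a real system would likely use a richer rule.

```python
from collections import Counter

# Hypothetical purchase history for one user: (order_id, item) pairs.
purchases = [
    (1, "coffee beans"),
    (2, "paper filters"),
    (3, "coffee beans"),
    (4, "espresso machine"),
    (5, "coffee beans"),
    (6, "paper filters"),
]


def label_purchases(rows):
    """Label each item 'repeat' if it was bought more than once, else 'one-off'."""
    counts = Counter(item for _, item in rows)
    return {item: ("repeat" if n > 1 else "one-off") for item, n in counts.items()}


# The labels are metadata about the purchases, usable in later analysis.
labels = label_purchases(purchases)
for item, label in labels.items():
    print(f"{item}: {label}")
```

    The labels live alongside the original rows rather than replacing them, which matches the idea of attributes that could also be stored as an extra column in a database.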

    Ah, thank you, Steve, for the explanation.

    Kindest Regards, Rod

  • Eric M Russell - Tuesday, August 21, 2018 7:35 AM

    Matt Miller (4) - Monday, August 20, 2018 1:50 PM

    Let me see if I get this:
    1. We want to use mass analytics tools, but
    2. we don't have enough data for the analysis to be statistically significant, so
    3. we MAKE UP the data to have enough volume for the mass analytics to work.

    Anyone else see the problem with this approach? 
    You STILL don't have enough data, and frankly you've now tilted the model with fake data.  How is that defensible?  I'm not sure who the data scientist is that is signing off on these, but that really sounds suspect.
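
    One way to see the concern raised above is a small sketch, assuming the crudest possible kind of "made up" data: padding a five-value sample with copies of itself. The numbers and the `standard_error` helper are hypothetical. The estimate does not change, but the nominal standard error shrinks, so the analysis looks more certain without any new information behind it.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical small sample of real measurements.
real = [12.0, 15.0, 9.0, 14.0, 10.0]

# "Made up" volume: the same five values repeated 20 times (100 rows).
padded = real * 20

mean_real = statistics.mean(real)
mean_padded = statistics.mean(padded)
se_real = standard_error(real)
se_padded = standard_error(padded)

print(f"real:   n={len(real)}, mean={mean_real:.2f}, std err={se_real:.3f}")
print(f"padded: n={len(padded)}, mean={mean_padded:.2f}, std err={se_padded:.3f}")
```

    Realistic synthetic data generators are more sophisticated than plain duplication, but the same caution applies: a larger row count is not, by itself, evidence of more information.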

    Sentiment analysis on social media feeds is perhaps the most dubious, over-hyped, and ultimately useless form of analytics. Bots, paid promoters, fake accounts, and a feedback loop effect created by digital marketing companies have turned social media into an echo chamber.

    Agreed - I was going to mention as an example earlier one of those new services that have been running a LOT of TV and Youtube commercials, where they will help new companies set up turnkey websites "complete with customer stories and testimonials, feedback, etc..." 

    In the old days we USED to call that false advertising and misrepresentation.  Today it's a "service" :crazy:

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Steve Jones - SSC Editor - Tuesday, August 21, 2018 9:11 AM

    Absolutely, which is where the advanced researchers are working to avoid linear analysis and too simple a view. Of course, they can overthink things as well, which is why this field is tough to work in. Extrapolation is very hard, and unless you are working on a narrow question, we often can't predict with any accuracy for individuals. However, you often can do better with groups, as long as you always work with groups and don't get caught up on individuals.

    Heh... shifting gears a bit but still related to the subject....

    I wish all the "geniuses" out there that are trying to make predictions about stuff would learn that, if you don't have enough data to do a proper study, extrapolating the data is going to lead you to the wrong conclusion because you've already come to the wrong conclusion as to how to extrapolate the data. 😀  They also need to learn a thing or two about identifying limits, losses, inaccuracies, biased data, and making sure that the units of measure are actually compatible.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • It's not necessarily about extrapolation in the sense of predicting the future. That happens, but outside of guessing where large groups of items might land (sales, locations, etc.) it's not that successful. And certainly too many people are doing it that aren't experts in risk and statistics.

    There is extrapolation in a different sense, for example in medicine. One of the hard things, and why House MD was a great show, is that it's hard to know everything and keep it in mind. However, AIs can do a better job here, like the people that have perfect memory. The training here is something like: if we have this X-ray (MRI, etc.), then this is likely to be cancer, this isn't. The training here is hard, and really, we're just training the system to do what humans do in a very narrow space, without the issues of memory loss, stress and pressure, fatigue, etc. The AI should do better, but it takes lots of data to do this. This also isn't what I'd call intelligence, but it's closer to a highly specialized simulation of what humans do. Useful, but it isn't sentience in any sense.

    There are companies that can mock some of this data, and help train the system. Of course, the system has to validate on real cases where multiple radiologists verify this, but that's separate. The data need is high and researchers or companies building systems don't necessarily do a great job with data.

    Companies that can build realistic data here, accounting for a variety of ways that a condition would manifest itself are helpful. This can help train the model and allow it to extrapolate a bit if the density or placement of the item on a scan varies. The AI can then flag a likelihood of an issue. Some human likely needs to review this, but these systems are levers. They dramatically amplify what one expert (ish) human does.

    These are places where synthetic and natural data can be used.

  • Jeff Moden - Wednesday, August 22, 2018 11:26 AM

    Heh... shifting gears a bit but still related to the subject....

    I wish all the "geniuses" out there that are trying to make predictions about stuff would learn that, if you don't have enough data to do a proper study, extrapolating the data is going to lead you to the wrong conclusion because you've already come to the wrong conclusion as to how to extrapolate the data. 😀  They also need to learn a thing or two about identifying limits, losses, inaccuracies, biased data, and making sure that the units of measure are actually compatible.

    All the hype over analytics today reminds me of the mid-to-late '90s when every company was hustling to hire web developers. In the early 2000s it was IPOs and online trading. Most of us remember what happened next...
    Social media, IoT, and BigData analytics will eventually grow up and become something useful, but today it's still all about data mining twitter feeds, spam bots, and "smart" sex toys.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Let's not forget that oftentimes there is enough data somewhere. It just does not exist in the current data systems the organization has. Sometimes it's like pulling teeth to get an expansion of the existing data, simply because it's so hard to justify the overhead of adding it to the already restricted data warehouse.

  • Steve Jones - SSC Editor - Wednesday, August 22, 2018 12:35 PM

    It's not necessarily about extrapolation in the sense of predicting the future. That happens, but outside of guessing where large groups of items might land (sales, locations, etc.) it's not that successful. And certainly too many people are doing it that aren't experts in risk and statistics.

    There is extrapolation in a different sense, for example in medicine. One of the hard things, and why House MD was a great show, is that it's hard to know everything and keep it in mind. However, AIs can do a better job here, like the people that have perfect memory. The training here is something like: if we have this X-ray (MRI, etc.), then this is likely to be cancer, this isn't. The training here is hard, and really, we're just training the system to do what humans do in a very narrow space, without the issues of memory loss, stress and pressure, fatigue, etc. The AI should do better, but it takes lots of data to do this. This also isn't what I'd call intelligence, but it's closer to a highly specialized simulation of what humans do. Useful, but it isn't sentience in any sense.

    There are companies that can mock some of this data, and help train the system. Of course, the system has to validate on real cases where multiple radiologists verify this, but that's separate. The data need is high and researchers or companies building systems don't necessarily do a great job with data.

    Companies that can build realistic data here, accounting for a variety of ways that a condition would manifest itself are helpful. This can help train the model and allow it to extrapolate a bit if the density or placement of the item on a scan varies. The AI can then flag a likelihood of an issue. Some human likely needs to review this, but these systems are levers. They dramatically amplify what one expert (ish) human does.

    These are places where synthetic and natural data can be used.

    Yes, these are conditions where synthetic data can be used, but only to devalue, destroy, or distort the genuine factual evidence. If you tell your AI to believe invented "realistic" data, you are introducing a bias that has no grounding in reality, only in your choice of what you want to be believed to be real.

    Tom
