• Eric M Russell - Friday, April 27, 2018 3:35 PM

    We should be focusing on better quality data and better managed data - not more! more! more! data.

    Amen!

    I worked on a personalisation project that took a cartesian product of product purchases against all product attributes to produce a score for each customer for each attribute.  The result was 37TB 
    We ran cluster analysis and determined that millions of customers formed 45 distinct clusters. Millions of combinations of product attributes could be reduced to less than 100,000 physically possible ones.
    That dataset wouldn't challenge MySQL let alone SQL Server or require a Big Data technology.
    Hadoop teaches you to think about problems in different ways. Mostly ways that don't require Hadoop