Since the beginning of the information age, enterprises have relied on processing data to gain insights and make business decisions. In the past decade, significant advances in technology have fostered the dramatic increase of data complexity in volume, variety, and velocity – the ‘three V’s’ of Big Data.
For example, 9 billion devices are connected to the Internet today– from laptop and desktop computers to tablets, smartphones, even home entertainment systems and thermostats. By 2020, that number is expected to be 50 billion. With such a dramatic increase in connected devices, a significant increase in data generation and processing will naturally follow.
This massive increase in data workloads has led to advances in new parallel processing frameworks that use distributed computing on distributed data sets, such as MapReduce on Hadoop. Hadoop, in turn, has seen rapid adoption because it is much cheaper than traditional mainframe proprietary storage solutions.