SQLServerCentral Article

Cloudera Vs Hortonworks


Successful, leading-edge organizations that want to leverage the potential of Big Data have made Hadoop their platform. This revolutionary open-source software framework processes large data sets by distributing them across clusters of commodity servers. Today, the majority of Big Data enterprises rely on Apache Hadoop as their platform. To make working with Hadoop easier, enterprise distributions are available from vendors such as Hortonworks, MapR and Cloudera.

However, as the demand for enterprise Hadoop grows rapidly, the existing tension between two of these suppliers, Hortonworks and Cloudera, is becoming acute.


Cloudera was the very first company to build and distribute Apache Hadoop-based software. It offers the Cloudera Management Suite, which automates the installation procedure. The suite also provides services that improve user convenience, such as reducing execution time and showing the count of nodes in real time.


Hortonworks has emerged as a leading Hadoop vendor within a short span of time. Its distribution offers an open-source environment based on Apache Hadoop for storing, evaluating and monitoring Big Data. Simply put, Hortonworks is the only commercial vendor to ship a completely open-source Apache Hadoop distribution without any supplementary proprietary software. Today, companies can download the Hortonworks distribution, HDP 2.0, directly from the website free of charge. It is easy to install as well. Some of the latest Hadoop innovations owe much to the expertise of Hortonworks engineers. One example is YARN, which improves on the original MapReduce by supporting a wider range of data processing frameworks.

Recent Comparative Analysis

Both Hortonworks and Cloudera have much to compete for, given their individual core competencies. For instance, there is the matter of financial backing, which any pre-IPO start-up needs in order to win over enterprise customers. Cloudera appears to have the upper hand here, with its March 2014 announcement of $740m in funding from the chipmaker Intel. With regard to customer acquisition, Cloudera might also have a slight advantage over Hortonworks. However, the crucial point of disagreement between Hortonworks and Cloudera lies in their answer to a question that matters to most enterprise customers: should Hadoop replace conventional EDW (Enterprise Data Warehouse) investments, or complement them?

On this question, David McJannet, Vice-President of Marketing for Hortonworks, is of the opinion that Hadoop is a significant addition to present-day analytic technologies. McJannet asserts that “A unique aspect of our approach is the fact that we’re not trying to compete with the data warehousing incumbents. This is a pivotal philosophical difference we have with other folk in this market.” Adding a competitive edge to the discussion, he further said that there is a supplier in the market that will recommend throwing out Teradata and putting everything on Hadoop, and that this supplier is Cloudera. According to McJannet, for Cloudera, data warehousing is dead. One can well understand the tug of war and constant performance comparison between these two suppliers, even though each offers a strong distribution.


The basic services that Hortonworks and Cloudera provide to customers are the same: enterprise-ready Hadoop, along with greater stability, security and adequate training for organizations that aren’t accustomed to this technology. Many have drawn the dividing line at how Cloudera and Hortonworks look at data warehouses, asserting that Hortonworks is willing to support existing data warehouse storage whereas Cloudera wants to do away with it completely. Interestingly, if we look at the way Cloudera has proposed implementing its Enterprise Data Hub, we will notice that it does integrate legacy warehouse storage. A more visible difference lies in the technologies each brand provides to its customers. Hortonworks is an open-source classicist that uses only technologies that can be open-sourced via the Apache Foundation. On the other hand, when customers pay for Cloudera, they pay for a complete set of open-source as well as proprietary components that includes analytical SQL (Impala), machine learning and in-memory processing (Apache Spark), data management (Cloudera Manager) and online NoSQL (HBase).

Funding Raised: In terms of funding, Hortonworks has been able to raise $225 million. Cloudera, on the other hand, is well ahead, having raised up to $900 million, with $740 million coming from its recent alliance with Intel.

Customer Base: Hortonworks has won 250 clients in the last five quarters, including prominent names like eBay, Spotify, Samsung and Bloomberg. Cloudera records 350 clients, including big names like BT, MasterCard and Nokia. Surprisingly, eBay appears in the client list of both brands.

Business Partners: The Hortonworks website displays about 300 partners, including names like Dell, HP and SAP. Cloudera comes out ahead again with a mention of 1000 partners, including brands like IBM, HP, Intel and many more.

The Verdict

Cloudera is older than Hortonworks and is an established name. Both brands have captured favourable market shares recently and are focusing on consolidated business development strategies. Therefore, it would be tough to declare either company the winner. At present, Cloudera is enjoying the benefits of having started earlier than Hortonworks. However, the potential that the latter has exhibited so far suggests the brand has all it takes to be an ace performer. One has to wait and watch the growth these two companies attain in the coming years.

Author's Bio: Jenny Brown has written a number of blogs on data management. The blogs are listed on the GreyCampus and DCS channels.

