SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Microsoft Products vs Hadoop/OSS Products

Microsoft’s end goal is for Azure to become the best cloud platform for customers to run their data workloads.  This means Microsoft will provide customers the best environment to run their big data/Hadoop as well as a place where Microsoft can offer services with our unique point-of-view.  Specific decision points on using Hadoop is if the customer wants to use open source technologies or not.  Some of the benefits of running open source software (OSS) on Azure include:

  • Quick installs
  • Support
  • Easy scale
  • Products work together
  • Don’t need to get your own hardware

To determine the cost savings by moving your OSS to Azure, see the Total Cost of Ownership (TCO) Calculator.

Of course there are many benefits of using Microsoft products over OSS, such as ease of use, support, better security, easier to find people with skills, less frequent version updates, more stable (less bugs), more compatibility and integration between products, etc.  But there are still reasons to use OSS (i.e. cost, faster performance in some cases, more product selection and features), so I created a list that shows many of the Microsoft products and their equivalent, or close equivalent, Hadoop/OSS product.

I tried to list only Apache products unless there was no equivalent Apache product or there is a really popular Open Source Software (OSS) product.

Microsoft Product Hadoop/Open Source Software Product
Office365/Excel OpenOffice/Calc
DocumentDB MongoDB, MarkLogic, HBase, Cassandra
SQL Database SQLite, MySQL, PostgreSQL, MariaDB
Azure Data Lake Analytics/YARN None
Azure VM/IaaS OpenStack
Blob Storage HDFS, Ceph (Note: These are distributed file systems and Blob storage is not distributed)
Azure HBase Apache HBase (Azure HBase is a service wrapped around Apache HBase), Apache Trafodion
Event Hub Apache Kafka
Azure Stream Analytics Apache Storm, Apache Spark, Twitter Heron
Power BI Apache Zeppelin, Apache Jupyter, Airbnb Caravel, Kibana
HDInsight Hortonworks (pay), Cloudera (pay), MapR (pay)
Azure ML Apache Mahout, Apache Spark MLib
Microsoft R Open R
SQL Data Warehouse Apache Hive, Apache Drill, Presto
IoT Hub Apache NiFi
Azure Data Factory Apache Falcon, Airbnb Airflow
Azure Data Lake Storage/WebHDFS HDFS Ozone
Azure Analysis Services/SSAS Apache Kylin, AtScale (pay)
SQL Server Reporting Services None
Hadoop Indexes Jethro Data (pay)
Azure Data Catalog Apache Atlas
PolyBase Apache Drill
Azure Search Apache Solr, Apache ElasticSearch (Azure Search build on ES)
Others Apache Flink, Apache Ambari, Apache Ranger, Apache Knox

Many of the Hadoop/OSS products are available in Azure.  If you feel I’m missing some products from this list, please let me know as this is very subjective and comments are always welcome!

James Serra's Blog

James is a big data and data warehousing technology specialist at Microsoft. He is a thought leader in the use and application of Big Data technologies, including MPP solutions involving hybrid technologies of relational data, Hadoop, and private and public cloud. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 30 years of IT experience. James is a popular blogger (JamesSerra.com) and speaker, having presented at dozens of PASS events including the PASS Business Analytics conference and the PASS Summit. He is the author of the book “Reporting with Microsoft SQL Server 2012”. He received a Bachelor of Science degree in Computer Engineering from the University of Nevada-Las Vegas.


Leave a comment on the original post [feedproxy.google.com, opens in a new window]

Loading comments...