Business Intelligence and Enterprise Architecture

Derek Wilson delivers tactical and strategic Business Intelligence and Enterprise Architecture solutions. His primary focus is on Microsoft SQL Server technologies and aligning business problems to technology solutions. He architects BI solutions leveraging SQL Server, SharePoint and any other technologies that help his clients achieve better data-driven decisions. By leveraging the information learned while collecting requirements for BI projects, he helps align business processes to technology, helping to further organizations' Enterprise Architecture. He is an author, trainer, and blogger, and has been using SQL Server since version 6.5.

Languages of Big Data Apache Hadoop

Languages of Big Data

Big Data is here and rapidly growing.  If you are just starting to learn about Big Data, it is important to understand the various pieces that can be used in Big Data architectures.  The Apache Software Foundation is a major player in the Big Data space.  Each heading below is a hyperlink to the homepage of its project.  Enjoy learning about all that Apache Hadoop and related technologies have to offer.

Apache Hadoop

Taken from the Hadoop homepage – “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

  • Hadoop Distributed File System (HDFS) – A distributed file system that provides high-throughput access to application data
  • Hadoop YARN – A framework for job scheduling and cluster resource management
  • Hadoop MapReduce – A YARN-based system for parallel processing of large data sets
  • Apache Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying
  • Apache Mahout – A machine learning and data mining library
  • Apache Pig – A high-level data-flow language and execution framework for parallel computation
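To make the MapReduce idea above a little more concrete, here is a minimal single-machine sketch of the classic word count in plain Python. It does not use Hadoop at all. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    lines = ["big data is here", "big data is growing"]
    print(word_count(lines))
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across many workers, which is what lets the model scale.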

Apache Sqoop

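Sqoop is a tool designed to transfer bulk data between relational databases and Hadoop.  It is most often used to import tables into Hadoop, and it can also export results back to a database.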
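As a rough picture of what a relational import involves, the sketch below reads every row of a table and writes it out as delimited text. It is only an analogy and not how Sqoop itself is implemented: an in-memory SQLite database stands in for the relational source, an in-memory text buffer stands in for a file in Hadoop, and the export_table helper and customers table are invented for the example.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Write every row of `table` to `out` as comma-delimited text.

    Returns the number of rows written. The table name is trusted here;
    this is a demonstration, not production code.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY 1'):
        writer.writerow(row)
        count += 1
    return count


def demo():
    """Build a tiny table, export it, and return (row count, exported text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    out = io.StringIO()
    rows = export_table(conn, "customers", out)
    conn.close()
    return rows, out.getvalue()


if __name__ == "__main__":
    rows, text = demo()
    print(rows)
    print(text)
```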

Apache Flume

Flume is a service for collecting, aggregating, and moving large amounts of log data.  It has a simple and flexible architecture based on streaming data flows, which makes it possible to run analytics on data while it is still arriving.
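Flume describes a data flow in terms of sources, channels, and sinks. The sketch below imitates that shape in plain Python to show how events can be collected and then aggregated as they drain. It does not use Flume or its configuration format, and the Channel class and the source, sink, and run_flow functions are names invented for this illustration.

```python
from collections import Counter, deque


def source(log_lines):
    """Turn raw log lines into events (small dicts with a level and a message)."""
    for line in log_lines:
        level, _, message = line.partition(" ")
        yield {"level": level, "message": message}


class Channel:
    """In-memory buffer that decouples the source from the sink."""

    def __init__(self):
        self._queue = deque()

    def put(self, event):
        self._queue.append(event)

    def take_all(self):
        """Yield buffered events in arrival order until the buffer is empty."""
        while self._queue:
            yield self._queue.popleft()


def sink(events):
    """Aggregate events by log level as they are drained from the channel."""
    return Counter(event["level"] for event in events)


def run_flow(log_lines):
    """Push every line through source -> channel -> sink and return the counts."""
    channel = Channel()
    for event in source(log_lines):
        channel.put(event)
    return sink(channel.take_all())


if __name__ == "__main__":
    print(run_flow(["INFO started", "ERROR disk full", "INFO done"]))
```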

Apache Solr

Solr is an open source enterprise search platform from the Apache Lucene project.  It provides full-text search, near real-time indexing, dynamic clustering, and database integration, and it can index rich documents such as Word or PDF files.
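Full-text search engines built on Lucene rely on an inverted index, which maps each term to the documents that contain it. The toy version below shows the idea in plain Python. It is not Solr's API and leaves out everything that makes a real engine useful (analysis, ranking, updates); build_index and search are hypothetical helper names.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    docs = {
        1: "Hadoop stores big data",
        2: "Solr searches big documents",
        3: "Flume collects log data",
    }
    index = build_index(docs)
    print(search(index, "big data"))
```

Looking up a term is a single dictionary access no matter how many documents were indexed, which is why this structure suits search workloads.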

MongoDB, Solr/Lucene/Elasticsearch, NoSQL, Hadoop

MapReduce, Hive, HBase, Pig, Mahout, Avro, Oozie

The post Languages of Big Data Apache Hadoop appeared first on Derek Wilson - Blog.
