Big Data Workshop


A challenge I have with customers who want hands-on experience with the Azure products found in a modern data warehouse architecture is finding a workshop that covers many of those products. To the rescue is a workshop created by my Microsoft colleagues Fabio Braga and Rod Colledge, explained in their blog post Azure Data Platform End2End, with the GitHub repository located here.

The idea of this workshop is to give experienced BI professionals who are new to Azure a view of the variety of data services available and the role each one plays in the overall architecture. Most professionals have never had a chance to use a Spark cluster or a NoSQL database, so the workshop aims to fill this gap. It’s true that similar outcomes can also be achieved with other services and features (this workshop uses only a subset of a much larger family of Azure services), but there is only so much that can be covered in a 2-day workshop. So keep in mind that the architecture used in this workshop is only one of many possibilities for building a modern data warehouse solution. The lab will be updated as new products and features are released (e.g. ADF Mapping Data Flow when it becomes generally available).

A description of the workshop:

In this 2-day workshop you will learn about the main concepts related to advanced analytics and Big Data processing and how Azure Data Services can be used to implement a modern data warehouse architecture. You will understand which Azure services you can leverage to establish a solid data platform to quickly ingest, process and visualize data from a large variety of data sources. The reference architecture you will build as part of this exercise has been proven to give you the flexibility and scalability to grow and handle large volumes of data while keeping an optimal level of performance. In the exercises in this lab you will build data pipelines using data related to New York City. The workshop was designed to progressively implement an extended modern data platform architecture, starting from a traditional relational data pipeline. Then we introduce big data scenarios with large files and distributed computing. We add unstructured data and AI into the mix and finish with real-time streaming analytics. You will have done all of that by the end of the workshop. The workshop includes a series of five labs with a discussion of concepts in between each lab.

Technologies you will use: SQL Data Warehouse, SQL Server in a VM, Azure Data Factory, Databricks (w/Spark), Cognitive Services (w/computer vision), Event Hub, Stream Analytics, PolyBase, Power BI, Blob Storage, Cosmos DB, Logic App
