
PASS 2015 Session Report – Understanding Real World Big Data Scenarios

PASS 2015 continues in Seattle, and today was my session at 10:45am on Using Azure Machine Learning (ML) to Predict Seattle House Prices.  The background and info on my session are here: http://www.sqlpass.org/summit/2015/Sessions/Details.aspx?sid=7794

Overall I was pretty happy with how it went - and I think everyone who attended had a lot of fun with some of the games and tests I injected into the presentation.  Everyone had a chance to be a Real Estate Agent :) - and at the same time learn some great methods around performing Azure ML Regression Predictive Analytics.

 

BUT – moving right along – I also attended 3 other sessions today.  Again, I cannot blog about all of them in the time I have, but the one which made me think the most about technology implementations and how they can improve lives was Understanding Real World Big Data Scenarios by Nishant Thacker of Microsoft.

It wasn’t about use cases for big data (that horse has already bolted), but more about really innovative and interesting ways the ecosystem of Azure technologies could be deployed to solve some complex business problems, or more so simply ways to make our lives better!

Key Takeaways

  • This session was as much about ideas as it was about technology – and I think the key takeaway for me was that the Microsoft strategy around cloud and big data is well targeted, well defined, and well considered.  It’s an exciting place to be!
  • On a straw poll, 10% of the room had actively implemented a Big Data solution (of some sort/flavour) while 50% were in the process of actively learning about what Big Data could do for their business.
  • He introduced the Azure Data Lake suite, which he made clear was NOT hosted on the Azure Blob Store (which is where HDInsight is currently hosted) but is instead hosted on a separate high performance storage subsystem specifically tailored to analytics (or more specifically Azure Data Lake Analytics).
  • Here is some info on the Azure Data Lake Store https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-overview/ and on the Azure Data Lake Analytics https://azure.microsoft.com/en-us/documentation/articles/data-lake-analytics-overview/
  • Longer term it seems that Azure HDInsight (Hadoop) will have the option to run either on Azure Blob Store (WASB://) as it does now or direct/natively on Azure Data Lake (ADL://).  I suspect that the ADL will be a more expensive storage option given the performance levels, and so would only be selected for HDInsight clusters in special circumstances?  Or is it that ALL HDInsight storage will eventually end up in the Azure Data Lake given that HDInsight (clusters) are a fundamental part of Azure Data Lake? Yes, well, still to be verified!
  • All of the “real world” scenarios that were painted are absolutely real and exist – and all of them leveraged the new Azure Data Lake in some way, in addition to several other key elements of the Azure ecosystem, such as Azure Machine Learning (ML), Azure SQL Database, Azure SQL Data Warehouse and Event Hubs – in fact all of these components were in all solutions in some way.  It’s impressive to see such easy integration in such a massively scalable platform.
  • The scenario that I drew the most parallels with (for various reasons!) was the Fraud Detection scenario.  It made specific reference to collecting whatever data you have now (such as live transactional data, crossed with historical pattern data, crossed with customer data, crossed with social data etc) from wherever you can get it, to predict whether a certain set of transactions is fraudulent or not given the circumstances and historical patterns – and then taking immediate action accordingly (as opposed to batch action later).
  • It also made clear that despite so much data being collected, the end game was never to delete it, but instead to draw on it to make future predictions better (which to me makes perfect sense, though you would sometimes consider this possible but not practical – and there would instead be a dimensionality reduction in the data before longer term storage?)
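
To make the fraud detection idea a little more concrete, here is a minimal sketch of scoring each transaction as it arrives rather than in a later batch.  Everything in it is my own assumption for illustration and was not shown in the session: the `Transaction` shape, the `score_transaction` and `process_stream` helpers, the simple z-score rule and the threshold of 3.0 are all hypothetical.  A real solution would more likely use a trained model (e.g. in Azure ML) fed by a stream (e.g. Event Hubs) instead of a single statistic.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Transaction:
    customer_id: str
    amount: float


def score_transaction(txn, history):
    """Return how many standard deviations txn.amount sits from the customer's past amounts."""
    past = history.get(txn.customer_id, [])
    if len(past) < 2:
        return 0.0  # not enough history to judge (assumption: treat as normal)
    spread = pstdev(past)
    if spread == 0:
        # All past amounts identical: any different amount is maximally unusual
        return 0.0 if txn.amount == past[0] else float("inf")
    return abs(txn.amount - mean(past)) / spread


def process_stream(transactions, history, threshold=3.0):
    """Score each transaction on arrival and flag it straight away if it looks unusual."""
    flagged = []
    for txn in transactions:
        if score_transaction(txn, history) > threshold:
            flagged.append(txn)  # immediate action would go here (hypothetical)
        # Keep every data point (nothing is deleted) so later scoring can draw on it
        history.setdefault(txn.customer_id, []).append(txn.amount)
    return flagged
```

As a usage example, calling `process_stream` with a customer history of small purchases followed by one very large purchase would flag only the large one, and both new amounts would be retained in `history` for future scoring.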

Overall quite a good session that positioned 3-4 relevant use cases along with the technology implementations for those scenarios.


Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here


Filed under: Azure, Azure Machine Learning (ML), Big Data, Data Lake, Data Warehousing, HDInsight, PASS Tagged: SQL Server

Mr. Fox SQL

Rolf Tesmer works as an Azure Data Solution Architect (DSA) in Australia for Microsoft. Rolf has an MCSE in Data Management & Analytics, an MCSE in Data Platform and an MCSE in Business Intelligence (BI). Rolf has been working with the SQL data platform since v6.0 (that’s 1994!) and has done just about everything you can around data related platforms, solutions and architectures since then, scoping, designing and delivering 100s of data solutions in that time. Rolf has had the opportunity to present extensively at Ignite, PASS, TechEd, SQL Saturday, SQL User Groups, Meetups, Seminars, Roadshows, etc and really enjoys sharing and learning new ideas.

