- The Azure Data Lake has been renamed to the Azure Data Lake Store. The…
If you have on-prem data and want to copy it to Azure Blob Storage in the cloud, what are all the possible ways to do it? There are many, and here is a quick review of them:
AzCopy: A popular command-line utility designed for high-performance uploading, downloading, and copying… Read more
I see a lot of confusion about the place and purpose of the many new database solutions (“NoSQL databases”) compared to the relational database solutions that have been around for many years. So let me try to explain the differences and best use cases for each.
First, let's clarify these… Read more
In my Introduction to Hadoop I talked about the basics of Hadoop. In this post, I wanted to cover some of the more common Hadoop technologies and tools and show how they work together, in addition to showing how they work well with Microsoft technologies and tools. So you don’t… Read more
The Analytics Platform System (APS), which is a renaming of the Parallel Data Warehouse (PDW), has just released an appliance update (AU4), which is sort of like a service pack, except that it includes many new features. Below is what is new in this release:
AU4 continues to… Read more
Yesterday at the Microsoft World Wide Partner Conference in Orlando, Microsoft announced the Cortana Analytics Suite, which is a new package of data storage, information management, machine learning, and business intelligence software in a single convenient monthly subscription. Microsoft’s Cortana personal digital assistant, until now available to consumers on mobile… Read more
Just announced is the Microsoft Azure Data Catalog, which is an enterprise metadata catalog / portal for the self-service discovery of data sources. It becomes available on Monday next week, July 13, 2015. Check out this short video on it. My response to this is – woo hoo! I have… Read more
Polyglot Persistence is a fancy term meaning that when storing data, it is best to use multiple data storage technologies, chosen based upon the way data is being used by individual applications or components of a single application. Different kinds of data are best handled by different data stores.… Read more
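To make the idea concrete, here is a minimal, hypothetical sketch of polyglot persistence inside one application. It assumes three kinds of data: transactional orders (kept in a relational store, here an in-memory SQLite database), short-lived session data (kept in a key-value store, here a plain dict standing in for a cache), and flexible-schema product documents (kept in a document store, here JSON strings keyed by id). All class and method names are illustrative only and do not correspond to any particular product.

```python
import json
import sqlite3


class OrderStore:
    """Relational store (in-memory SQLite) for transactional order data."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
        )

    def add(self, customer, total):
        cur = self.conn.execute(
            "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total)
        )
        self.conn.commit()
        return cur.lastrowid

    def total_for(self, customer):
        # SQL aggregation is where the relational model shines.
        row = self.conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]


class SessionStore:
    """Key-value store (a dict standing in for a cache) for session data."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class CatalogStore:
    """Document store (JSON documents by id) for flexible-schema product data."""

    def __init__(self):
        self._docs = {}

    def save(self, doc_id, doc):
        self._docs[doc_id] = json.dumps(doc)

    def load(self, doc_id):
        return json.loads(self._docs[doc_id])


class ShopApp:
    """One application, three storage technologies chosen by how each kind of data is used."""

    def __init__(self):
        self.orders = OrderStore()
        self.sessions = SessionStore()
        self.catalog = CatalogStore()


if __name__ == "__main__":
    app = ShopApp()
    app.catalog.save("p1", {"name": "Widget", "specs": {"weight_g": 120}})
    app.sessions.put("sess-42", {"cart": ["p1"]})
    app.orders.add("alice", 19.5)
    app.orders.add("alice", 5.5)
    print(app.orders.total_for("alice"))
```

The point of the sketch is the routing decision, not the stores themselves: each kind of data goes to the technology whose access pattern matches how the application uses it, while the application still presents a single interface.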
Microsoft Azure Stream Analytics (ASA) is a fully managed cloud service for real-time processing of streaming data. ASA makes it easy to set up real-time analytic computations on data flowing in from devices, sensors, web sites, applications and infrastructure systems. It supports a powerful high-level SQL-like language that dramatically simplifies… Read more
Massively parallel processing (MPP) is the future of data warehousing.
So what is MPP? SQL Server is a Symmetric Multiprocessing (SMP) solution, which essentially means it uses one server. MPP provides scalability and query performance by running independent servers in parallel. That is the quick definition. For more… Read more
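As a rough illustration of "independent servers in parallel", here is a toy sketch of the shared-nothing scatter-gather pattern behind MPP query processing. It assumes rows are hash-distributed by key across a hypothetical number of compute nodes, each node aggregates only its own slice, and a control node merges the partial results. Threads stand in for separate servers, and every name here is hypothetical; this is not how APS/PDW is actually implemented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical number of compute nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Hash-distribute (key, amount) rows so each node owns its own slice (shared-nothing)."""
    partitions = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        partitions[hash(key) % num_nodes].append((key, amount))
    return partitions


def local_sum(partition):
    """Runs independently on one node: a partial SUM(amount) GROUP BY key."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def mpp_sum_by_key(rows, num_nodes=NUM_NODES):
    partitions = distribute(rows, num_nodes)
    # Scatter: every node works on its own partition at the same time.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, partitions))
    # Gather: the control node merges the partial results.
    result = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            result[key] += total
    return dict(result)


def smp_sum_by_key(rows):
    """Single-server (SMP-style) baseline: one machine scans everything."""
    return local_sum(rows)


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
    print(mpp_sum_by_key(sales))
    print(smp_sum_by_key(sales))
```

Both functions return the same totals; the difference is that the MPP version splits the scan across nodes that share nothing, so adding nodes adds both storage and compute. The merge loop is written generally so it would also work if rows were distributed round-robin rather than by key.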
SQL Server 2016 was recently announced. Top new features include:
- Always Encrypted protects data at rest and in motion. With Always Encrypted, SQL Server can perform operations on encrypted data and, best of all, the encryption key resides with the application in the customer's trusted environment. Encryption and decryption of…
At the recent Microsoft Build Developer Conference, Executive Vice President Scott Guthrie announced the Azure Data Lake. It is a new flavor of Azure Storage which can handle streaming data (low latency, high volume, short updates), is geo-distributed, data-locality aware and allows individual files to be sized at… Read more
Analytics Platform System (APS) is Microsoft’s massively parallel processing (MPP) data warehouse technology. This has only been available as an on-prem solution (see video Overview of Microsoft Analytics Platform System). Until now. At the recent Microsoft Build Developer Conference, Executive Vice President Scott Guthrie announced the… Read more
In case you were wondering what happened to the TechEd conferences, Microsoft is now bringing together the best of previously individual events – the Management Summit, the Exchange, SharePoint, Lync, Project, and TechEd conferences – and then taking it to the next level, based on what customers and partners have… Read more
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. NoSQL is often interpreted as Not-only-SQL to emphasize that these databases may also support SQL-like query languages. Most NoSQL databases are designed to store… Read more
A “data lake” is a storage repository, usually in Hadoop, that holds a vast amount of raw data in its native format until it is needed. It’s a great place for investigating, exploring, experimenting, and refining data, in addition to archiving data. There are various products that you can use… Read more
Microsoft Azure provides two options for hosting your SQL Server-based data warehouse: Microsoft Azure SQL Database and SQL Server in an Azure Virtual Machine. Which one is appropriate based on the size of the data warehouse? What are some hardware features to choose from for an Azure VM for… Read more
I see a lot of confusion about what exactly an Operational Data Store (ODS) is. While it can mean different things to different people, I’ll explain what I see as the most common definition. First let me mention that an ODS is not a data warehouse or data mart. A… Read more
Thanks to everyone who attended my session “Building a Big Data Solution” (Building an Effective Data Warehouse Architecture with Hadoop, Cloud and MPP) for Pragmatic Works today. The abstract for my session is below and the recording will be available here tomorrow. I hope you enjoyed it!
Here is the… Read more
To help make sense of Power BI and all the products it encompasses, I have made this slide deck to hopefully make things easy for you: Power BI Made Simple.
It is a presentation that covers all the products under the Power BI umbrella. I give an overview of… Read more