Microsoft Ignite always brings many new products and product features, and this year was no exception. There were many exciting announcements, and below I list the major data platform related ones:
Usage-based optimization with Azure Synapse and Power BI: A new feature that analyzes usage patterns in Power BI and shares the information with Synapse is now available to improve query performance. Synapse automatically creates a materialized view optimized for Power BI users, greatly accelerating query performance. More info
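To make the idea concrete, here is a minimal sketch in plain Python of usage-based materialization. It is only an illustration of the concept, not how Synapse implements it: the `UsageBasedCache` class, the sample rows, and the threshold of two requests are all assumptions made up for this example. The sketch counts how often an aggregate is requested and, once it is requested often enough, stores the result so later requests skip the scan.

```python
from collections import Counter

# Hypothetical fact table: (region, sales_amount) rows.
ROWS = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]


class UsageBasedCache:
    """Toy model: materialize an aggregate once it is queried often enough."""

    def __init__(self, rows, threshold=2):
        self.rows = rows
        self.threshold = threshold  # assumed cut-off, chosen arbitrarily
        self.usage = Counter()      # how often each region total was requested
        self.materialized = {}      # region -> precomputed total
        self.scans = 0              # number of full scans performed

    def _scan_total(self, region):
        # Full scan over every row, the expensive path.
        self.scans += 1
        return sum(amount for r, amount in self.rows if r == region)

    def total_sales(self, region):
        self.usage[region] += 1
        if region in self.materialized:
            return self.materialized[region]
        total = self._scan_total(region)
        if self.usage[region] >= self.threshold:
            # Frequently requested: keep the result for later calls.
            self.materialized[region] = total
        return total
```

In this toy version the stored totals are never refreshed when the rows change; a real materialized view is maintained by the database engine.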
Apache Spark for Azure Synapse In-cluster Caching and Shuffle Service (Preview): Caching and shuffle are two of the components of infrastructure for Apache Spark that have the greatest impact on performance. These new services, which are written from scratch, allow the optimization of performance for these components on modern hardware and operating systems. The service is enabled for Apache Spark Pools in Azure Synapse today.
High throughput output from Stream Analytics to Azure Synapse Analytics (General Availability): Azure Stream Analytics jobs now have the ability to output to an existing SQL pool table in Azure Synapse Analytics, and can process throughput rates even higher than 200 MB/s. This will support the most demanding real-time analytics and hot-path data processing needs of Azure customers who depend on their Data Warehouse for workloads such as reporting and dashboarding.
Azure AD authentication features for Azure SQL DB, Azure Synapse Analytics, and Azure SQL Managed Instance: Three new features using Azure Active Directory (Azure AD) authentication are currently in preview for Azure SQL Database, Azure Synapse Analytics, and Azure SQL Managed Instance. These features help automate user creation using Azure AD applications and allow individual Azure AD guest users to be created in SQL Database, Azure Synapse Analytics, or SQL Managed Instance. First, support is being added for Azure AD user creation in SQL Database and Azure Synapse Analytics on behalf of Azure AD applications. This functionality is useful in Azure AD application automation processes where Azure AD objects are created and maintained in SQL Database and Azure Synapse Analytics without human interaction. Second, users can use cloud groups to manage Directory Readers role assignment in Azure AD. This change will promote ease of use when setting up SQL Database, Azure Synapse Analytics, and SQL Managed Instance to create users. Finally, Azure AD guest users can now be created as database users and set as Azure AD admin without the need to first add them as members of a group created in Azure AD. These features extend existing functionality, remove user limitations, and provide customers with greater ease of use when setting up SQL Database, Azure Synapse Analytics, or SQL Managed Instance.
New Azure Cognitive Services and capabilities in preview: In the Decision category, we have the new Metrics Advisor service that proactively monitors metrics and diagnoses issues so organizations can protect their growth engine, from sales revenue to manufacturing. Built on top of Anomaly Detector, Metrics Advisor can be applied to business metric monitoring, AI Ops, and predictive maintenance scenarios. See an example in action. In the Vision category, the new spatial analysis feature in Computer Vision helps organizations maximize the value of their physical spaces by understanding people's movements in near-real time, which is especially critical these days in helping organizations reopen safely by supporting social distancing and other health compliance measures. It is available as a container to run at the edge. Here’s a walkthrough of how customers are applying it. Additionally for containers, the following are available in preview: for Speech, automatic language detection and neural text-to-speech containers, and for Computer Vision, the Read 3.0 and 3.1 containers for text extraction.
Announcing Azure SQL Edge now in general availability: Azure SQL Edge, which brings the most secure Microsoft SQL data engine to Internet of Things (IoT) gateways and edge devices, is now available. Optimized for IoT workloads, Azure SQL Edge supports built-in data streaming, storage, and artificial intelligence packed into a small footprint container that works in connected, disconnected, and semi-connected environments where local storage and analytics are important. You can reduce cost, drive innovation, and improve operational efficiencies by using Azure SQL Edge to stream, store, and analyze IoT data. You simply need to develop your application once and deploy it anywhere across the edge, your datacenter, and Azure.
Hosting catalog databases for all supported versions of SSRS in Azure SQL Managed Instance: Now in general availability, Azure SQL Managed Instance can host catalog databases for all supported versions of SQL Server Reporting Services (SSRS). Last year, SQL Server 2019 Reporting Services introduced native support for hosting catalog databases in SQL Managed Instance. Now you can also use SQL Managed Instance to host catalog databases for earlier supported versions of SSRS. This is especially useful for fast migration of the existing solutions to SQL Managed Instance, without the need to test and adopt the latest version of SSRS. It also helps you get quick business benefits and then modernize further at your own pace. To learn how to configure your SQL Managed Instance to host catalog databases for earlier supported versions of SSRS, visit the Tech Community blog.
Major performance improvements for Azure SQL Managed Instances: We are announcing a set of major performance improvements for Azure SQL Managed Instances, which enable you to migrate your more performance-hungry database workloads to Azure SQL Managed Instance. These improvements include better transaction log write throughput for general purpose and business critical instances and superior data/log IOPS for business critical instances. Increased performance predictability and stability for general purpose service tier through better integration and enhanced tempdb performance are also included. These improvements are automatically enabled for all existing and future Azure SQL Managed Instances at no extra charge, making Azure SQL Managed Instance the best fully-managed database platform for your mission-critical workloads. Learn more about major performance improvements for Azure SQL Managed Instances
Machine Learning on Azure SQL Managed Instance in preview: Machine Learning Services, with support for the R and Python languages, is now available in preview on Azure SQL Managed Instance. When using Machine Learning Services with R or Python support in Azure SQL Managed Instance, you can run R and Python scripts to do data preparation and general purpose data processing, train machine learning models in database, and deploy your models and scripts into production in stored procedures.
Configurable backup storage redundancy options for Azure SQL Database coming soon: By default, all backups taken on Azure SQL Database are stored on read-access geo-redundant storage (RA-GRS) blobs that are replicated to paired regions, ensuring data is protected in case of planned and unplanned events. A preview of configurable backup storage redundancy options for Azure SQL DB is coming soon to Southeast Asia, with additional regions expected in the coming months. This feature will give you more flexibility to choose between locally redundant storage (LRS), zone-redundant storage (ZRS), or geo-redundant (RA-GRS) storage blobs for your backups. As additional regions are launched in preview, there will be more backup storage redundancy options available to meet your data residency requirements or minimize backup costs. You will be able to easily configure backup storage redundancy in the Azure portal. For more information, visit our Tech Community blog.
Azure Synapse Link now supports Azure Cosmos DB API for MongoDB (Preview): Azure Synapse Link for Azure Cosmos DB now supports API for MongoDB in preview. You can analyze the data in Azure Cosmos DB API for MongoDB collections, using Synapse Apache Spark or Synapse SQL serverless, without impacting the performance of your transactional workloads. Azure Synapse Link, announced earlier this year, creates a tight integration between Azure Cosmos DB and Azure Synapse Analytics. You can directly access your Azure Cosmos DB data from Azure Synapse Analytics and run near real-time business intelligence, analytics, and machine learning pipelines. This integration enables you to build cloud-native HTAP (Hybrid transactional/analytical processing) solutions to perform no-ETL (extract, transform, and load) analytical processing on operational data in Azure Cosmos DB.
Azure Synapse Link for Azure Cosmos DB: SQL serverless runtime support coming soon: Azure Synapse Link will soon have preview support for querying Azure Cosmos DB data with Synapse SQL serverless. In the coming months, you will be able to run analytical T-SQL queries over data in Azure Cosmos DB in place within seconds. These queries can be used to build rich near real-time dashboards using Power BI integration with Azure Synapse Analytics. This new feature enhances the capability of Azure Synapse Link and will give you a wide range of business intelligence and ad-hoc querying tools via the T-SQL interface. Azure Synapse Link, announced earlier this year, creates a tight integration between Azure Cosmos DB and Azure Synapse Analytics. This gives you direct access to Azure Cosmos DB data from Azure Synapse Analytics to run near real-time business intelligence, analytics, and machine learning pipelines. This integration enables you to build cloud-native HTAP (Hybrid transactional/analytical processing) solutions to perform no-ETL (extract, transform, and load) analytical processing on operational data in Azure Cosmos DB, without impacting the performance of your transactional workloads. Azure Synapse Analytics will soon support Azure Synapse Link with Synapse Apache Spark and Synapse SQL serverless.
Serverless offer for Azure Cosmos DB coming soon on all APIs: Azure Cosmos DB serverless is in preview for the Core (SQL) API and will soon be available in preview for all Azure Cosmos DB APIs, including MongoDB, Cassandra, Gremlin, and Table. This offers NoSQL developers a database operations option with zero capacity planning or management required. As a consumption-based option with no minimum, serverless could also significantly lower the entry price for applications. It is ideally suited for small and medium-sized workloads that do not require steady throughput, receive requests sporadically or in short bursts, and have moderate performance requirements. Learn more.
Enhanced management experience for Azure SQL Managed Instance: Management operations are an essential part of Azure SQL Managed Instance, covering create, update, and delete scenarios. Most management operations in SQL Managed Instance are long-running, but until now their progress was not visible. Through the introduction of a new CRUD API version, the SQL Managed Instance resource is now visible from the moment the create request is submitted. In addition, the new OPERATIONS API adds the ability to monitor management operations, see operation steps, and take dependent actions based on operation progress. Learn More
Transactionally consistent database copy for Azure SQL Database Hyperscale tier: Now in preview, the Hyperscale service tier in Azure SQL Database supports generating a transactionally consistent database copy to either the same or a different logical server or region, similar to the database copy already supported in other service tiers. This enables scenarios where Hyperscale databases have to be copied for various reasons, such as development and testing. If a database copy is created within the same region, it is a fast operation regardless of data size, similar to a point in time restore in Hyperscale. A cross-region database copy is a slower, size-of-data operation. If you require support for database copy in your workflow, you can now take advantage of unique Hyperscale capabilities such as support for very large databases, fast scaling, and multiple replicas. Learn how to create a database copy and visit our Hyperscale database copy blog for more details.
Hyperspace, an indexing subsystem for Apache Spark, is now open source: For Microsoft’s internal teams and external customers, we store datasets that span from a few GBs to 100s of PBs in our data lake. The scope of analytics on these datasets ranges from traditional batch-style queries (e.g., OLAP) to explorative “finding the needle in a haystack” type of queries (e.g., point-lookups, summarization). Resorting to linear scans of these large datasets with huge clusters for every simple query is prohibitively expensive and not the top choice for many of our customers, who are constantly exploring ways to reduce their operational costs; incurring unchecked expenses is their worst nightmare. Over the years, we have seen a huge demand for bringing the indexing capabilities that come de facto in the traditional database systems world into Apache Spark. Today, we are making this possible by releasing an indexing subsystem for Apache Spark called Hyperspace, the same technology that powers indexing within Azure Synapse Analytics.
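The trade-off between a linear scan and an index lookup can be shown with a small sketch in plain Python. This is not the Hyperspace API and does not use Spark; the record layout and the function names are hypothetical, chosen only to illustrate why a point lookup against an index touches far less data than a scan.

```python
from collections import defaultdict

# Hypothetical dataset: (order_id, customer, amount) records.
RECORDS = [
    (1, "alice", 30),
    (2, "bob", 12),
    (3, "alice", 8),
    (4, "carol", 50),
]


def linear_scan(records, customer):
    """Touch every record to answer a point lookup."""
    return [r for r in records if r[1] == customer]


def build_index(records):
    """Build a one-time mapping from customer to positions in the dataset."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[1]].append(position)
    return index


def indexed_lookup(records, index, customer):
    """Answer the same lookup by visiting only the matching positions."""
    return [records[p] for p in index.get(customer, [])]
```

Building the index costs one pass over the data, after which each lookup visits only the matching records. That pays off when the same column is queried repeatedly.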
New MERGE command for Azure Synapse Analytics: The new MERGE command in Azure Synapse Analytics allows customers to insert, update, or delete data from one table based on the differences discovered in another table. With the MERGE command, the previous multi-step process to synchronize two tables in a dedicated SQL pool is now streamlined into a single step, thus reducing the data operation cost and improving the user experience. Learn More
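The insert, update, and delete behaviour described above can be sketched in plain Python, with each table modelled as a dictionary from key to row value. This only illustrates the semantics of synchronizing one table from another in a single pass; it is not T-SQL, and the `merge` function is a hypothetical helper written for this example.

```python
def merge(target, source):
    """Synchronize `target` with `source` in one pass; both map key -> row value.

    Returns the counts of inserted, updated, and deleted rows.
    """
    inserted = updated = deleted = 0
    for key, value in source.items():
        if key not in target:
            target[key] = value  # row exists only in source: insert
            inserted += 1
        elif target[key] != value:
            target[key] = value  # row exists in both but differs: update
            updated += 1
    for key in list(target):
        if key not in source:
            del target[key]      # row exists only in target: delete
            deleted += 1
    return inserted, updated, deleted
```

For example, merging a source of `{2: "b", 3: "x", 4: "d"}` into a target of `{1: "a", 2: "b", 3: "c"}` inserts key 4, updates key 3, and deletes key 1, leaving the target identical to the source.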
COPY command now generally available in Azure Synapse Analytics: The COPY command feature in Azure Synapse Analytics provides users a simple, flexible, and fast interface for high-throughput data ingestion for SQL workloads. With this announcement, we’ve added performance improvements with file splits, ease of use with Parquet auto-schema discovery and complex data type support, and the support of the COPY command within our data integration partner ecosystem including: Databricks, Informatica, Streamsets, Panoply, and Talend. Learn More
Column-level encryption for Azure Synapse Analytics: Column-level encryption (CLE) helps customers implement fine-grained protection of sensitive data within a table (server-side encryption). With CLE, customers gain the ability to use different protection keys for columns in tables with each key having its own access permissions. The data in CLE enforced columns is encrypted on disk (and remains encrypted in memory) until the DECRYPTBYKEY function is used to decrypt it. This feature applies to dedicated SQL pools in Azure Synapse Analytics. Learn More
Azure Databricks – Delta engine with Photon: Microsoft announced the preview of the Photon-powered Delta Engine on Azure Databricks, the fast, easy, and collaborative analytics and AI service. Built from scratch in C++ and fully compatible with Spark APIs, Photon is a vectorized query engine that leverages modern CPU architecture along with Delta Lake to enhance Apache Spark 3.0’s performance by up to 20x. As organizations worldwide embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly analyze massive amounts and types of data. However, this has been a challenge. While storage and network performance have increased 10x, CPU processing speeds have only increased marginally. This leads to the question: if CPUs have become the bottleneck, how can we achieve the next level of performance? The answer with Photon lies in greater parallelism of CPU processing at both the data level and the instruction level. The Photon-powered Delta Engine is a 100% Apache Spark-compatible vectorized query engine designed to take advantage of modern CPU architecture for extremely fast parallel processing of data. Read the blog to learn more.
Azure Arc enabled data services: Announcing Azure data services anywhere, a new capability enabled by Azure Arc that allows customers to run Azure data services on-premises, multi-cloud and edge, using Kubernetes on the hardware of choice. Azure Arc enabled SQL Managed Instance and Azure Arc enabled PostgreSQL Hyperscale are available in preview now, with other services to come over time. Customers can now deploy a variety of data services, with access to latest innovation, cloud benefits like automation and elastic scale, unified management across all data assets in hybrid environments, and unique Azure security and governance capabilities like Azure Security Center, Policies and Role-based Access Control for on-premises data workloads. Best of all, customers can use modern cloud billing models on-premises for better cost efficiency. Learn more about Azure Arc enabled data services
Azure ML Ignite 2020 announcements: Azure Machine Learning Enterprise Edition (preview) capabilities are merging into Basic Edition on September 22, 2020. With these enhanced capabilities included in Basic Edition, the Enterprise Edition of Azure Machine Learning will retire on January 1, 2021.
- No-code drag and drop experience for building and deploying ML models
- Automatically build and deploy predictive models using the no-code UI or through a code-first notebooks experience.
ML Labeling Assist
- The machine learning assisted labeling feature lets users trigger automatic machine learning models to accelerate the labeling task.
Advanced RBAC (preview)
- AzureML operation level RBAC will allow customers to set custom roles, or re-use pre-built roles to control specific operations for the individual users in a workspace.
MLflow updates (preview)
- MLflow projects support for simplified job submission from local to cloud
- MLflow model registry support
- Model deployment support (model from MLflow model registry and deploy it to ACI or AKS)
- Expanded support for MLflow experimentation UI
Workspace Private Link (preview)
- A network isolation feature that enables you to access Azure ML over a private IP in your virtual network (VNet).
Azure SQL Database – Cross-SQL MI distributed Tx – Preview: Distributed database transactions spanning multiple Azure SQL Managed Instances will be added soon to enable frictionless migration of existing applications, as well as development of modern multi-tenant applications relying on vertically or horizontally partitioned database architecture. By utilizing distributed transactions once public preview is announced, customers will be able to save time when migrating existing applications that require this capability, as it eliminates the need to change application code and to perform extensive testing. If customers develop new applications, they will be able to benefit from partitioning data into multiple databases to overcome current sizing limitations of Azure SQL Managed Instance while utilizing distributed transactions to keep partitioned data in a consistent state. Once preview is announced, two methods of running distributed transactions will be supported: using the BEGIN DISTRIBUTED TRANSACTION statement from Transact-SQL code and using the TransactionScope class from .NET code.
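To illustrate what "keeping partitioned data in a consistent state" means, here is a toy sketch in plain Python of an all-or-nothing change across two partitions. It does not describe how Azure SQL Managed Instance implements distributed transactions: the `Partition` class, the account names, and the check-then-apply approach are assumptions made for this example, and the sketch ignores crashes, concurrency, and durability.

```python
class Partition:
    """Stand-in for one database holding part of the data."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def can_apply(self, account, delta):
        # Vote "no" if the account is missing or would go negative.
        return account in self.balances and self.balances[account] + delta >= 0

    def apply(self, account, delta):
        self.balances[account] += delta


def distributed_transfer(src, src_account, dst, dst_account, amount):
    """Apply a debit and a credit on two partitions, or neither."""
    changes = [(src, src_account, -amount), (dst, dst_account, amount)]
    # Step 1: every participant must agree before anything is written.
    if not all(p.can_apply(acct, delta) for p, acct, delta in changes):
        return False
    # Step 2: all checks passed, so apply every change.
    for p, acct, delta in changes:
        p.apply(acct, delta)
    return True
```

If either partition rejects its part of the change, nothing is written to either one, so the two partitions never disagree about the transfer.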
Power BI announcements from Ignite:
- Power BI Premium will soon be available on a per-user basis. Individuals will be able to upsize their per-seat Professional licenses to Premium. During the preview period, the upgrade carries no additional cost. See Answering your questions around the new Power BI Premium per user license
- Power BI Premium is getting faster, and is gaining an autoscale capability that, when enabled, allows Microsoft to provision additional “v-cores” (virtual CPU cores) to the customer’s cloud tenant for periods of 24 hours, when overloads are detected. The v-cores are automatically removed during idle periods. See Announcing the Upcoming Evolution of Power BI Premium to enterprise markets and beyond
- Preview of “smart narratives,” an augmented analytics feature which provides plain-English summarizations of the data in a report, either on a per-visualization or page-wide basis. The narratives automatically update when data is filtered or drilled down upon, and the narratives are editable, both in terms of formatting and for insertion of arbitrary or calculation-driven text. See Smart Narrative (preview)
- Dedicated Power BI add-in application for Microsoft’s Teams collaboration platform, released as a preview. The Teams integration includes the ability to browse reports, dashboards and workspaces and directly embed links to them in Teams channel chats. It’s not just about linking though, as Teams users can also browse Power BI datasets, either through an alphabetical listing of them or by reviewing a palette of recommended ones. In both cases, datasets previously marked as Certified or Promoted will be identified as such, and Teams users will have the ability to view their lineage, generate template-based reports on them, or just analyze their data in Excel. See Announcing: New Power BI experiences in Microsoft Teams
- Through a new preview feature, Power BI asset sensitivity levels, applied as Microsoft Information Protection (MIP) labels, can be applied from within Power BI Desktop, making it unnecessary to apply them to the report and its underlying dataset in the cloud service’s user interface
- Microsoft is enhancing its Power Query technology to feature a visual interface rather than relying on the data grid view that has been its hallmark. Essentially, Power Query gets a new Diagram View where all queries in the model, and each of the transformation steps within them, will appear together, in a flowchart-like representation. This visual presentation indicates the dependencies between queries and the sequence of individual transformations within each. New transforms can be easily added in the Diagram View as well. Visual data preparation is coming soon. It will be available initially in Power BI dataflows only (i.e. online), and not in Power BI Desktop. The capability will come to Desktop within the next calendar year
- An anomaly detection visual (coming soon)
- A new visual that can trigger Power Automate processes in a data-driven fashion (also coming soon)
- A performance accelerator for Azure Synapse Analytics that automatically creates materialized views to accelerate frequent queries from Power BI
- GA of deployment pipelines (including availability on government clouds) that can move Power BI assets between development, test and production environments. See Announcing deployment pipelines General Availability (GA)
- A preview of a data impact analysis feature that can notify users of datasets that may be affected by a change to another dataset upstream