More Azure Blob Storage enhancements


I recently blogged about Query Acceleration for ADLS, which also applies to Azure Blob storage. Now there are more new features for blob storage that I will talk about.

Blob index preview: Recently announced in preview, blob index is a managed secondary index that allows you to store multi-dimensional object attributes to describe your data objects for Azure Blob storage. This allows you to categorize and find data based on attribute tags set on the data. Cool! To populate the blob index, you define key-value tag attributes on your data, either on new data during upload or on existing data already in your storage account. These blob index tags are stored alongside your underlying blob data. The blob indexing engine then automatically reads the new tags, indexes them, and exposes them to a user-queryable blob index. Blob Index not only helps you categorize, manage, and find your blob data but also provides integrations with other Blob service features, such as Lifecycle management, allowing you to move data to cooler tiers or delete data based on the tags applied to your blobs.

The below scenario is an example of how Blob Index works:

  1. In a storage account container with a million blobs, a user uploads a new blob “B2” with the following blob index tags: < Status = Unprocessed, Quality = 8K, Source = RAW >
  2. The blob and its blob index tags are persisted to the storage account and the account indexing engine exposes the new blob index shortly after
  3. Later on, an encoding application wants to find all unprocessed media files that are at least 4K resolution quality. It issues a FindBlobs API call to find all blobs that match the following criteria: < Status = Unprocessed AND Quality >= 4K AND Source = RAW >
  4. The blob index quickly returns just blob “B2,” the sole blob out of one million blobs that matches the specified criteria. The encoding application can quickly start its processing job, saving idle compute time and money

(Figure: Blob Index overview example.)
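
The scenario above can be sketched in a few lines of Python. This is only an in-memory stand-in for the blob index, not the Azure SDK or the FindBlobs API itself: the `BLOB_TAGS` data and the `find_blobs` helper are hypothetical names made up for illustration, and the extra blobs "B1" and "B3" are invented sample data.

```python
from typing import Dict, List

# Hypothetical sample data: blob name -> blob index tags (all values are strings).
BLOB_TAGS: Dict[str, Dict[str, str]] = {
    "B1": {"Status": "Processed", "Quality": "8K", "Source": "RAW"},
    "B2": {"Status": "Unprocessed", "Quality": "8K", "Source": "RAW"},
    "B3": {"Status": "Unprocessed", "Quality": "2K", "Source": "RAW"},
}


def find_blobs(tags_by_blob: Dict[str, Dict[str, str]]) -> List[str]:
    """Return blobs with Status = Unprocessed AND Quality >= 4K AND Source = RAW."""
    matches = []
    for name, tags in tags_by_blob.items():
        if (
            tags.get("Status") == "Unprocessed"
            # Plain string comparison: "8K" >= "4K" holds, "2K" >= "4K" does not.
            and tags.get("Quality", "") >= "4K"
            and tags.get("Source") == "RAW"
        ):
            matches.append(name)
    return matches


print(find_blobs(BLOB_TAGS))  # prints ['B2']
```

Because the sketch compares tag values as plain strings, a value such as "16K" would sort before "4K"; a tagging scheme that relies on range filters would need values that order correctly as text.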

Blob Index will eventually work for ADLS Gen2. There is no cost for the indexing engine. For more info, including how to sign up for the preview, see Manage and find data on Azure Blob Storage with Blob Index.

Geo-Zone-Redundant Storage (GZRS): GZRS and Read-Access Geo-Zone-Redundant Storage (RA-GZRS) are now generally available. GZRS writes three copies of your data synchronously across multiple Azure availability zones, similar to zone-redundant storage (ZRS), giving you continued read and write access even if a datacenter or availability zone is unavailable. In addition, GZRS asynchronously replicates your data to the secondary region of the geo pair to protect against regional unavailability. RA-GZRS exposes a read endpoint on this secondary replica, allowing you to read data if the primary region is unavailable. To learn more, see Azure Storage redundancy.
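
As a rough sketch of how a client might use the RA-GZRS read endpoint, the snippet below tries the primary endpoint first and falls back to the secondary on a connection error. It assumes the usual `-secondary` suffix on the account name for the secondary blob endpoint. The helper names (`blob_endpoints`, `read_with_fallback`), the account name, and the fake fetch function are all hypothetical; a real application would more likely rely on the retry options in the storage client library.

```python
from typing import Callable, Dict


def blob_endpoints(account_name: str) -> Dict[str, str]:
    """Build the primary and (assumed) secondary blob endpoints for an account."""
    return {
        "primary": f"https://{account_name}.blob.core.windows.net",
        "secondary": f"https://{account_name}-secondary.blob.core.windows.net",
    }


def read_with_fallback(fetch: Callable[[str], bytes], account_name: str, path: str) -> bytes:
    """Read from the primary endpoint; on a connection error, read from the secondary."""
    endpoints = blob_endpoints(account_name)
    try:
        return fetch(f"{endpoints['primary']}/{path}")
    except ConnectionError:
        # Replication to the secondary is asynchronous, so this copy may lag behind.
        return fetch(f"{endpoints['secondary']}/{path}")


def fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for an HTTP GET: the primary region is 'down'."""
    if "-secondary" not in url:
        raise ConnectionError("primary region unavailable")
    return b"data from secondary"


print(read_with_fallback(fake_fetch, "myaccount", "media/B2"))
```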

Account failover: Customer-initiated storage account failover is now generally available, allowing you to determine when to initiate a failover instead of waiting for Microsoft to do so. When you perform a failover, the secondary replica of the storage account becomes the new primary. The DNS records for all storage service endpoints (blob, file, queue, and table) are updated to point to this new primary. Once the failover is complete, clients automatically begin reading data from and writing data to the storage account in the new primary region, with no code changes. Customer-initiated failover is available for GRS, RA-GRS, GZRS, and RA-GZRS accounts. To learn more, see Disaster recovery and account failover.

Versioning preview: Versioning automatically maintains prior versions of an object and identifies them with version IDs. You can restore a prior version of a blob to recover your data if it is erroneously modified or deleted. A version captures a committed blob state at a given point in time. When versioning is enabled for a storage account, Azure Storage automatically creates a new version of a blob each time that blob is modified or deleted. Versioning and soft delete work together to provide you with optimal data protection.  To learn more, see Blob versioning.
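
To make the versioning behavior concrete, here is a toy in-memory model: every overwrite or delete first saves the current state as a prior version, and a restore copies a prior version back as the current blob. The `VersionedStore` class is hypothetical, and using a list index as the version ID is a simplification for this sketch; it says nothing about how Azure Storage formats its version IDs.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Toy model of blob versioning (not the Azure SDK)."""

    def __init__(self) -> None:
        self._current: Dict[str, bytes] = {}
        self._versions: Dict[str, List[bytes]] = {}

    def put(self, name: str, data: bytes) -> None:
        # Overwriting an existing blob saves its old state as a prior version.
        if name in self._current:
            self._versions.setdefault(name, []).append(self._current[name])
        self._current[name] = data

    def delete(self, name: str) -> None:
        # Deleting also saves the last state, so the blob stays recoverable.
        if name in self._current:
            self._versions.setdefault(name, []).append(self._current.pop(name))

    def get(self, name: str) -> Optional[bytes]:
        return self._current.get(name)

    def restore(self, name: str, version_id: int) -> None:
        # Restoring copies a prior version back as the current blob.
        self.put(name, self._versions[name][version_id])


store = VersionedStore()
store.put("report.csv", b"v1")
store.put("report.csv", b"v2")   # b"v1" becomes version 0
store.delete("report.csv")       # b"v2" becomes version 1
store.restore("report.csv", 0)
print(store.get("report.csv"))   # prints b'v1'
```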

Point in time restore preview: Point in time restore for Azure Blob Storage gives storage account administrators the ability to restore a subset of containers or blobs within a storage account to a previous state. An administrator can restore to a specific past date and time, for example after an application corrupts data, a user inadvertently deletes content, or a test run of a machine learning model. Point in time restore makes use of Blob Change feed, currently in preview. Change feed records all blob creation, modification, and deletion operations that occur in your storage account. To learn more, see Point in time restore.
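
The idea of rebuilding a past state from a log of changes can be illustrated with a small replay function. This is a conceptual sketch only: the `Change` record, the integer timestamps, and the choice to store blob contents inside each record are assumptions made to keep the example self-contained, and it does not claim to mirror how the service implements a restore.

```python
from typing import Dict, List, NamedTuple, Optional


class Change(NamedTuple):
    timestamp: int          # hypothetical logical clock
    blob: str
    data: Optional[bytes]   # None marks a deletion


def state_at(changes: List[Change], restore_point: int) -> Dict[str, bytes]:
    """Replay changes in time order and stop after the restore point."""
    state: Dict[str, bytes] = {}
    for change in sorted(changes, key=lambda c: c.timestamp):
        if change.timestamp > restore_point:
            break
        if change.data is None:
            state.pop(change.blob, None)
        else:
            state[change.blob] = change.data
    return state


log = [
    Change(1, "a.txt", b"good"),
    Change(2, "b.txt", b"good"),
    Change(3, "a.txt", b"corrupted"),  # an application corrupts a.txt
    Change(4, "b.txt", None),          # a user deletes b.txt by mistake
]
print(state_at(log, restore_point=2))  # prints {'a.txt': b'good', 'b.txt': b'good'}
```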

Routing preferences preview: Configure a routing preference to direct network traffic for the default public endpoint of your storage account using the Microsoft global network or using the public internet. Optimize for premium network performance by using the Microsoft global network, which delivers low-latency path selection with high reliability and routes traffic through the point-of-presence closest to the client. Alternatively, route traffic through the point-of-presence closest to your storage account to lower network costs and minimize traversal over the Microsoft global network. Routing configuration options for your storage account also enable you to publish additional route-specific endpoints. Use these new public endpoints to override the routing preference specified for the default public endpoint by explicitly routing traffic over a desired path. Learn more.

Object replication preview: Object replication is a new capability for block blobs that lets you asynchronously replicate your data from your blob container in one storage account to another anywhere in Azure. Object replication unblocks a new set of common replication scenarios:

  • Minimize latency – have your users consume the data locally rather than issuing cross-region read requests
  • Increase efficiency – have your compute clusters process the same set of objects locally in different regions
  • Optimize data distribution – have your data consolidated in a single location for processing/analytics and then distribute only resulting dashboards to your offices worldwide
  • Minimize cost – tier down your data to Archive upon replication completion using lifecycle management policies to minimize the cost

Please refer to Object Replication documentation for more details.
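
For a feel of what a replication pass does, the sketch below copies blobs from a source container to a destination container, both modeled as plain dictionaries. The `replicate` function and its optional `prefix` filter are hypothetical illustration names, and the real feature runs asynchronously in the service rather than as an explicit call like this.

```python
from typing import Dict


def replicate(source: Dict[str, bytes], destination: Dict[str, bytes], prefix: str = "") -> int:
    """Copy new or changed blobs whose names start with prefix; return the copy count."""
    copied = 0
    for name, data in source.items():
        if name.startswith(prefix) and destination.get(name) != data:
            destination[name] = data
            copied += 1
    return copied


# Hypothetical containers in two different regions.
west_us = {"logs/2020-05-01.json": b"{}", "raw/video.mp4": b"\x00\x01"}
north_europe: Dict[str, bytes] = {}

print(replicate(west_us, north_europe, prefix="logs/"))  # prints 1
print(sorted(north_europe))                              # prints ['logs/2020-05-01.json']
```

Running the same pass again copies nothing, since the destination already holds identical content for the matching blobs.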

More info:

Azure Blob Storage enhancing data protection and recovery capabilities
