SQLServerCentral is supported by Redgate

Azure Cognitive Services – Speech APIs

This week in my Azure Every Day posts, I’ve been focusing on Azure Cognitive Services and what these features can add to your applications and your organization. Today my focus is the Speech APIs, which you can use to convert spoken audio into text, use voice for verification, or add speaker recognition to your app.

There are three primary APIs in the Speech stack:

1. Translator Speech API – This lets you add real-time speech translation to an app with a simple REST API call. If your app needs to operate around the world, it can translate from a native speaker's language into the common language you're using, or work in reverse and translate your common language into the native speaker's language. Because the translation is automatic and happens in real time, you could use it on a video or a live feed. It currently supports a variety of languages, and more are added regularly.


2. Bing Speech API – This lets you convert speech to text and text back to speech, and the transcribed text can then be used to understand user intent.
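As a minimal sketch of what a speech-to-text REST call involves, the Python below builds (but does not send) an HTTP POST that uploads WAV audio. The endpoint URL is a hypothetical placeholder, and the `language` query parameter and the 16 kHz PCM `audio/wav` content type are assumptions about the service; only the `Ocp-Apim-Subscription-Key` header follows the usual Azure Cognitive Services authentication convention. Check the current service documentation before relying on any of these details.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the endpoint shown for your own resource.
SPEECH_ENDPOINT = "https://your-speech-endpoint.example/speech/recognition"


def build_speech_to_text_request(audio_wav: bytes, subscription_key: str,
                                 language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads WAV audio for transcription."""
    # Assumption: the recognition language is passed as a query parameter.
    query = urllib.parse.urlencode({"language": language})
    headers = {
        # Standard Azure Cognitive Services authentication header.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumption: the audio is 16 kHz PCM WAV.
        "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(f"{SPEECH_ENDPOINT}?{query}", data=audio_wav,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    req = build_speech_to_text_request(b"RIFF....", "YOUR-KEY")
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) would return the service's response.
```

Keeping request construction separate from sending makes it easy to inspect or unit-test the request without a live key or network access.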

3. Speaker Recognition API – Still in preview, this can be trained to recognize voice patterns and use speech to identify and verify individual speakers. There's a cool demo online in which a group of presidents speak and the API recognizes which president is talking. You can also train it on your own speech patterns so it can identify you, which you could use to add security to an application by requiring voice verification.
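The service's internal model isn't described here, but the difference between identifying and verifying a speaker can be illustrated with a toy Python sketch. It assumes each voice sample has already been reduced to a numeric feature vector (the example vectors and the 0.9 threshold are made up for illustration) and uses cosine similarity as a stand-in scoring method, not the API's actual algorithm. Identification picks the closest enrolled speaker, as in the presidents demo, while verification accepts or rejects a single claimed identity.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(sample: Vector, enrolled: Dict[str, Vector]) -> str:
    """Identification: return the enrolled speaker whose profile is most similar."""
    return max(enrolled, key=lambda name: cosine_similarity(sample, enrolled[name]))


def verify(sample: Vector, claimed_profile: Vector, threshold: float = 0.9) -> bool:
    """Verification: accept only if the sample is close enough to the claimed speaker."""
    return cosine_similarity(sample, claimed_profile) >= threshold


if __name__ == "__main__":
    # Hypothetical enrolled profiles and an incoming voice sample.
    enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
    sample = [0.9, 0.1, 0.2]
    print(identify(sample, enrolled))          # alice
    print(verify(sample, enrolled["alice"]))   # True
    print(verify(sample, enrolled["bob"]))     # False
```

The threshold in `verify` is the security trade-off: raising it rejects more impostors but also more genuine attempts.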

There is also a Custom Speech Service (still in preview) which allows you to overcome speech recognition barriers like speaking style, background noise and vocabulary.

As more people interact with their mobile devices and other hardware by speaking, these APIs from Microsoft are a great way to make speech part of your advanced applications.


Steve Hughes is a Principal Consultant at Magenic. His area of expertise is in data and business intelligence architecture on the Microsoft SQL Server platform. He was also the data architect for a SaaS company which delivered a transportation management solution for fleets across the United States. Steve has co-authored two books and delivered more than 30 presentations on SQL Server and data architecture over the past six years. He also provides insights from the field on his blog at http://dataonwheels.wordpress.com.

