Updated at October 31, 2023   06:09 AM

Speech recognition

There are two types of speech recognition available:

  • recognition of audio files;
  • recognition of streaming audio.

Recognize audio files

To recognize speech from an audio file, send the file in the body of a POST request to https://voice.mcs.mail.ru/asr and specify the correct Content-Type header.

Request example:

curl -L --request POST 'https://voice.mcs.mail.ru/asr' \
  --header 'Content-Type: audio/ogg; codecs=opus' \
  --header 'Authorization: Bearer xxxxxxxxxx' \
  --data-binary '@/Users/User/tts.ogg'

Response example:

{
  "qid": "0ac6294a351d42ad859404ecd349e4b9",
  "result": {
    "texts": [
      {
        "text": "hello alice",
        "confidence": 1.0,
        "punctuated_text": "Hello, Alice."
      }
    ],
    "phrase_id": "20220921-1515-4d75-92b4-24b6c101ba6a"
  }
}
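
The request and response shown above could be handled from Python roughly as follows. This is a minimal sketch using only the standard library. The names build_asr_request and best_text are hypothetical, and choosing the highest-confidence entry of "texts" is an assumption about how the list is meant to be read.

```python
import json
import urllib.request

ASR_URL = "https://voice.mcs.mail.ru/asr"


def build_asr_request(audio: bytes, content_type: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for file recognition."""
    return urllib.request.Request(
        ASR_URL,
        data=audio,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f"Bearer {token}",
        },
    )


def best_text(response_body: str) -> str:
    """Return the text of the highest-confidence hypothesis (assumed selection rule)."""
    texts = json.loads(response_body)["result"]["texts"]
    best = max(texts, key=lambda item: item["confidence"])
    # Prefer the punctuated variant when the service returns one.
    return best.get("punctuated_text") or best["text"]
```

To actually send the request, pass the object returned by build_asr_request to urllib.request.urlopen and feed the decoded response body to best_text.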

Supported audio formats

Container   Codec   Content-Type
WAV         -       audio/wave
OGG         Opus    audio/ogg; codecs=opus

Restrictions

Restriction                   Value
Maximum audio file size       20 MB
Maximum audio duration        5 min
Maximum number of channels    1
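
The limits above can be checked locally before uploading a WAV file. The following is a sketch only, built on Python's standard wave module. The names check_wav and make_wav are hypothetical, and reading "20 MB" as 20 * 1024 * 1024 bytes is an assumption, since the source does not say whether the limit is decimal or binary.

```python
import io
import wave

MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB
MAX_SECONDS = 5 * 60          # 5 min
MAX_CHANNELS = 1


def check_wav(data: bytes) -> list[str]:
    """Return the list of violated limits (empty if none are violated)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file larger than 20 MB")
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() > MAX_CHANNELS:
            problems.append("more than 1 channel")
        if wav.getnframes() / wav.getframerate() > MAX_SECONDS:
            problems.append("longer than 5 min")
    return problems


def make_wav(seconds: float, channels: int = 1, rate: int = 8000) -> bytes:
    """Generate silent 16-bit PCM WAV data for trying out check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    return buf.getvalue()
```

For example, check_wav(make_wav(1.0, channels=2)) reports the channel limit, while a one-second mono file passes with an empty list.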

Error codes

Code   Status   Description
4009   400      Audio size too big
4033   400      Unknown audio format
4034   400      Audio is corrupted or in an unexpected format
4043   400      Too long audio
4044   400      Unsupported number of audio channels
4045   400      Unsupported audio sample rate
4048   400      Invalid token
4049   400      Inactive project VK Cloud
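
A client may want to turn these codes into readable messages and separate credential problems from audio problems. The lookup below is a small sketch. The names ASR_ERRORS, describe_error, and is_auth_error are hypothetical, and the source does not show how the code is delivered in the error response, so extracting it is left to the caller.

```python
# Error codes and descriptions copied from the table above.
ASR_ERRORS = {
    4009: "Audio size too big",
    4033: "Unknown audio format",
    4034: "Audio is corrupted or in an unexpected format",
    4043: "Too long audio",
    4044: "Unsupported number of audio channels",
    4045: "Unsupported audio sample rate",
    4048: "Invalid token",
    4049: "Inactive project VK Cloud",
}

# Codes that concern the token or project rather than the audio itself.
AUTH_ERRORS = {4048, 4049}


def describe_error(code: int) -> str:
    """Return the documented description, or a fallback for unlisted codes."""
    return ASR_ERRORS.get(code, f"Unknown error code {code}")


def is_auth_error(code: int) -> bool:
    """True when re-encoding the audio cannot help (token or project issue)."""
    return code in AUTH_ERRORS
```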

Recognize streaming audio

Streaming recognition works on chunks (short fragments of speech). First send a request to create a task; after that you can send chunks to the task and receive the final result.

Request to create a task

To create a task, send a POST request to https://voice.mcs.mail.ru/asr_stream/create_task with an Authorization header containing your access_token. The response contains task_id and task_token.

Request example:

curl --request POST \
  --url https://voice.mcs.mail.ru/asr_stream/create_task \
  --header 'Authorization: Bearer access_tokenxxxxxxxx'

Response example:

{
  "qid": "61b5cf067c494b4a9a0b87a3c43e37ef",
  "result": {
    "task_id": "05ad987e-ceee-4186-acdb-956148b91692",
    "task_token": "040b2fcfc3d9b9806b691070e873125dfc0450a8251887ba91b19be08eb3951c"
  }
}
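
In Python, task creation might look like the sketch below. It only builds the request and parses a response body of the shape shown above. The names build_create_task_request and parse_task are hypothetical.

```python
import json
import urllib.request

CREATE_TASK_URL = "https://voice.mcs.mail.ru/asr_stream/create_task"


def build_create_task_request(access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a streaming task."""
    return urllib.request.Request(
        CREATE_TASK_URL,
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def parse_task(response_body: str) -> tuple[str, str]:
    """Extract (task_id, task_token) from the create_task response."""
    result = json.loads(response_body)["result"]
    return result["task_id"], result["task_token"]
```

The returned task_token, not the original access_token, is what the later chunk and result requests put in their Authorization header.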

Request to send a chunk

A chunk is an audio fragment in the selected format, so each chunk must contain that format's headers.

To send a chunk:

  • send a POST request to https://voice.mcs.mail.ru/asr_stream/add_chunk, passing task_token in the Authorization header;
  • pass task_id and chunk_num as query parameters (chunk numbering starts from 0);
  • specify the correct Content-Type in the request header;
  • send the chunk in the request body as an array of bytes in WAV or OGG format.

The response contains the result of chunk recognition.

Request example:

curl --request POST \
  --url 'https://voice.mcs.mail.ru/asr_stream/add_chunk?task_id=xxxxx&chunk_num=2' \
  --header 'Authorization: Bearer task_tokenxxxxxxxx' \
  --header 'Content-Type: audio/wave' \
  --data 'xxxxxxxxxx'

Response example:

{
  "qid": "4d44cb0eb81f4e7f84a7997ec4f2f3c4",
  "result": {
    "text": "hello marusya"
  }
}
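
The chunk request described above could be assembled in Python as in this sketch. The function name build_add_chunk_request is hypothetical, and audio/wave is used as the default Content-Type purely for illustration.

```python
import urllib.parse
import urllib.request

ADD_CHUNK_URL = "https://voice.mcs.mail.ru/asr_stream/add_chunk"


def build_add_chunk_request(
    task_id: str,
    task_token: str,
    chunk_num: int,
    chunk: bytes,
    content_type: str = "audio/wave",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for one chunk."""
    # task_id and chunk_num travel in the query string; numbering starts from 0.
    query = urllib.parse.urlencode({"task_id": task_id, "chunk_num": chunk_num})
    return urllib.request.Request(
        f"{ADD_CHUNK_URL}?{query}",
        data=chunk,
        method="POST",
        headers={
            "Authorization": f"Bearer {task_token}",
            "Content-Type": content_type,
        },
    )
```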

Supported audio formats

Container   Codec   Content-Type
WAV         -       audio/wave
OGG         Opus    audio/ogg; codecs=opus

Restrictions

Restriction                   Value
Maximum chunk size            32100 B
Maximum chunk duration        1 s
Maximum number of channels    1
Minimum number of chunks      5
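
One way to satisfy these limits for WAV input is to cut mono PCM data into pieces and wrap each piece in its own WAV header. The sketch below assumes uncompressed PCM and relies on the 44-byte header that Python's wave module writes. The name split_pcm_to_wav_chunks is hypothetical, and the OGG/Opus case is not covered.

```python
import io
import wave

MAX_CHUNK_BYTES = 32100
MAX_CHUNK_SECONDS = 1
WAV_HEADER_BYTES = 44  # header size written by the wave module for PCM data


def split_pcm_to_wav_chunks(pcm: bytes, rate: int, sampwidth: int = 2) -> list[bytes]:
    """Split mono PCM into standalone WAV chunks within the size and duration limits."""
    frames_by_size = (MAX_CHUNK_BYTES - WAV_HEADER_BYTES) // sampwidth
    frames_by_time = rate * MAX_CHUNK_SECONDS
    step = min(frames_by_size, frames_by_time) * sampwidth  # PCM bytes per chunk
    chunks = []
    for start in range(0, len(pcm), step):
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)  # the service accepts a single channel only
            wav.setsampwidth(sampwidth)
            wav.setframerate(rate)
            wav.writeframes(pcm[start:start + step])
        chunks.append(buf.getvalue())
    return chunks
```

Because a task needs at least 5 chunks, a caller would also want to confirm that the returned list has five or more entries before sending anything.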

Request to get the final result of the task

You can request the result at any time after sending chunks. To do so, send a GET request to https://voice.mcs.mail.ru/asr_stream/get_result, passing task_token in the Authorization header and task_id as a query parameter.

The response will include the recognition result with the current status of the task.

Request example:

curl --request GET \
  --url 'https://voice.mcs.mail.ru/asr_stream/get_result?task_id=xxxxx' \
  --header 'Authorization: Bearer task_tokenxxxxxxxx'

Response example:

{
  "qid": "517e5ba9f4a9465c9d73778bedac0808",
  "result": {
    "text": "hello marusya hello marusya",
    "status": "done"
  }
}
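
Since the result carries a status, a client might poll until the task is finished. The following is a sketch under stated assumptions: "done" is the only status value shown in the source, so any other value is simply treated as "not finished yet". The names build_get_result_request and wait_for_done are hypothetical, and the fetch callable stands in for whatever code sends the request and returns the response body.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

GET_RESULT_URL = "https://voice.mcs.mail.ru/asr_stream/get_result"


def build_get_result_request(task_id: str, task_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the task result."""
    query = urllib.parse.urlencode({"task_id": task_id})
    return urllib.request.Request(
        f"{GET_RESULT_URL}?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {task_token}"},
    )


def wait_for_done(fetch: Callable[[], str], attempts: int = 10, delay: float = 0.5) -> str:
    """Call fetch() until the status is "done", then return the recognized text."""
    for _ in range(attempts):
        result = json.loads(fetch())["result"]
        if result.get("status") == "done":
            return result["text"]
        time.sleep(delay)  # any other status is assumed to mean "still processing"
    raise TimeoutError("task did not reach status 'done'")
```

Passing fetch in as a parameter keeps the polling logic independent of the HTTP client, so it can be exercised with canned response bodies.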