I am building a test application to authenticate users via Microsoft's Cognitive Speaker Recognition API. It seems straightforward, but as mentioned in their API docs, when creating the enrollment I need to send the bytes of the audio file I record. Since I am using Xamarin.Android, I was able to record the audio and save it. However, Microsoft's Cognitive Speaker Recognition API is quite specific about the requirements for that audio.
According to the API docs, the audio file format must meet the following requirements:
Container -> WAV
Encoding -> PCM
Rate -> 16K
Sample Format -> 16 bit
Channels -> Mono
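For reference, this is my understanding of the header those requirements imply, as a minimal sketch that assumes the canonical 44-byte RIFF/WAVE layout with no extra chunks. The class name WavHeaderSketch and the Build helper are hypothetical names of my own, not part of any Microsoft or Xamarin API.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavHeaderSketch
{
    // Values taken from the requirements listed above.
    public const int SampleRate = 16000;    // Rate: 16K
    public const short BitsPerSample = 16;  // Sample Format: 16 bit
    public const short Channels = 1;        // Channels: Mono

    // Builds a canonical 44-byte RIFF/WAVE header for dataLength bytes of raw PCM.
    public static byte[] Build(int dataLength)
    {
        short blockAlign = (short)(Channels * BitsPerSample / 8); // 2 bytes per frame
        int byteRate = SampleRate * blockAlign;                   // 32000 bytes per second

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms, Encoding.ASCII))
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + dataLength);                 // size of everything after this field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                              // fmt chunk size for plain PCM
            w.Write((short)1);                        // format tag 1 = PCM
            w.Write(Channels);
            w.Write(SampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(BitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(dataLength);                      // number of raw PCM bytes that follow
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

With these values, one second of audio corresponds to 32000 bytes of sample data, so a file whose data chunk is only a few thousand bytes long would play for a fraction of a second.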
Following this recipe, I successfully recorded the audio. After experimenting a little and reading some of the Android docs, I was able to apply these settings as well:
    _recorder.SetOutputFormat(OutputFormat.ThreeGpp);
    _recorder.SetAudioChannels(1);
    _recorder.SetAudioSamplingRate(16);
    _recorder.SetAudioEncodingBitRate(16000);
    _recorder.SetAudioEncoder((AudioEncoder) Encoding.Pcm16bit);
This seems to meet most of the criteria for the required audio file. However, I cannot save the file in an actual “.wav” format, and I cannot verify whether the file is actually PCM encoded.
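The kind of check I have in mind is sketched below. It assumes the file uses the simple 44-byte header with the fmt chunk directly after the RIFF/WAVE tags, so a valid WAV file with extra chunks would be rejected. WavCheckSketch and LooksLikeRequiredWav are hypothetical names of my own.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavCheckSketch
{
    // Returns true if the bytes start with a canonical header describing
    // PCM, 16 kHz, 16 bit, mono. Anything else returns false.
    public static bool LooksLikeRequiredWav(byte[] file)
    {
        if (file == null || file.Length < 44) return false;
        if (Encoding.ASCII.GetString(file, 0, 4) != "RIFF") return false;
        if (Encoding.ASCII.GetString(file, 8, 4) != "WAVE") return false;
        if (Encoding.ASCII.GetString(file, 12, 4) != "fmt ") return false;

        // The 16 bytes of PCM format fields start at offset 20.
        using (var r = new BinaryReader(new MemoryStream(file, 20, 16)))
        {
            short formatTag = r.ReadInt16();  // 1 = PCM
            short channels = r.ReadInt16();
            int sampleRate = r.ReadInt32();
            r.ReadInt32();                    // byte rate, not checked here
            r.ReadInt16();                    // block align, not checked here
            short bitsPerSample = r.ReadInt16();

            return formatTag == 1
                && channels == 1
                && sampleRate == 16000
                && bitsPerSample == 16;
        }
    }
}
```

It would be called as WavCheckSketch.LooksLikeRequiredWav(File.ReadAllBytes(path)), where path is whatever location the recording was saved to. A file written in a 3GPP container should already fail the first comparison, since it would not begin with the RIFF tag.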
Here are my AXML and MainActivity.cs: GitHub Gist
The file’s specs look just fine, but the duration is wrong. No matter how long I record, it just shows 250ms, which results in too-short audio.
Is there any way to do this? Basically, I just want to be able to connect to Microsoft's Cognitive Speaker Recognition API from Xamarin.Android, and I couldn’t find any resource that covers this.