I did the machine learning online course (http://bit.ly/2P3mOPZ) and want to know whether there is a good API for "voice recognition" and "text to speech" in C++. I have tried Festival, which sounds so realistic that you can hardly tell a computer is talking, and Voce as well.
Unfortunately, Festival does not seem to support voice recognition (I mean "voice to text"), and Voce is built in Java, which makes it a mess to use from C++ because of JNI.
The API should support both "text to voice" and "voice to text", and it should have a good set of examples, at least some outside the owner's website. It would be perfect if it could also identify which of a given set of voices is speaking, but that is optional.
I want to use the API to turn a robot device left, right, etc. when given a set of voice commands, and also to have it speak to me, saying "Good Morning", "Good Night", etc. These words will be hard-coded in the program.
Please help me find a good C++ voice API for this purpose. If you know of a tutorial or installation guide, please share it with me as well.
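Whichever engine ends up being used, the command side of this can be kept independent of it. The sketch below is a minimal, hypothetical example, assuming the speech-to-text engine hands back its result as a plain std::string and that speech output is any callable taking a string. The names Action, parse_command, greeting_for_hour, execute and Speaker, the command phrases and the hour ranges are all illustrative assumptions, not part of Festival, Voce or any other library.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <string>

// Hypothetical robot actions; a real program would drive the motors here.
enum class Action { TurnLeft, TurnRight, Stop, Unknown };

// Assumed interface to a text-to-speech engine: anything that accepts a string.
using Speaker = std::function<void(const std::string&)>;

// Normalize recognizer output: lowercase it and trim surrounding whitespace.
std::string normalize(const std::string& text) {
    std::string out;
    for (unsigned char c : text) out += static_cast<char>(std::tolower(c));
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    out.erase(out.begin(), std::find_if(out.begin(), out.end(), not_space));
    out.erase(std::find_if(out.rbegin(), out.rend(), not_space).base(), out.end());
    return out;
}

// Map a recognized phrase to an action using a fixed, hard-coded command table.
Action parse_command(const std::string& recognized) {
    static const std::map<std::string, Action> commands = {
        {"turn left", Action::TurnLeft},
        {"left", Action::TurnLeft},
        {"turn right", Action::TurnRight},
        {"right", Action::TurnRight},
        {"stop", Action::Stop},
    };
    auto it = commands.find(normalize(recognized));
    return it == commands.end() ? Action::Unknown : it->second;
}

// Hard-coded greeting chosen by hour of day (0-23); the ranges are illustrative.
std::string greeting_for_hour(int hour) {
    if (hour >= 5 && hour < 12) return "Good Morning";
    if (hour >= 12 && hour < 18) return "Good Afternoon";
    return "Good Night";
}

// Perform an action (motor control omitted) and speak a confirmation.
void execute(Action a, const Speaker& speak) {
    switch (a) {
        case Action::TurnLeft:  speak("Turning left");  break;
        case Action::TurnRight: speak("Turning right"); break;
        case Action::Stop:      speak("Stopping");      break;
        case Action::Unknown:   speak("Sorry, I did not understand"); break;
    }
}
```

In a real program, the string passed to parse_command would come from the speech-to-text engine and the Speaker would wrap the text-to-speech call. Keeping both behind plain strings means the engine can be swapped (for example, a cloud service versus a native library) without touching the command table.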
It would be difficult to find an ideal solution. Most voice-to-text engines run in the cloud, and real humans actually listen to some of the voice clips and use them to improve the engine (take that with a grain of salt, since I am not an expert). So if you want a native solution in C++, you may need to pay some cash just to use the tool (and sacrifice some privacy, if that is a concern).
From my point of view, the most robust solution with the least expense is to use Alexa, although I am not sure how robust its SDK is. Sadly, as far as I know, Alexa doesn't try to identify voices. You could install Alexa on your PC or on a Raspberry Pi, but I don't know how well that works. Hosting an Alexa skill is free (not just for building, but also for distributing), although there may be a hidden expense in some very specific contexts.
I should warn you that C++ may not be possible (or convenient) with Alexa's "skills" SDK. You would probably want to use Node.js or Python, since I imagine there are more tutorials for them. You implement a frontend in a GUI where you define the words and phrases to be sent to the backend, which is Lambda, Node.js or whatever. There may be limitations, annoyances and complexities; for example, because you are dealing with Alexa, you can't avoid having to start the application by saying "Alexa". I am also pretty sure that connecting Amazon to a native application (an application in C++) is the trickiest part, and you will need to dig into examples, and maybe ask in a forum, to figure out how to do that.
Also, Festival isn't a very good TTS system; it dates from the '90s (perhaps with minor improvements over the decades). I don't know why you call it realistic, since all the examples I have heard sounded awful.