The goal for this project is to build a voice-to-voice CAI (Conversational AI) integration with an interactive 3D avatar. Initially, it will integrate with SAP Conversational AI, which, despite its name, is a text-to-text service. So the round trip looks like voice-to-text => CAI => text-to-voice, then back out to the user. Hopefully the performance isn’t terrible.
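Because each hop in that chain adds latency, it helps to time the stages separately. Here is a minimal sketch of the round trip, assuming each stage can be wrapped as an async function. The `SpeechToText`, `ConversationalAI`, and `TextToSpeech` types and the `roundTrip` helper are hypothetical names for illustration, not part of any SAP or Google SDK.

```typescript
// Hypothetical stage signatures: none of these names come from SAP or Google SDKs.
type SpeechToText = (audio: Uint8Array) => Promise<string>;
type ConversationalAI = (text: string) => Promise<string>;
type TextToSpeech = (text: string) => Promise<Uint8Array>;

interface RoundTripResult {
  transcript: string;
  reply: string;
  audio: Uint8Array;
  timingsMs: { stt: number; cai: number; tts: number };
}

// Runs one async task and reports how long it took, in milliseconds.
async function timed<T>(task: () => Promise<T>): Promise<[T, number]> {
  const start = Date.now();
  const value = await task();
  return [value, Date.now() - start];
}

// voice-to-text => CAI => text-to-voice, timing each hop so slow stages stand out.
async function roundTrip(
  audio: Uint8Array,
  stt: SpeechToText,
  cai: ConversationalAI,
  tts: TextToSpeech,
): Promise<RoundTripResult> {
  const [transcript, sttMs] = await timed(() => stt(audio));
  const [reply, caiMs] = await timed(() => cai(transcript));
  const [speech, ttsMs] = await timed(() => tts(reply));
  return {
    transcript,
    reply,
    audio: speech,
    timingsMs: { stt: sttMs, cai: caiMs, tts: ttsMs },
  };
}
```

Keeping the stages as plain function parameters means the SAP-backed CAI stage could later be swapped for another text-to-text service without touching the pipeline itself.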
Here’s where I am so far on this project…
I have built a conversational feedback loop. When you ask a question, a tiny client-side voice-to-text library transcribes it. If the transcript matches any predefined question, a cached voice response plays back. There is also a second mode: if you start any phrase with “Simon says…”, it repeats that phrase back to you. This time, though, the audio comes from Google’s text-to-speech API, which required a custom service to hold the API key and make the requests to Google’s servers.
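The two behaviors above amount to a small router over the transcript. A minimal sketch, assuming the cached responses live in a `Map` keyed by a normalized question and that “Simon says” echoes only the words after the prefix; the `Route` type, `normalize`, and `route` are hypothetical names, and the actual call to the text-to-speech service is left out.

```typescript
// Hypothetical routing decision; the names are illustrative only.
type Route =
  | { kind: "cached"; audioUrl: string }
  | { kind: "simonSays"; phrase: string }
  | { kind: "unhandled" };

// Lowercase, turn punctuation into spaces, and collapse whitespace so that
// small transcription differences ("name?" vs "name") still hit the cache.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// "Simon says" as a whole word, plus any trailing punctuation or spaces.
const SIMON_SAYS = /^\s*simon says\b[\s,.:;!…-]*/i;

function route(transcript: string, cached: Map<string, string>): Route {
  // 1. Predefined questions: play the pre-recorded answer, no API call needed.
  const hit = cached.get(normalize(transcript));
  if (hit !== undefined) return { kind: "cached", audioUrl: hit };

  // 2. "Simon says ...": echo the rest of the original wording via text-to-speech.
  const match = transcript.match(SIMON_SAYS);
  if (match) {
    const phrase = transcript.slice(match[0].length).trim();
    if (phrase.length > 0) return { kind: "simonSays", phrase };
  }

  // 3. Anything else is not handled yet (later: forward to the CAI service).
  return { kind: "unhandled" };
}
```

For example, `route("Simon says.. hello there", cache)` yields `{ kind: "simonSays", phrase: "hello there" }`. Checking the cache first keeps the common questions free of text-to-speech latency and API usage.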
Today I’m working on the 3D character. Here’s what he’ll look like.