But then on the other hand, think about how much existing text content we could leverage if we had a natural-sounding voice reading it to us. Many e-learning platforms already use TTS, but to be honest most of them do not cut it. It's different from watching a YouTube tutorial with an energetic tutor who grabs my attention.
But technology keeps catching up: WaveNet by Google DeepMind is promising, generating voices from actual audio samples. Imagine hearing your own voice reading a book or a tutorial to you, without having to read it yourself (yes, I know it's awkward to hear your own voice when you are not used to it).
Based on deep learning techniques, WaveNet picks up subtle details such as breathing rhythm and individual intonation. Energizing the generated TTS with some markup is probably not so far away...
