Rabimba Karanjai is a full-time graduate researcher, part-time hacker and FOSS enthusiast. He is working with the Mozilla Research Mixed Reality team on WebVR. He is also a Mozilla TechSpeaker and would love to chat with you about VR, AR, security and the open web over a cup of coffee or a bottle of beer.
Talks by Rabimba Karanjai
DeepSpeech: A Journey to <10% Word Error Rate TTS
Deep Speech is an end-to-end trainable, character-level, deep recurrent neural network (RNN). In less buzzwordy terms: it's a deep neural network with recurrent layers that takes audio features as input and directly outputs characters, the transcription of the audio. It can be trained from scratch using supervised learning, without any external sources of intelligence such as a grapheme-to-phoneme converter or forced alignment on the input.
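To make "outputs characters directly" concrete, here is a minimal sketch of how per-frame character predictions can be turned into a transcript. It assumes CTC-style greedy decoding (collapse repeated labels, then drop a blank symbol), which is the approach described in the original Deep Speech paper but is not spelled out in this abstract. The function name, the `_` blank symbol and the toy input are hypothetical, chosen only for illustration.

```python
from itertools import groupby

# Hypothetical blank symbol; a real model reserves a dedicated class index for it.
BLANK = "_"


def greedy_ctc_decode(frame_labels):
    """Collapse per-frame labels into text: merge repeats, then drop blanks."""
    # groupby merges consecutive identical labels into a single entry.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # The blank separates genuine double letters (e.g. the "ll" in "hello").
    return "".join(label for label in collapsed if label != BLANK)


if __name__ == "__main__":
    # One label per audio frame, as if taken from the most likely class per frame.
    frames = list("hh_e_ll_lo")
    print(greedy_ctc_decode(frames))
```

A production decoder would more likely use a beam search, possibly with a language model, rather than this greedy pass.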
One of the major goals from the beginning was to achieve a word error rate in the transcriptions of under 10%. Our word error rate on LibriSpeech's test-clean set is now 6.5%, which not only meets our initial goal but also brings us close to human-level performance.
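For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the system output, divided by the number of reference words. The sketch below illustrates that standard definition; it is not the project's own evaluation script, and the function name and example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a 6.5% word error rate corresponds to roughly 6 or 7 word-level errors per 100 reference words.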
In this talk we will cover our journey: how we started the project, which models we evaluated and tuned, the design choices we made, and how we achieved near-human accuracy.
We will cover technical details along with a code demonstration, and show how you can use the engine with the models we trained or create and train your own model.