Mozilla hoping to open source voice samples for future AI devs
Prying open speech recognition
Mozilla has decided speech recognition should be open source, and has launched Project Common Voice to achieve just that.
What the browser builder wants, it says, is an open source data set for voice recognition apps.
The open source community, Mozilla's Daniel Kessler writes, is the “next wave of innovators” – but with speech datasets locked up behind proprietary walls, they're left out.
That also skews speech recognition towards the most lucrative markets (English, Chinese and “a select group of languages”), whereas Mozilla hopes that, with enough participants, speakers of less-common languages will be able to talk to their browsers too.
And that's where the open data-gathering comes in: if you're interested, the Project Common Voice site lets you record your own voice (reading sentences to the system, starting for now with English), or review how accurately the software recognises other speakers.
(Vulture South's observation is that the page works better in Firefox than in Chrome – surprise! – and that naturally enough, you have to give the page permission to use your microphone.)
Ultimately the company wants to gather 10,000 hours of recordings for release in Q4 of this year. Presumably, once developers and researchers have their hands on the initial sample, the project will move on to other languages. ®