Just how good is AI at automated translation, subtitling and voiceover?
AI automated translation and speech-to-text are such hot topics that tech giants are investing heavily in them, announcing every so often that they have beaten the best word error rate.
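Word error rate (WER) is the standard benchmark behind those announcements: the minimum number of word insertions, deletions and substitutions needed to turn the AI transcript into the reference, divided by the reference length. A minimal sketch (not any vendor's implementation):

```python
# Word error rate: word-level Levenshtein distance divided by
# the number of words in the reference transcript.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six words
```

A WER of 0.17 here means roughly one word in six would need correcting, which is why small differences between vendors matter at scale.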
We teamed up with French start-up Mediawen because we wanted to see for ourselves how well AI performs at these tasks. By combining state-of-the-art solutions from key players in the field (Google, Microsoft and IBM) with the best research-led solutions and its own algorithms, Mediawen improves on their results.
We tested six videos:
- Two lab communication videos, using:
  - two different sound-capture settings (office/conference)
  - translation from French and English (with a French speaker for both)
- Two knowledge management e-learning capsules (Mediawen’s SCORM-compliant solutions are often used on such videos)
- Two in-game videos by GRW (Ghost Recon WildLands): one cinematic sequence and one gameplay sequence.
Speech-to-text was applied to all six videos, and automated translation to all but the GRW gameplay sequence, in which the mix of sounds, music and voices meant that AI performed poorly - confirming our hunch that each sound file should be individually translated. For both speech-to-text and translation, we tested tools by Google, Microsoft, IBM and Voxolab, picking the best-performing solution in each case.
The best performer varied depending on whether the task was speech-to-text or translation, and on the direction of translation. Next, a human translator corrected the AI output. Mediawen displays all corrections made, showing their type; some results required far more human intervention than others.
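The correction types the human translator applies map onto the same three edit operations as WER. As a hedged illustration (Mediawen's actual display logic is not public), Python's standard `difflib` can classify the edits between an AI transcript and its corrected version:

```python
import difflib

# Illustrative sketch: count human corrections to an AI transcript by
# type (insertion, deletion, substitution) using difflib opcodes.
def correction_types(ai_words, human_words):
    counts = {"insert": 0, "delete": 0, "replace": 0}
    sm = difflib.SequenceMatcher(a=ai_words, b=human_words)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "insert":
            counts["insert"] += j2 - j1
        elif op == "delete":
            counts["delete"] += i2 - i1
        elif op == "replace":
            counts["replace"] += max(i2 - i1, j2 - j1)
    return counts

print(correction_types("we tested sex videos".split(),
                       "we tested six videos".split()))
```

Tallying corrections this way makes it easy to compare how much human intervention each AI solution required.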
In a spirit of complete transparency, all the videos we processed are accessible here. Click on the thumbnail below, then start the video. The planet icon in the video player brings up a menu allowing you to cycle through the various speech-to-text, translation and voiceover tests.