RuTa – an editor for speech transcriptions

The 7th Conference

Human Language Technologies - the Baltic Perspective

Riga, Latvia
October 6-7, 2016

Roberts DARĢIS, Artūrs ZNOTIŅŠ, Normunds GRŪZĪTIS and Ilze AUZIŅA

Institute of Mathematics and Computer Science, University of Latvia

RuTa is a speech transcription editor that allows users to manually edit automatic speech recognition results. It is developed at AILab (the Institute of Mathematics and Computer Science, University of Latvia), based on experience accumulated in research projects on speech recognition and its application to media monitoring. RuTa demonstrates current developments in Latvian speech recognition and provides a simple user interface for transcription editing.

Workflow with RuTa:

1) audio file uploading,

2) automatic speech transcription and speaker segmentation,

3) manual result editing,

4) result exporting or sharing.

First, RuTa converts the uploaded audio or video file to single-channel WAV audio. The audio is then split into smaller segments based on the speaker segmentation results and longer pauses. These segments are processed by multiple speech recognizers in parallel, so the processing time is less than the total length of the audio. Currently only a general Latvian speech recognizer is available, but RuTa is not limited to a specific speech transcription technology or language. Potential applications include transcription of interviews and public speeches, subtitling, and others. A prototype is available at RuTa.ailab.lv.
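The splitting and parallel recognition steps can be illustrated with a minimal Python sketch. It is not RuTa's actual implementation: the names `Segment`, `split_segments`, `recognize` and `transcribe`, the 1-second pause threshold, and the input formats are assumptions made for illustration, and `recognize` is a placeholder that returns dummy text instead of calling a real speech recognizer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def split_segments(turns, pauses, min_pause=1.0):
    """Split speaker turns at pauses of at least min_pause seconds.

    turns:  (speaker, start, end) tuples, assumed to come from speaker segmentation
    pauses: (start, end) silent intervals, assumed to come from a pause detector
    """
    segments = []
    for speaker, start, end in turns:
        cursor = start
        for p_start, p_end in sorted(pauses):
            # Use only pauses that are long enough and lie strictly inside the turn.
            if p_end - p_start < min_pause or p_start <= cursor or p_end >= end:
                continue
            segments.append(Segment(speaker, cursor, p_start))
            cursor = p_end
        segments.append(Segment(speaker, cursor, end))
    return segments


def recognize(segment):
    """Placeholder for a real recognizer call (hypothetical); returns dummy text."""
    return f"[{segment.speaker} {segment.start:.1f}-{segment.end:.1f}]"


def transcribe(segments, workers=4):
    """Run the recognizer on all segments using a pool of parallel workers."""
    # pool.map() keeps the input order, so results line up with the segments.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize, segments))


if __name__ == "__main__":
    turns = [("A", 0.0, 10.0), ("B", 10.0, 14.0)]
    pauses = [(4.0, 5.5), (7.0, 7.3), (11.0, 12.0)]
    for line in transcribe(split_segments(turns, pauses)):
        print(line)
```

Because the segments are independent, the total processing time with several workers depends mainly on the longest queue of segments per worker rather than on the full length of the recording.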