A Timing Determination Method for the Insertion of Automated Audio Descriptions into Live TV Sports Programs

We are conducting research on automated audio descriptions (AADs) to help visually impaired people enjoy live TV programs. This technology uses real-time data about a sports match (who did what, and when, etc.) that have been generated automatically or manually. A voice synthesizer converts the data into audio descriptions, which are then distributed simultaneously with the broadcast audio. However, when AADs overlap with the live commentary, listeners must follow both the commentary and the AADs at the same time, which is difficult. Overlaps therefore need to be prevented so that the sports program remains understandable. In this work, we propose a timing determination method for inserting AADs into live sports programs. The proposed method predicts the end of each utterance in the commentary, and AADs are then inserted after the utterance has finished. To predict the end of an utterance, the method uses the difference between the long-term and short-term moving averages of the fundamental frequency (F0). Visually impaired participants evaluated how easy it was to listen to both the commentary and the AADs, and indicated that our method made listening easier for them.
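The F0-based end-of-utterance predictor can be illustrated with a small streaming sketch. The abstract does not give window lengths, a threshold, a frame rate, or the exact decision rule, so everything below is an assumption: `F0EndPredictor`, `predicted_end_frames`, `short_win`, `long_win`, and `threshold_hz` are hypothetical names, unvoiced frames are marked with F0 = 0 and skipped, and an utterance end is flagged when the short-term average falls below the long-term average by more than a fixed margin (on the assumption that pitch tends to drop toward the end of an utterance).

```python
from collections import deque
from typing import List, Optional, Sequence


class F0EndPredictor:
    """Hypothetical streaming predictor (illustrative, not the authors' code).

    Flags a likely utterance end when the short-term moving average of F0
    falls below the long-term moving average by more than `threshold_hz`.
    Window sizes (in frames) and the threshold are assumptions.
    """

    def __init__(self, short_win: int = 5, long_win: int = 20,
                 threshold_hz: float = 10.0) -> None:
        assert short_win <= long_win
        self.short: deque = deque(maxlen=short_win)
        self.long: deque = deque(maxlen=long_win)
        self.threshold_hz = threshold_hz

    def update(self, f0_hz: float) -> Optional[float]:
        """Feed one frame's F0 (0.0 = unvoiced).

        Returns the short-term minus long-term average, or None until the
        long window has filled with voiced frames.
        """
        if f0_hz > 0.0:  # assumption: only voiced frames update the averages
            self.short.append(f0_hz)
            self.long.append(f0_hz)
        if len(self.long) < self.long.maxlen:
            return None
        short_avg = sum(self.short) / len(self.short)
        long_avg = sum(self.long) / len(self.long)
        return short_avg - long_avg

    def end_predicted(self, diff: Optional[float]) -> bool:
        # A clearly negative difference means F0 has recently dropped.
        return diff is not None and diff < -self.threshold_hz


def predicted_end_frames(f0_track: Sequence[float], **kwargs) -> List[int]:
    """Return frame indices where a predicted utterance end first appears.

    Only the rising edge of each prediction is reported, so one falling
    contour yields one candidate insertion point for a queued AAD.
    """
    predictor = F0EndPredictor(**kwargs)
    hits: List[int] = []
    prev = False
    for i, f0 in enumerate(f0_track):
        now = predictor.end_predicted(predictor.update(f0))
        if now and not prev:
            hits.append(i)
        prev = now
    return hits


if __name__ == "__main__":
    # Synthetic contour: steady 200 Hz, a falling tail, then silence.
    track = [200.0] * 20 + [200.0 - 5 * k for k in range(1, 11)] + [0.0] * 10
    print(predicted_end_frames(track, short_win=5, long_win=20, threshold_hz=10.0))
```

Using bounded deques keeps the per-frame cost constant, which suits live commentary; in practice the F0 values would come from a real-time pitch tracker, and a system would start the next queued AAD at (or shortly after) each reported frame.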

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.