By using intelligent multimedia pattern recognition algorithms, the combination of Live Automatic Speech Recognition (Live ASR) and audio mining automatically generates a wide range of metadata for media files.

IAIS Audio Mining
With the Fraunhofer IAIS audio mining system, audio and video tracks can be searched specifically for original sound bites. Speaker recognition makes it possible to find individual speakers and jump directly to their contributions in the file.


Live Automatic Speech Recognition (ASR) – speech recognition in real time

The ASR technology from Fraunhofer IAIS enables the reliable conversion of spoken information into digital text in real time, even under difficult conditions such as background noise or regional dialects. This automatic speech recognition in real time not only promotes natural communication between people and machines, but also provides valuable support for people with hearing impairments.

The software is already in use, for example for automatic live subtitling (transcription) of speeches in parliaments. In industrial environments, it enables communication with machines using voice commands. The technology recognizes speech with high reliability, performs excellently in German and English, is noise-resistant, can be adapted to specific applications and vocabularies, and provides word and phoneme output for downstream systems.
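To illustrate how a downstream system might use the word output for live subtitling, here is a minimal Python sketch. It assumes that each recognized word arrives with a start and end time in seconds; the `Word` type, the function names and the 42-character cue limit are hypothetical choices for this example and do not describe the actual output format of the Fraunhofer IAIS software. The sketch groups words into subtitle cues and renders them in the common SubRip (SRT) layout.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record; the real output format is not specified here.
    text: str
    start: float  # seconds from the beginning of the stream
    end: float


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_cues(words, max_chars=42):
    """Group consecutive words into cues of at most max_chars characters."""
    cues, current = [], []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        # Start a new cue when adding this word would exceed the limit.
        if current and len(candidate) > max_chars:
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def cues_to_srt(cues) -> str:
    """Render cues as SRT blocks: index, time range, text."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{format_timestamp(cue[0].start)} --> {format_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [Word("Good", 0.0, 0.3), Word("morning", 0.3, 0.8), Word("everyone", 0.9, 1.5)]
    print(cues_to_srt(words_to_cues(demo, max_chars=12)))
```

In a real-time setting, a cue would typically be emitted as soon as it is full rather than after the whole word list is known; the grouping logic stays the same.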

Developed by

Fraunhofer IAIS

Your contact person

I will be happy to provide you with information about our software products.

Ying Ge-Wolf
Product sales

Audio Mining – Automatic processing of audio media stocks

With automatic speech recognition (“speech-to-text”), audio data can be prepared for searching and automatically tagged. The system also recognizes different speakers and distinguishes speech from other audio content (music, sounds). The metadata of the audio files can be enriched with these results to support existing search functions.
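As a rough illustration of how such metadata can support search, the following Python sketch builds a simple term index over analyzed segments. The `Segment` fields, the labels "speech" and "music", and the speaker names are assumptions made for this example, not the data model of the audio mining system. Only speech segments are indexed, and a query can optionally be restricted to one speaker.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical analysis result for one stretch of audio.
    start: float   # seconds
    end: float
    kind: str      # assumed labels, e.g. "speech" or "music"
    speaker: str   # assumed speaker label, empty for non-speech
    text: str      # transcript, empty for non-speech


def build_index(segments):
    """Map each lowercased term to the speech segments that contain it."""
    index = defaultdict(list)
    for seg in segments:
        if seg.kind != "speech":
            continue  # music and other sounds carry no transcript
        for term in set(re.findall(r"\w+", seg.text.lower())):
            index[term].append(seg)
    return index


def search(index, term, speaker=None):
    """Return (start time, speaker) for every segment mentioning the term."""
    hits = index.get(term.lower(), [])
    return [(s.start, s.speaker) for s in hits if speaker is None or s.speaker == speaker]


if __name__ == "__main__":
    segments = [
        Segment(0.0, 12.0, "music", "", ""),
        Segment(12.0, 40.5, "speech", "speaker_1", "The budget debate starts today."),
        Segment(40.5, 70.0, "speech", "speaker_2", "We reject this budget."),
    ]
    index = build_index(segments)
    print(search(index, "budget"))
    print(search(index, "budget", speaker="speaker_2"))
```

Each hit carries a start time, so a player can jump straight to the position where the term was spoken.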

Benefit & value

Whether you are editing, hosting or archiving: use the possibilities of artificial intelligence (AI) for your media library. With Audio Mining, you can discover, save and reuse audiovisual media in 99 different languages. Intelligent multimedia pattern recognition algorithms automatically generate a variety of metadata for your media files and convert the spoken word into searchable text. This lets you retrieve media information such as terms, quotes, speakers or keywords quickly and easily, which makes managing your media library considerably less effort.

Flexibility and usability

Thanks to its service-oriented architecture and message-based communication, the audio mining system offers a high degree of flexibility and lets you tailor the range of functions to your individual needs. The system can be integrated into an existing media archive and used, for example, as a metadata enrichment service, or it can function as a stand-alone media archive.
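The idea of a message-based metadata enrichment service can be sketched in a few lines of Python. Everything here is an assumption for illustration: the JSON message fields (`media_id`, `media_uri`, `services`), the analyzer names and the in-process `queue.Queue` standing in for a message broker are hypothetical and do not reflect the actual interfaces of the audio mining system. The worker reads job messages, runs only the requested analysis services, and publishes the resulting metadata as a new message.

```python
import json
import queue


def enrichment_worker(inbox, outbox, analyzers):
    """Consume job messages until a None sentinel arrives."""
    while True:
        raw = inbox.get()
        if raw is None:
            break
        job = json.loads(raw)
        metadata = {}
        # Run only the services requested in the message and known to this worker.
        for name in job["services"]:
            if name in analyzers:
                metadata[name] = analyzers[name](job["media_uri"])
        outbox.put(json.dumps({"media_id": job["media_id"], "metadata": metadata}))


if __name__ == "__main__":
    # Placeholder analyzers; a real deployment would call the analysis services.
    analyzers = {
        "transcription": lambda uri: f"transcript of {uri}",
        "speaker_recognition": lambda uri: ["speaker_1"],
    }
    inbox, outbox = queue.Queue(), queue.Queue()
    inbox.put(json.dumps({
        "media_id": "clip-001",
        "media_uri": "archive/clip-001.wav",
        "services": ["transcription"],
    }))
    inbox.put(None)  # sentinel: stop after the queued jobs
    enrichment_worker(inbox, outbox, analyzers)
    print(outbox.get())
```

Because the archive and the analysis services only exchange messages, services can be added or left out per installation without changing the archive itself.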

According to your requirements

For your version of the audio mining system, we can use existing workflows, e.g. for text mining or audio transcription, or develop new, individual workflows for you. In close cooperation with your team, customer-specific AI models can be trained, new analysis services can be developed, or additional existing services can be connected.

Industries & areas of application

  • Radio and television stations
  • Media library providers
  • Organizations that want to extract metadata from large amounts of text, audio, and/or video information
  • Searchable archiving
  • Subtitle creation