Speech Recognition

macsen listens to the user and plays music
Speech recognition technologies automatically translate speech or spoken language into written text.

The technology is essential for creating accessible services, increasing digital inclusion, and underpins many modern services such as automatic subtitling, transcription systems and voice assistants.

macsen listens to the user and plays music

Types of Speech and Transcription

There are a variety of different types of speech and transcription, including:

  • Speech while reading – processing speech that reads pre-written text. It contains more formal language and correct vocabulary.
  • Spontaneous speech – processing speech from normal and natural conversations. It includes pausing, repeating words or restarting sentences and speaking less formally with occasional English words (‘code switching’)
  • Whole audio transcription – processing short or long audio recordings
  • Live transcription – convert speech to text as you speak
  • Verbatim transcription – recording every word, pause, and phrase exactly as they were spoken
  • Easy-to-read subtitle transcription – create subtitles from speech that are easier to follow
  • Meeting transcription – record multi-speaker conversations with the ability to recognize voices and label who is speaking

Speech Recognition Resources for Developers

Models

Our Welsh and English speech recognition models are available for developers to use in two ways:

  • Through the unit’s API center – for easy integration into your systems and services
  • Locally – by downloading from the Hugging Face website and running them on your own local server or computer:

Data

Our data collections for speech recognition are also readily available from the Hugging Face website.

Our Development Work

For several years now, we have been developing Welsh speech recognition resources of the best possible quality. This work has included collecting, creating and distributing a significant corpus of Welsh speech data in open source form. Our aim is to enable developers from all over to use the data to create and improve the Welsh speech recognition provision of their products and services, contributing to the promotion and support of the Welsh language.

As the data increases, we have used it to train and develop our own speech recognition models, following the exciting developments in the field. Our first step was using HTK (Hidden Markov Model Toolkit), before moving on to Kaldi and its more powerful methods. More recently, we have turned to training innovative neural network models such as wav2vec2 and Whisper. With each step, we saw significant improvements – not only in terms of accuracy, but also in terms of the ability to understand a wide variety of Welsh accents and deal with challenging audio situations.

When developing resources for the Welsh language, we are also aware of the need to be able to transcribe English well. Many Welsh speakers use both languages ​​in their daily lives, and it is important that our systems can deal with both languages ​​effectively – either separately or in conversations that change between the two languages.

Practical Applications with Speech Recognition

Our speech recognition models have been in use in real apps and services for several years:
  • Trawsgrifiwr – an online service that allows users to automatically transcribe their audio recordings
  • Macsen – Welsh voice assistant prototype package

Collaboration with Partners

This is not solitary work. We have worked closely with a number of other organizations and developers, including:

  • Mozilla Foundation through the Common Voice project
  • The Wales-Brittany AI-language central Network partners
  • Amazon and the Welsh Local Government Association (WLGA)
  • Many podcast producers and other speech audio producers who have contributed their data and given us permission to share it as transcribed by us

These collaborations have been essential for gathering diverse data, sharing expertise, and ensuring that our resources meet the real needs of users and organizations.

Looking to the Future

Our work continues to evolve as we embrace the exciting possibilities of emerging technologies. We are developing speech recognition systems that can translate directly from Welsh to English, combining speech recognition and machine translation in one step. In addition, we investigate other intellectual abilities such as language recognition, accent recognition, and register recognition – abilities which are essential for creating more sophisticated and context-sensitive systems.

We are also investigating large language models (LLMs) that can understand and process spoken instructions in Welsh, opening new doors for natural interfaces and Welsh voice assistants.

If you would like to know more about our work, or if you are interested in collaborating with us, we would love to hear from you. Contact us to discuss how we can work together to further develop Welsh speech technologies.