Macsen

Macsen app on a phone

Macsen is an open source Welsh language voice assistant similar to Alexa or the Google Assistant.

Open source means that anyone can see, adapt and distribute its code as they wish. It works as an app for phones and tablets, and is available for iOS and Android devices. An online version of Macsen is also available. You can speak in Welsh to Macsen to ask it to do various tasks or provide information.

Macsen’s skills

Macsen now has a number of skills, including the ability to play Welsh music on Spotify, play programmes on S4C Clic, turn the lights on or off, read the latest news from the Golwg360 headlines and provide weather forecasts from the OpenWeatherMap website. Macsen also uses ChatGPT, translates English speech into Welsh text and transcribes Welsh speech into text. This means you can type as well as use your voice to ask Macsen questions and give it commands.

Assistant

Recent developments in artificial intelligence have transformed the field and caused us to reconsider what a computer can achieve. Macsen now uses the ChatGPT-4 language model to answer questions, reason and converse through the medium of Welsh.

Translation

Macsen’s translation skill enables users to translate English speech into Welsh text. The technology has now also been linked to the Termiadur Addysg dictionary, which enables Macsen to list terms related to speech. Other resources created to facilitate translation are also available, such as the aligner and our resource for sharing translation memories.

Transcription

Macsen’s transcription skill enables users to transcribe any Welsh speech into text. After speaking your message, you can copy the text into any app, whether it’s a text message, an email or a shopping list! As well as being a skill in Macsen, Trawsgrifiwr is available as an online version. A Windows version of Trawsgrifiwr is also available.

Download Macsen

Be part of the Welsh language’s digital revolution and download the Macsen app for iOS or Android devices today!

Language technologies within Macsen

Macsen uses a number of different technologies to operate. It uses Mozilla DeepSpeech speech recognition to convert what you say into text. Then it uses intent parsing to recognise whether you asked for the news, weather, music or one of the other options. When Macsen needs to reply aloud, it uses text-to-speech technology to speak the appropriate response.
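The intent parsing step can be illustrated with a minimal sketch. This is not Macsen's actual parser: it assumes simple keyword matching on the recognised text, and the `INTENT_KEYWORDS` table, the intent names and the `parse_intent` function are all hypothetical names chosen for illustration. It also shows one detail that matters for Welsh, namely that mutated word forms (for example "gerddoriaeth" for "cerddoriaeth") have to be matched too.

```python
import re
from typing import Optional

# Hypothetical keyword table (an assumption for illustration only):
# intent name -> Welsh trigger words, including common mutated forms.
INTENT_KEYWORDS = {
    "news": {"newyddion", "penawdau"},
    "weather": {"tywydd", "dywydd"},
    "music": {"cerddoriaeth", "gerddoriaeth"},
    "lights": {"golau", "olau"},
}


def parse_intent(transcript: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the transcript.

    The transcript is the text produced by speech recognition (or typed
    by the user). Returns None when no keyword matches.
    """
    # Lowercase and split into word tokens, dropping punctuation.
    words = set(re.findall(r"\w+", transcript.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


if __name__ == "__main__":
    for sentence in ["Beth yw'r tywydd heddiw?", "Beth yw'r newyddion?", "Helo"]:
        print(sentence, "->", parse_intent(sentence))
```

In a full assistant, the returned intent would select which service to call (news, weather, music and so on), and the reply text would then be passed to the text-to-speech component. A production parser would likely need to handle parameters such as a place name or an artist, which plain keyword matching does not.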

We are still improving the speech features and, if you would like, you can help us improve them by contributing recordings of your voice. You can do this by clicking on Training within the app, which will guide you to read specific sentences aloud. We will use these recordings to create development sets and test sets for training the speech recognition. If you want to contribute more than this, visit Mozilla’s CommonVoice website to record sentences for the large collection of recordings. More information about these technologies and the Welsh language is available in the Language Technologies Handbook published by the Coleg Cymraeg Cenedlaethol.

Macsen is funded by the Welsh Government, and we thank them and the volunteers who have been contributing their voices to improve speech technology.

Thanks also to Golwg360 and OpenWeatherMap for permission to use their online services.

Open source resources for developers

We are using this project to show what we can create when developing Welsh language speech technology and artificial intelligence. All relevant components and resources are shared below under an open source license:

techiaith/macsen-flutter

techiaith/macsen-sgwrsfot

Documentation showing how Macsen can be used to expand a digital service is available within the app code: https://github.com/techiaith/macsen-flutter/blob/master/docs/README.md

Macsen’s research publications

Macsen: A Voice Assistant for Speakers of a Lesser Resourced Language, Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020), pages 194-201, Language Resources and Evaluation Conference (LREC 2020), Marseille, France. Paper

Building Intelligent Assistants for Speakers of a Lesser-Resourced Language, CCURL 2016 2nd Workshop on Collaboration and Computing for Under-Resourced Languages ‘Towards an Alliance for Digital Language Diversity’ (LREC 2016), Portoroz, Slovenia. Paper

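Tuag at Gynorthwyydd Personol Deallus Cymraeg, Astudiaeth Fer o APIs ar gyfer Gorchmynion Llafar, Systemau Cwestiwn ac Ateb a Thestun a Lleferydd ar gyfer Llywodraeth Cymru (Towards a Welsh Language Intelligent Personal Assistant, A Short Study of APIs for Spoken Commands, Question and Answer Systems and Text to Speech for the Welsh Government). Report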
