Google, the United States-based technology company, and a group of African research and community organisations have launched a new project aimed at helping more Africans benefit from voice-driven technology in their own languages.
The project, announced in Accra, Ghana’s capital, on Monday, 2 February 2026, is called WAXAL. It is described as a large, openly accessible speech dataset designed to support research and help developers build more inclusive artificial intelligence (AI) tools.
Okay News reports that the dataset is meant to reduce a major gap in speech data for African languages, a shortage that has made it difficult to build voice assistants, speech-to-text tools, and other voice-enabled services for many communities across the continent.
The organisers said WAXAL is designed to support more than 100 million speakers by providing foundational speech data for 21 languages spoken across Sub-Saharan Africa, the region south of the Sahara Desert. They explained that while voice technologies have become common in many parts of the world, a shortage of high-quality speech recordings and transcripts has slowed similar progress for most of Africa’s more than 2,000 languages.
According to the announcement, the dataset was developed over three years with funding from Google. It includes about 1,250 hours of transcribed natural speech, which is important for training and testing speech recognition systems. It also contains more than 20 hours of high-quality studio recordings intended to support the creation of clearer, more natural synthetic voices.
Aisha Walcott-Bryant, the Head of Google Research Africa, said the long-term goal is to help Africans build technology that works for their communities and economies.
“The ultimate impact of WAXAL is the empowerment of people in Africa,” she said. She added that the dataset is meant to provide a foundation for students, researchers, and entrepreneurs to develop tools in their own languages, and that Google expects innovators to use it for education-focused products and voice-enabled services that can create economic opportunities.
The organisations behind the project also said the work was designed to be built by and for African communities. They explained that African academic and community partners led the data collection process, with technical guidance from Google specialists, and that the partner institutions will retain full ownership of the data as part of an approach meant to support more equitable partnerships in AI development.
Among the partners named were Makerere University, a leading public university in Kampala, Uganda’s capital; the University of Ghana, a major public university in Accra; and Digital Umuganda, a Rwanda-based community organisation involved in technology and civic initiatives.
The dataset covers the following languages: Acholi, Akan, Dagaare, Dagbani, Dholuo, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Kikuyu, Lingala, Luganda, Malagasy, Masaaba, Nyankole, Rukiga, Shona, Soga (Lusoga), Swahili, and Yoruba.