
Taught Alexa to speak हिंदी like a local, and now 18% more Indian homes chat with her 🗣️

Alexa, kya bolti tu? - Enhancing Alexa's Hindi Core Language Model

During my short stint as an NLP Engineer on Amazon's Alexa NLU India team in Bengaluru (I left to pursue higher education in the U.S.), I worked on significantly improving the core Hindi language model that powers Alexa's voice interactions for the Indian market.

As part of the NLP modeling team, I was responsible for researching the linguistic nuances of major Hindi dialects, curating diverse training data that reflects those dialects, and facilitating user testing and feedback collection.

Problem

Although Alexa has supported Hindi since 2020, the initial language model struggled with the diversity of Hindi dialects and accents, and with Hinglish, across different regions of India. This resulted in suboptimal performance, especially for users speaking their local variety of Hindi.

The primary objective was to enhance Alexa's Hindi core model to better handle language variations including Hinglish, leading to increased usability and adoption among Indian users.

What's in a name, they say. But in Alexa's case, it's everything! Customers ask Alexa "Tumhara naam kya hai?" ("What is your name?") at least once a minute.

Approach

Linguistic Research & Data Collection

I began by researching the evolution, variations and nuances of Hindi across regions such as Uttar Pradesh, Bihar, Delhi and Rajasthan. This linguistic analysis helped identify the key areas on which to focus model improvements.

Hindi changes every 100 kilometers or so!

Alexa in Hindi

A/B Testing of updated core model

Rigorous A/B testing was conducted to compare the performance of the enhanced Hindi model against the existing version. Key metrics tracked included:

A/B Testing of Alexa Bilingual Intent Test in Console for select users
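To illustrate how a comparison between an existing and an enhanced model might be checked for statistical significance, here is a minimal sketch of a two-proportion z-test. This is not the team's actual evaluation pipeline: the metric (share of requests handled correctly), the function name `two_proportion_z_test`, and the counts are all assumptions made up for the example.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) under H0: both arms share one success rate."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only (not real Alexa data):
# requests handled correctly out of requests sampled, per arm.
control = (820, 1000)    # existing Hindi model
treatment = (870, 1000)  # enhanced Hindi model

z, p = two_proportion_z_test(*control, *treatment)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two arms is unlikely to be noise. In practice one would also fix the sample size in advance and segment the results by region or dialect.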

User Feedback and Dogfooding

A critical aspect of the project was gathering comprehensive feedback from real users. We facilitated:

Challenges

“दस सेकंड आगे जाओ.” (“Go forward ten seconds.”)

“Alexa, kitne aadmi the?” (“Alexa, how many men were there?”)

“Bangalore का मौसम दिखाओ” (“Show the weather in Bangalore”)

Alexa was trained to distinguish its wake word from the oft-used Hindi word “achcha” (“okay”), which can sound close to it. In many households, a single conversation has Hindi and English words liberally interspersed.
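To make the code-mixing point concrete, here is a minimal sketch that tags each token of an utterance by script, using the Devanagari Unicode block (U+0900 to U+097F). It is only an illustration, not how Alexa's model works: the helper names `script_of`, `tag_utterance` and `is_code_mixed` are hypothetical, and the approach cannot detect romanized Hindi (such as “kitne aadmi the”), which is written entirely in Latin script.

```python
def script_of(token):
    """Classify a token as 'devanagari', 'latin', 'mixed' or 'other'."""
    has_devanagari = any("\u0900" <= ch <= "\u097F" for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_devanagari and has_latin:
        return "mixed"
    if has_devanagari:
        return "devanagari"
    if has_latin:
        return "latin"
    return "other"  # digits, punctuation, other scripts


def tag_utterance(text):
    """Split on whitespace and pair each token with its script label."""
    return [(token, script_of(token)) for token in text.split()]


def is_code_mixed(text):
    """True if the utterance contains both Devanagari and Latin script."""
    scripts = {label for _, label in tag_utterance(text)}
    return "mixed" in scripts or {"devanagari", "latin"} <= scripts


for token, label in tag_utterance("Bangalore का मौसम दिखाओ"):
    print(f"{token}\t{label}")
```

Script tagging only covers the case where the two languages are also written in two scripts. Telling romanized Hindi from English needs a language-identification model working at the word level.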

As noted, Hindi changes every 100 kilometers or so, and understanding both Hindi and Hinglish is critical. The problem is that there is no way to know in advance which interactions with Alexa will require follow-up questions and responses. Making the model aware of what is likely to come next in a Hindi conversation was a key challenge for us.

Improvements & Impacts

The updated model responds much better to requests such as “Alexa, dhai baje ka alarm set karo” (“set an alarm for 2:30”) or “Alexa, tham jaa” (“stop”).

After rigorous testing cycles, the improved Hindi language model was rolled out across India in early 2023.

Key results in the first 3 months:

Feedback from early beta-stage users



Takeaways

It’s natural to wait for others. But it’s not okay to just wait and do nothing. You can always take other initiatives, such as getting started on another project, following up with the stakeholder, or documenting your designs for future use.


Just say “Alexa, Hindi mei baat karo” to get started ✨



Want a deeper dive?
Get in touch to schedule a presentation.