In Karnataka, a state in southwest India, people spent a few weeks this year helping to create the country’s first AI-based chatbot for tuberculosis.
They did this by reading sentences in Kannada, their native language, into an app.
Kannada is a major Indian language, with more than 40 million speakers, and one of the country's 22 official languages. And India is remarkably diverse: more than 121 languages are each spoken by at least 10,000 people, making the country a linguistic goldmine.
But here’s the problem.
Most of these languages aren't covered by natural language processing (NLP), the AI technology that helps machines understand what we say and write. That's a significant gap: millions of Indians are effectively cut off from valuable information and economic opportunities.
Kalika Bali, a principal researcher at Microsoft Research India, put it this way: "for AI tools to really work for everyone, they need to include languages beyond the usual English, French, or Spanish." But collecting as much data in Indian languages as went into something like GPT (Generative Pre-trained Transformer) could take a decade or more. The workaround is to build layers on top of existing AI models such as ChatGPT or Llama.
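The "layers on top" idea can be sketched in a few lines: translate a query from a low-resource language into English, run it through an English-centric model, then translate the answer back. The function names below are hypothetical stand-ins, not real Karya, Microsoft, or OpenAI APIs.

```python
def translate(text: str, source: str, target: str) -> str:
    """Stand-in for a machine-translation step (e.g. Kannada <-> English).

    A real system would call a translation model; here we just tag the
    text so the data flow through the pipeline stays visible.
    """
    return f"[{source}->{target}] {text}"

def english_llm(prompt: str) -> str:
    """Stand-in for an English-centric model such as ChatGPT or Llama."""
    return f"Answer to: {prompt}"

def answer_in_language(query: str, lang: str = "kn") -> str:
    """Wrap the English-only model with translation layers on both sides."""
    english_query = translate(query, lang, "en")
    english_answer = english_llm(english_query)
    return translate(english_answer, "en", lang)
```

This is only a sketch of the architecture, not an implementation: the value of projects like Karya's is precisely in building the translation and speech layers that the stubs above gloss over.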
The contributors in Karnataka are part of a much larger effort. Thousands of Indians are sharing their speech data with Karya, a tech company that turns it into datasets used by big names like Microsoft and Google for AI in healthcare, education, and more.
The Indian government is also on board with Bhashini, its AI-driven language translation system, which builds open-source datasets in regional languages for developing AI tools. Contributions are crowdsourced: volunteers validate audio, translate texts, label images, and more.
Bhashini has tens of thousands of Indian contributors. Pushpak Bhattacharyya, who heads the Computation for Indian Language Technology Lab in Mumbai, says the government is pushing hard to create datasets for large language models in Indian languages. These models are already helping with translations in education, tourism, and even the courts.
English remains the best-supported language in NLP. ChatGPT, which caused the loudest noise in generative AI, is trained mainly on English. Amazon's Alexa speaks nine languages, but only three are non-European. Google's Bard? English only.
But there’s a global effort to close this language gap. In the UAE, there’s Jais for Arabic generative AI applications, and in Africa, Masakhane is advancing NLP research in African languages.
Bali, named to Time magazine's list of influential people in AI, says crowdsourcing is especially useful in a country like India because it captures linguistic and cultural nuances.
Here's something to ponder: of India's 1.4 billion people, fewer than 11% speak English. That's why AI in India focuses heavily on speech and speech recognition, especially since many people struggle with reading and writing.
Google's Project Vaani is collecting speech data from about a million Indians, which will be made available for speech-to-speech and automatic speech recognition systems.
Even the Supreme Courts of Bangladesh and India are using AI-based translation tools. And there’s Jugalbandi, an AI chatbot by AI4Bharat and Microsoft, helping with queries about welfare programs in multiple Indian languages. You can even access it on WhatsApp, which is huge in India.
Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Intelligent Tech sections to stay informed and up-to-date with our daily articles.