When comparing English ChatGPT and Arabic ChatGPT, it is important to consider the complexities of the Arabic language and the challenges it presents for AI.
- ChatGPT may not be suitable for certain texts such as legal documents, medical reports, scientific studies, and literary works.
- Arabic poses specific challenges for AI translation, including the difficulty of tokenization due to diacritics and inflections.
I’m not going to sit here and pretend that Arabic, specifically Classical Arabic, isn’t complicated. It is beautiful, especially the poetry, but difficult to learn and navigate. I mean, despite it being my mother tongue, I struggle with it, A LOT. But theoretically, an AI should not have such struggles. It can recommend a thorough investment plan, yet it isn’t adequately trained in the fifth most-spoken language globally, behind Mandarin, Spanish, English, and Hindi? Arabic ChatGPT is not on the same level as the English one.
Lost in Translation?
Translation apps are not the greatest; we can agree on that much. They tend to translate word for word rather than by meaning. ChatGPT is an elite AI tool, so you would expect it to have prowess in that area. In a recent article, ChatGPT for Arabic-English Translation: Evaluating, the author compared ChatGPT’s outputs to professional translations of various text genres and pointed out that the bot lacked training in domain-specific terminology and cultural context. They acknowledge that OpenAI’s ChatGPT has merit as a translator, largely due to its proficiency in managing complex and uncommon language combinations, its ability to perform simultaneous translation for time-critical tasks, and its capacity to learn from user feedback and enhance translation quality. They concluded, however, that although ChatGPT generally provides accurate translations, its limitations make it unsuitable for some texts:
- Legal documents
- Medical reports
- Scientific studies
- Literary works
Arabic and Its Challenges
When I asked Arabic ChatGPT “Who is Crown Prince Mohammad bin Salman?” in Arabic, it took 1 minute and 41 seconds (15 seconds of which were spent “thinking”) to generate a 63-word paragraph. But when I asked that same question in English, it took less than 10 seconds to get a 76-word response. Looks like I’m not the only one who struggles with the language. The paper found that the AI struggled on several fronts.
AI relies on something called tokenization to break down a string of text or speech into identifiable units. Think of your child dividing words into syllables to learn how to read them. Same concept, different “species.”
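To make the idea concrete, here is a toy sketch of three ways to cut text into units: by word, by character, and by UTF-8 byte. It is only an illustration; production models like ChatGPT use learned subword vocabularies rather than any of these, and the function names are made up for this example.

```python
def word_tokens(text):
    """Split on whitespace: one token per word."""
    return text.split()


def char_tokens(text):
    """One token per Unicode code point."""
    return list(text)


def byte_tokens(text):
    """One token per UTF-8 byte."""
    return list(text.encode("utf-8"))


english = "Adam wrote"
arabic = "\u0643\u062a\u0628"  # كتب, the three consonants k-t-b

for label, text in [("english", english), ("arabic", arabic)]:
    print(
        label,
        len(word_tokens(text)),
        len(char_tokens(text)),
        len(byte_tokens(text)),
    )
```

Note that each of the three Arabic letters here takes two bytes in UTF-8, while each English letter takes one, so a byte-based view of the text is already longer for Arabic before any diacritics are added.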
Turns out the small marks above or below Arabic letters (e.g., fatḥah, ḍammah, and kasrah) are called diacritics, and they make tokenizing written text more difficult. You might think they are insignificant, but they signify vowel sounds. They make a world of difference. It’s the difference between Adam having written (كَتَبَ /kataba/) and Adam having been written (كُتِبَ /kutiba/).
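Here is a small sketch of why those marks matter to a machine. In Unicode, each diacritic is its own combining character, so the two verbs are different strings, yet they collapse into the same three consonants once the marks are removed. The helper name `strip_diacritics` is made up for this example, and filtering on the Unicode category `Mn` is one simple way to do it, not necessarily what any real system does.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks (Unicode category Mn), keeping base letters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # كَتَبَ "he wrote"
kutiba = "\u0643\u064f\u062a\u0650\u0628\u064e"  # كُتِبَ "it was written"

# With diacritics the two words are distinct strings.
print(kataba == kutiba)  # False

# Without them, both reduce to the same consonant skeleton, كتب.
print(strip_diacritics(kataba) == strip_diacritics(kutiba))  # True

# The marks double the number of characters to process.
print(len(kataba), len(strip_diacritics(kataba)))  # 6 3
```

Since most everyday Arabic is written without these marks, a model often sees only the skeleton and has to work out from context which word was meant.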
The Arabic language is highly inflected. An inflection is a modification in the form of a word that expresses a grammatical function or attribute such as tense or mood. Think of how the plural of “chicken” in English is “chickens.” In Arabic, however, it gets complicated, very complicated. And it, again, affects tokenization. A simple example is saying that you bought 2 chickens. In Arabic, “2 chickens” is a single word: the base word for “chicken” plus a suffix for “2,” and that suffix changes depending on where the word falls grammatically.
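The chicken example can be sketched in a few lines. This is a deliberately simplified illustration of the regular dual ending, assuming a noun that ends in tāʼ marbūṭah (ة); it ignores irregular nouns and other cases where the ending changes further. The names `dual` and `DUAL_SUFFIX` are made up for this example.

```python
TA_MARBUTA = "\u0629"  # ة, the feminine ending on "chicken"
TA = "\u062a"          # ت, what that ending turns into before a suffix

# The "2" suffix depends on the word's grammatical role in the sentence.
DUAL_SUFFIX = {
    "nominative": "\u0627\u0646",  # ان, when the chickens are the subject
    "oblique": "\u064a\u0646",     # ين, when they are an object (e.g., "I bought ...")
}


def dual(noun, case):
    """Build the 'two of them' form of a regular noun (simplified)."""
    stem = noun[:-1] + TA if noun.endswith(TA_MARBUTA) else noun
    return stem + DUAL_SUFFIX[case]


chicken = "\u062f\u062c\u0627\u062c\u0629"  # دجاجة, one chicken

for case in DUAL_SUFFIX:
    print(case, dual(chicken, case))
```

So “2 chickens” comes out as دجاجتان or دجاجتين depending on the sentence, and a tokenizer has to recognize both as the same chicken.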
I get it. I do. The language is difficult. But is that reason enough to take about 313 million Arabic speakers out of the discourse? Leave them behind? Arabic ChatGPT needs to be on par with the English one.