The New Gemini with a Multi-Splash of Creativity


Gemini, a large language model (LLM) developed by Google, is built around multimodal interaction. It can understand and process not only text but also audio, images, and other formats, which allows richer interactions than text-only language models offer.

What Does It Do?

Because it is multimodal, Gemini opens the door to a range of applications, such as producing creative text based on images or music. It can answer complex questions that require understanding visual or audio information, and it can deliver more nuanced search results that combine text, images, and videos. Plans are underway to integrate it into services such as Gmail, Maps, and YouTube, which points to much broader use.

Gemini can also personalize the user experience, recommending and displaying results based on each individual’s preferences and context. This could boost productivity and change the way people work, and it could support personalized learning by breaking down language barriers.

Gemini also has a creative and artistic side: it can generate a variety of text formats, including poems, code, and scripts.

Google’s Claims

Google claims that Gemini outperforms human experts, not only in voice commands but also on the Massive Multitask Language Understanding (MMLU) benchmark, and presents this as evidence of superior problem-solving and reasoning abilities.

Gemini is still under development, and for good reason: it faces several challenges, including data privacy regulations and the risk of spreading misinformation. These risks raise broader questions about transparency, ethics, and responsible AI principles. There are also open questions about its capabilities, such as whether it can truly surpass human creativity. The human brain is highly complex, and matching how it works requires far more than multimodal interaction, including forms of expression such as gestures and drawings.

Although it is still in the testing phase, Gemini shows considerable promise across many industries, such as healthcare, finance, and manufacturing, where it could reshape how users interact with technology.

Will Gemini’s new multimodal interaction help it break through and win out over its competitors? That depends on whether Google’s claims hold up.

Google has clear ambitions for worldwide reach, but initial availability is restricted. Gemini launched in English only and can be accessed only in the U.S., Korea, and Japan. Notably, it is not available in Europe, the UK, or Latin America.


Inside Telecom provides you with an extensive list of content covering all aspects of the tech industry. Keep an eye on our Intelligent Tech sections to stay informed and up-to-date with our daily articles.