Did DeepSeek Use Google’s Gemini to Deep-Train Itself?

DeepSeek has released an updated R1 model with improved math and coding capabilities, even though the company has yet to disclose its training data sources.

On May 28, DeepSeek released an updated R1 model with improved math and coding capabilities. The company has yet to disclose its training data sources, and some researchers speculate that much of the data may have come from Google’s Gemini AI models, according to Reuters.

In December, developers noticed that DeepSeek’s V3 model repeatedly identified itself as ChatGPT, OpenAI’s AI chatbot, suggesting it might have been trained on ChatGPT chat logs.

In the latest incident, Sam Paech, a Melbourne-based developer who creates “emotional intelligence” evaluations for AI, posted on X what he says is evidence that China’s DeepSeek model was in fact trained on outputs from Gemini.

According to Paech, the new DeepSeek model, called R1-0528, favors words and expressions similar to those preferred by Google’s Gemini 2.5 Pro.

Earlier this year, OpenAI told the Financial Times it had found evidence linking DeepSeek to distillation, a technique for training AI models by extracting data from larger, more capable ones.
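To illustrate what distillation means in practice, here is a minimal, hypothetical sketch in PyTorch: a small “student” model is trained to imitate the output distribution of a larger, frozen “teacher.” The toy models, vocabulary size, and temperature below are assumptions for illustration only and do not reflect DeepSeek’s, OpenAI’s, or Google’s actual code or systems.

```python
# Minimal knowledge-distillation sketch (illustrative only).
# A small "student" learns to match the softened output distribution
# of a larger, frozen "teacher" model.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000  # toy vocabulary size (assumption for the example)
SEQ = 8       # toy sequence length

# Toy stand-ins for a large teacher and a smaller student
teacher = nn.Sequential(nn.Embedding(VOCAB, 256), nn.Flatten(), nn.Linear(256 * SEQ, VOCAB))
student = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Flatten(), nn.Linear(64 * SEQ, VOCAB))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # softmax temperature: softens the teacher's distribution

def distill_step(tokens: torch.Tensor) -> float:
    """One training step: pull the student's distribution toward the teacher's."""
    with torch.no_grad():                 # the teacher is frozen
        teacher_logits = teacher(tokens)
    student_logits = student(tokens)
    # KL divergence between temperature-softened distributions (standard distillation loss)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: random batches of token IDs
for _ in range(3):
    batch = torch.randint(0, VOCAB, (4, SEQ))
    print(distill_step(batch))
```

Note that in the API-scraping scenario the reporting describes, a rival would only see the teacher’s text outputs, not its internal probabilities, so in practice the student would more likely be fine-tuned on large volumes of generated prompt-response pairs rather than on full logit distributions as in this sketch.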

According to Bloomberg, Microsoft, a key OpenAI partner and investor, detected large-scale data exfiltration from OpenAI developer accounts in late 2024, accounts that OpenAI suspects are linked to DeepSeek.

How does DeepSeek’s efficiency compare to other AI models?

AI experts are not ruling out the possibility that DeepSeek trained on data from Google’s Gemini, given the capabilities the model has developed. One of them, Nathan Lambert, a researcher at the nonprofit AI research institute AI2, supported the idea that DeepSeek may have trained on data generated by Gemini.

In April, OpenAI introduced an ID verification process for organizations seeking access to certain advanced models. Organizations must present a government-issued ID from one of the countries supported by OpenAI’s API, a list that does not include China.

Meanwhile, Google recently began “summarizing” the traces generated by models on its AI Studio developer platform, a change that makes it harder to train rival models on Gemini data. In May, Anthropic likewise said it would start summarizing traces from its own models, citing the need to protect its “competitive advantages.”

China’s Originality at Being Unoriginal

DeepSeek’s models have been accused more than once of training on other chatbots; the first allegations surfaced in December, when DeepSeek’s V3 identified itself as ChatGPT.

Chinese companies are known for taking existing technologies and making them their own. They have done this across industries, from telecom to AI, as the comparisons between DeepSeek and GPT illustrate, rapidly iterating on and improving advances made by global leaders.
