RL Environments Fuel the Rise of Agentic AI Companies in the Race to Smarter Agents

Reinforcement learning environments are the focus of AI research labs and a new generation of agentic AI companies.

Reinforcement learning (RL) environments have become the new battleground for AI research labs and venture capital firms, with startups like Mechanize and established agentic AI companies competing to supply complex virtual training grounds for multi-step AI tasks.

From San Francisco to Silicon Valley, the competition is reshaping the future of AI training.

Just as labeled datasets powered the last wave of machine learning, RL environments are now seen as the next phase of advancement. They provide virtual workspaces in which AI agents can rehearse multi-step activities, ranging from online shopping to enterprise workflows.
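To make the idea concrete, here is a minimal sketch of what such an environment could look like in Python. It is a toy illustration only, not any vendor's or lab's actual design: the class `ToyShoppingEnv`, its action names, and the reward rule are all hypothetical. The agent must search for an item, add it to the cart, and check out, and it earns a reward only if the finished cart matches the goal.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShoppingEnv:
    """Hypothetical multi-step task: search, add to cart, check out."""
    target_item: str = "socks"
    cart: list = field(default_factory=list)
    searched: bool = False
    done: bool = False

    def reset(self):
        """Start a fresh episode and return the first observation."""
        self.cart, self.searched, self.done = [], False, False
        return self._observe()

    def _observe(self):
        # What the agent is allowed to see about the current state.
        return {"searched": self.searched, "cart": list(self.cart)}

    def step(self, action, arg=None):
        """Apply one action and return (observation, reward, done)."""
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        reward = 0.0
        if action == "search" and arg == self.target_item:
            self.searched = True
        elif action == "add_to_cart" and self.searched:
            self.cart.append(arg)
        elif action == "checkout":
            self.done = True
            # Reward is granted only at the end, for the correct cart.
            reward = 1.0 if self.cart == [self.target_item] else 0.0
        return self._observe(), reward, self.done


def run_episode(env, actions):
    """Play a fixed list of (action, arg) pairs; return the total reward."""
    env.reset()
    total = 0.0
    for action, arg in actions:
        _, reward, done = env.step(action, arg)
        total += reward
        if done:
            break
    return total


good_plan = [("search", "socks"), ("add_to_cart", "socks"), ("checkout", None)]
lazy_plan = [("checkout", None)]
print(run_episode(ToyShoppingEnv(), good_plan))
print(run_episode(ToyShoppingEnv(), lazy_plan))
```

In this sketch, the plan that completes every step earns the reward, while the plan that skips straight to checkout earns nothing. Real environments are far richer, but the reset, act, and score loop follows the same basic shape.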

The shift is transforming the AI agent landscape, creating new opportunities for startups and new risks for investors.

Why RL Environments Matter 

At their core, these environments serve as practice grounds for AI agents.

“All the big AI labs are building RL environments in-house,” said Jennifer Li, a general partner at Andreessen Horowitz. “But as you can imagine, creating these datasets is very complex, so AI labs are also looking at third party vendors that can create high quality environments and evaluations. Everyone is looking at this space.” 

That demand has fueled agentic AI startups such as Mechanize and Prime Intellect, which are competing alongside established firms like Mercor and Surge. According to The Information, Anthropic has even discussed spending more than $1 billion on RL environments over the next year.

The excitement also comes with challenges, including how to govern agentic AI systems, as researchers seek to ensure these simulations remain safe and reliable.

Competition and Doubts 

Competition is fierce. Surge CEO Edwin Chen has seen “substantial growth” in demand, and Mercor CEO Brendan Foody said, “few realize how huge the opportunity for RL environments really is.”

Even Scale AI, the former leader in data labeling, is moving to stay ahead of the trend.

Scaling the technology, however, won’t be easy.

“I think people are underestimating how difficult it is to scale environments,” said Taylor, a former Meta AI lead.

One such difficulty is “reward hacking,” in which AI agents find workarounds that collect the reward without solving the actual task. Researchers are also revisiting earlier approaches, such as rational agent models, in search of a new balance.
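A toy Python sketch shows how reward hacking can slip through. Everything here is hypothetical and chosen only for illustration: the task (sorting a list), the two scoring functions, and the two "agents" are assumptions, not a description of any lab's setup. The flawed scorer checks only that the output is in order, so an agent that returns an empty list still collects full reward.

```python
from collections import Counter


def naive_reward(task_input, output):
    """Flawed check: only verifies the output is in nondecreasing order."""
    in_order = all(a <= b for a, b in zip(output, output[1:]))
    return 1.0 if in_order else 0.0


def stricter_reward(task_input, output):
    """Also requires the output to contain exactly the input's elements."""
    same_items = Counter(output) == Counter(task_input)
    return naive_reward(task_input, output) if same_items else 0.0


def honest_agent(xs):
    # Actually solves the task.
    return sorted(xs)


def hacking_agent(xs):
    # Exploits the loophole: an empty list is trivially "in order".
    return []


data = [3, 1, 2]
for agent in (honest_agent, hacking_agent):
    out = agent(data)
    print(agent.__name__, naive_reward(data, out), stricter_reward(data, out))
```

Under the naive scorer, both agents look equally good. Only the stricter check, which compares the output against the input, separates the agent that did the work from the one that gamed the test. Closing loopholes like this across thousands of environments is part of what makes scaling them hard.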

Still, optimism persists despite the skepticism.

Meanwhile, a growing number of AI agent development services and agent-building platforms are cropping up, all competing for a share of the AI agent market.

For the moment, at least, the promise of RL environments continues to fuel innovation and keep agentic AI companies in the spotlight.

