Technology & AI

Patronus AI gets $50M to build ‘digital worlds’ for stress-testing AI agents

AI agents are becoming more sophisticated. They range from answering questions to automating complex multi-step tasks.

But before these agents can be trusted to book trips or perform financial analysis on behalf of users, model providers and the infrastructure for implementing such agents want to ensure that they operate reliably across a range of conditions.

AI labs often use benchmarks to demonstrate the power of their models, but high scores, even for marketing agents, do not prove that AI can perform a variety of complex, real-world tasks correctly.

Patronus AI, a startup founded in 2023 by former Meta AI researchers Anand Kannappan and Rebecca Qian, helps modelers and companies fine-tune models to do just that by creating digital simulation environments in which to test the performance of agents.

The San Francisco-based startup must be solving an important problem. Almost every frontier AI lab and many startups are now customers, according to Glenn Solomon, managing director at Notable Capital, who describes the need for the company’s simulation facilities as nearly insatiable.

Patronus’ revenue grew 15-fold last year, fueling strong investor interest. On Thursday, the company announced a $50 million Series B round led by Greenfield Partners, with participation from Notable Capital, Lightspeed, Datadog, and Samsung. This round brings the company’s funding to $70 million.

Patronus uses what it calls “digital world models” to build websites and internal systems. In these environments, agents are stress tested after training using reinforcement learning, which repeatedly rewards successful task completion and punishes errors.

AI labs see great value in these digital simulations because they give agents the opportunity to try out different, sometimes unexpected, scenarios. The company compares its approach to how Waymo trained autonomous vehicles by first building an artificial world to test cars against unusual hazards, such as bad weather or a child running after a soccer ball.

The difference with AI agents is that they tend to take shortcuts, which means they fail to complete the task correctly. “Patronus is very good at spotting hacks and making sure they respond to models,” Solomon said.

Patronus currently offers its own digital simulation world for software engineering and finance, but this is just the beginning, according to Kannappan.

“Today we are focusing more on verifiable problems, so problems that you can quickly check and verify, but there are many areas that are not verified or are very difficult to verify,” he said.

Just because these procedures are proven doesn’t mean they’re easy. “We want to be able to create an environment where you can use an agent who can work for 10 hours or 10 days or 10 weeks,” Kannappan said.

As for competitors, Patronus believes that it primarily competes with internal teams of AI labs that have already been built to test agent behavior. While human data companies like Mercor and Surge help modelers with reinforcement learning, Patronus works differently by testing how agents behave without human involvement.

If you shop through links in our articles, we may earn a small commission. This does not affect our editorial independence.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button