Microsoft’s next big thing for the cloud: an agent that keeps its cool when everything else falls apart

Microsoft is promising relief to developers who get woken up at 3 a.m. by downtime and other cloud problems: an agent informed by the company’s years of experience running Azure, designed to diagnose whatever goes wrong and recommend possible fixes.
One big advantage over people: the agent can work without the stress, fatigue, or tunnel vision that often trips people up when they are half asleep.
“Agents are a little more emotionally detached,” said Brendan Burns, a Microsoft technical fellow and corporate vice president who was one of the creators of Kubernetes. He pointed out that agents do not feel the pressure when a manager asks for a quick investigation of the root cause.
The Azure Copilot Observability Agent, in preview since late last year, became generally available on Tuesday. It investigates incidents by connecting logs, metrics, traces and other signals spread across a company’s systems, then points engineers to a likely cause.
At this point, the agent does not fix problems itself. Microsoft is also introducing what it calls autonomous operations, in preview, which allows an agent to triage and investigate alerts without a human prompting it. But it still stops short of acting. It won’t restart a service or change a configuration, for example, instead leaving it up to humans to decide and carry out the fix.
Microsoft joins a crowded field. Datadog made its Bits AI SRE agent generally available in December, and Amazon’s AWS followed with a similar DevOps Agent this spring. Microsoft said the agent is priced based on usage rather than a flat seat license, which is the same model AWS uses for DevOps Agent.
Established players including Dynatrace, Splunk, New Relic and Grafana are moving quickly in the same direction, alongside a wave of AI-first companies.
In an interview with GeekWire this week, Burns said he believes one of Microsoft’s advantages over competitors is its breadth: it sees more of a customer’s software, from code in GitHub to deployments on Azure to the signals those deployments generate. Knowing how those connect, he said, helps the agent trace a problem back to the line of code behind it.
More than a decade ago, Burns and his then-Google colleagues Joe Beda and Craig McLuckie created Kubernetes, open-source software that allows companies to run applications on large, ever-changing infrastructure. It has become a foundation of cloud computing, and it has added to the complexity teams now have to manage.
Kubernetes brought a kind of self-healing to that world: when something breaks, it automatically works to restore the system to a healthy state. But it follows fixed rules, Burns said. It is too deterministic, he said: it “can’t form ideas, can’t investigate solutions.”
AI tools like Azure’s observability agent are designed to add that missing layer: building a theory about what went wrong, testing it against the data, and continuing to work toward a solution.
Complete autonomy, allowing the agent to act and not just investigate, is still down the road. In a blog post on Tuesday, Burns framed the launch as part of a broader shift toward “agent actions,” in which agents reason across all of the signals and will one day be able to act on them.
In the meantime, the agent can do more of the digging, even if a person is still on call.
Burns, who recalled once pulling a 36-hour shift, said he can think of “a lot of nights that would have been more fun if I had this 10 years ago.”



