An artificial intelligence system has outperformed doctors in diagnosing complex health conditions, achieving over 80 per cent accuracy compared to 20 per cent for clinicians working alone.
The system uses a so-called “diagnostic orchestrator” – a bespoke agent-like tool designed to replicate how a panel of expert physicians might collaborate to solve difficult cases.
In tests using specially selected case studies, the AI “solved” more than eight out of 10 cases. Practising doctors, working without access to colleagues, textbooks or digital tools, managed two out of 10.
Developed by Microsoft’s AI unit, led by British tech pioneer Mustafa Suleyman, the system is described by the company as a step towards “medical superintelligence”.
The company stated: “Scaling this level of reasoning – and beyond – has the potential to reshape healthcare.
“AI could empower patients to self-manage routine aspects of care and equip clinicians with advanced decision support for complex cases.”
The technology is designed to function like a real-world clinician, working through each case step by step – for example, asking questions or ordering tests such as blood work or imaging – before reaching a diagnosis.
A patient with cough and fever symptoms, for instance, might need blood tests and a chest X-ray before pneumonia can be confirmed.
Microsoft converted more than 300 complex case studies from the New England Journal of Medicine into interactive challenges to evaluate the AI system.
The testing used existing AI models from OpenAI, Meta and Anthropic, as well as Elon Musk’s Grok and Google’s Gemini.
The diagnostic orchestrator acts as an intermediary, working alongside the AI models to decide which tests to request and which diagnoses to consider.
Microsoft says this gives the system a “breadth and depth of expertise” that spans multiple medical disciplines and goes beyond individual physicians.
When paired with OpenAI’s advanced o3 model, the system achieved a success rate of over 80 per cent in solving the case studies.
Microsoft also says the approach costs less than relying on human doctors, because the system is more efficient in the tests it orders.
However, Microsoft acknowledges the technology is not yet ready for clinical use. Further testing is needed to evaluate its performance on more common symptoms before any real-world deployment.
The research focused on “diagnostically complex and intellectually demanding” cases, limiting its immediate application to everyday medical practice.