MIT and Microsoft Reveal Murakkab: A Plain-Language System That Writes Its Own Cost-Efficient AI Agents

Researchers from MIT and Microsoft Azure have pulled back the curtain on Murakkab, a new system that lets developers describe agentic AI workflows in plain language while the platform automatically generates implementations optimized for cost, energy, and compute. The work is slated for a formal debut at the USENIX Symposium on Operating Systems Design and Implementation (OSDI) in 2026, and early details suggest a substantial shift in how AI agents are built and deployed on cloud infrastructure.

Murakkab enters a landscape where agentic AI (software that can plan, use tools, and iterate autonomously) has exploded in popularity but remains notoriously expensive and complex to run at scale. Frameworks such as LangChain and Microsoft’s own AutoGen and Semantic Kernel have lowered the barrier to building agents, yet developers still grapple with prompt chaining, token management, and cloud resource provisioning. Murakkab aims to abstract that entire stack behind a natural-language interface.

From Description to Deployment in Plain Language

The core innovation is a compiler-like pipeline that accepts a high-level, human-readable specification of a multi-step agent task and transforms it into an optimized execution graph. A specification might read: “Whenever a customer submits a support ticket, classify its urgency, fetch related documentation, draft a reply, and schedule a follow-up if no response within 24 hours.” Under the hood, Murakkab reasons about the intent, breaks the workflow into discrete steps, selects the most cost-effective large language model (LLM) for each step, decides between API calls and local execution, and determines the optimal degree of parallelism.
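To make the decomposition concrete, here is a minimal Python sketch of how the support-ticket example might look once broken into a dependency graph. The step names and the `parallel_stages` helper are hypothetical, not Murakkab’s actual representation, and the 24-hour conditional on the follow-up is left out for brevity. Grouping steps into stages whose members share no dependencies is one simple way to read off the available degree of parallelism.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical decomposition of the support-ticket example.
# Each key is a step; its value is the set of steps that must finish first.
SUPPORT_TICKET_WORKFLOW: dict[str, set[str]] = {
    "classify_urgency": set(),
    "fetch_docs": set(),  # assumed independent of classification
    "draft_reply": {"classify_urgency", "fetch_docs"},
    "schedule_follow_up": {"draft_reply"},
}


def parallel_stages(workflow: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into stages; steps within one stage can run concurrently."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(parallel_stages(SUPPORT_TICKET_WORKFLOW), start=1):
        print(f"stage {i}: {stage}")
```

With these assumed dependencies, urgency classification and documentation lookup land in the same stage and could run concurrently before the reply is drafted.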

This isn’t just a prompt-engineering trick. The system builds a formal representation of the workflow and then applies combinatorial optimization techniques to minimize a multi-objective cost function that factors in dollar expense, energy consumption (in watt-hours), and compute seconds. A paper preview shared with collaborators notes that Murakkab can reduce cloud costs by up to 40% compared with hand-tuned agent pipelines, while cutting end-to-end latency by 25% on average.
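The article does not give the exact form of the cost function. A common way to fold several objectives into one number is a weighted sum, sketched below in Python under that assumption; the weights, field names, and the `to_watt_hours` helper are illustrative. Energy is power multiplied by time, so a step drawing 300 W for 36 seconds uses 3 watt-hours.

```python
from dataclasses import dataclass


def to_watt_hours(avg_watts: float, seconds: float) -> float:
    """Convert average power draw over a duration into energy in watt-hours."""
    return avg_watts * seconds / 3600.0


@dataclass(frozen=True)
class StepCost:
    dollars: float    # API or VM spend for the step
    energy_wh: float  # energy in watt-hours
    compute_s: float  # compute seconds consumed


@dataclass(frozen=True)
class Weights:
    # Hypothetical exchange rates that put each objective on a common scale.
    per_dollar: float = 1.0
    per_wh: float = 0.01
    per_second: float = 0.001


def plan_cost(steps: list[StepCost], w: Weights = Weights()) -> float:
    """Weighted-sum scalarization of dollars, energy, and compute over all steps."""
    return sum(
        w.per_dollar * s.dollars + w.per_wh * s.energy_wh + w.per_second * s.compute_s
        for s in steps
    )


if __name__ == "__main__":
    steps = [
        StepCost(dollars=0.02, energy_wh=to_watt_hours(300, 36), compute_s=36.0),
        StepCost(dollars=0.01, energy_wh=1.0, compute_s=12.0),
    ]
    print(f"scalarized cost: {plan_cost(steps):.3f}")
```

Changing the weights moves the optimizer along the cost, energy, and time trade-off; a weighted sum is only one option, and a real system might instead treat some objectives as hard constraints.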

How Murakkab Optimizes Behind the Scenes

At its heart, Murakkab treats an agentic workflow as a directed acyclic graph (DAG) of operations (LLM inferences, tool calls, data lookups, conditional branches) and annotates each node with resource requirements and latency profiles drawn from historical Azure telemetry. When a developer writes a plain-language goal, an LLM itself, likely a fine-tuned version of GPT-4 or Phi, translates it into a candidate DAG, which is then passed to the optimizer.
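Once each node carries a latency estimate, the end-to-end latency of a candidate DAG is the length of its critical path, assuming independent branches run in parallel. The Python sketch below computes that; the node names and millisecond figures are placeholders, not Azure telemetry.

```python
from graphlib import TopologicalSorter


def critical_path_ms(deps: dict[str, set[str]], latency_ms: dict[str, float]) -> float:
    """Longest path through the DAG, i.e. end-to-end latency with maximal parallelism."""
    finish: dict[str, float] = {}
    for node in TopologicalSorter(deps).static_order():
        # A node can start only after all of its predecessors have finished.
        start = max((finish[p] for p in deps.get(node, ())), default=0.0)
        finish[node] = start + latency_ms[node]
    return max(finish.values(), default=0.0)


if __name__ == "__main__":
    deps = {
        "classify": set(),
        "fetch_docs": set(),
        "draft_reply": {"classify", "fetch_docs"},
        "follow_up": {"draft_reply"},
    }
    # Placeholder per-node latency profiles (e.g. median latency in milliseconds).
    latency = {"classify": 300.0, "fetch_docs": 450.0, "draft_reply": 1200.0, "follow_up": 50.0}
    print(critical_path_ms(deps, latency))
```

Here the slower of the two parallel first steps (450 ms) gates the draft, so shaving time off classification alone would not reduce end-to-end latency.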

The optimizer uses mixed-integer linear programming to choose which model (e.g., GPT-4o mini vs. GPT-4o) handles each node and whether to batch requests, cache results, or replace an LLM call with a deterministic function. It also factors in the real-time carbon intensity of available Azure regions, automatically shifting workloads to data centers powered by renewable energy when the service-level agreement (SLA) allows. The result is an execution plan that is both cheaper and greener.
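The article says the optimizer uses mixed-integer linear programming; the Python sketch below swaps in exhaustive search, which is only practical for a handful of nodes, to show the shape of the decision. The options, prices, and accuracy estimates are hypothetical, and the sketch assumes end-to-end accuracy is the product of per-step accuracies (that is, independent errors). That coupling is what makes the choice combinatorial: the cheapest feasible plan may upgrade one step so that another can stay on a small model.

```python
from itertools import product
from math import prod

# Hypothetical per-step options: (option name, dollars per call, estimated accuracy).
OPTIONS: dict[str, list[tuple[str, float, float]]] = {
    "classify": [
        ("rules", 0.0, 0.90),  # deterministic function instead of an LLM call
        ("small_model", 0.0002, 0.95),
        ("large_model", 0.0030, 0.97),
    ],
    "draft_reply": [
        ("small_model", 0.0010, 0.88),
        ("large_model", 0.0120, 0.93),
    ],
}


def cheapest_plan(options, min_accuracy):
    """Return (cost, {step: option}) for the cheapest plan meeting the target, or None."""
    steps = list(options)
    best = None
    for combo in product(*(options[s] for s in steps)):
        accuracy = prod(acc for _, _, acc in combo)  # assumes independent errors
        if accuracy < min_accuracy:
            continue
        cost = sum(dollars for _, dollars, _ in combo)
        if best is None or cost < best[0]:
            best = (cost, {s: name for s, (name, _, _) in zip(steps, combo)})
    return best


if __name__ == "__main__":
    print(cheapest_plan(OPTIONS, min_accuracy=0.85))
```

With a 0.85 target, the sketch pays for the larger classifier so that the pricier drafting step can stay on the small model. A MILP solver can reach this kind of answer without enumerating every combination; the multiplicative accuracy constraint would typically be linearized by taking logarithms.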

Early benchmarks across three common enterprise scenarios—IT helpdesk automation, contract summarization with compliance checking, and real-time document Q&A—showed that Murakkab’s generated plans consistently outperformed both naive implementations and those hand-optimized by experienced Azure engineers. The system’s secret sauce is its ability to consider trade-offs that a human would find tedious, such as whether the 3% accuracy gain from a larger model is worth the 8x cost increase for a particular subtask.
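The 3%-versus-8x example reduces to simple arithmetic once accuracy has a price. The Python sketch below assumes the gain is measured in percentage points and that the business can name a hypothetical dollar value per point per call; the upgrade pays off only when the added value exceeds the added cost.

```python
def extra_cost(base_cost: float, cost_multiplier: float) -> float:
    """Additional dollars per call when switching to a model `cost_multiplier` times pricier."""
    return base_cost * (cost_multiplier - 1.0)


def break_even_value_per_point(base_cost: float, cost_multiplier: float, gain_points: float) -> float:
    """Value per accuracy point at which the upgrade exactly pays for itself."""
    return extra_cost(base_cost, cost_multiplier) / gain_points


def upgrade_is_worth_it(base_cost: float, cost_multiplier: float,
                        gain_points: float, value_per_point: float) -> bool:
    """True if the value of the accuracy gain exceeds the extra cost per call."""
    return gain_points * value_per_point > extra_cost(base_cost, cost_multiplier)


if __name__ == "__main__":
    base, multiplier, gain = 0.001, 8.0, 3.0  # $0.001 per call, 8x price, +3 points
    print(f"extra cost per call: ${extra_cost(base, multiplier):.4f}")
    print(f"break-even value per point: ${break_even_value_per_point(base, multiplier, gain):.5f}")
    print(upgrade_is_worth_it(base, multiplier, gain, value_per_point=0.002))
```

At an assumed base price of $0.001 per call, the 8x model adds $0.007 per call, so the 3-point gain must be worth more than roughly $0.0023 per point per call to justify the switch.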

Why This Matters for Windows and Azure Developers

For the millions of developers building on the Microsoft stack, Murakkab promises to slash the operational complexity of deploying AI agents. Today, integrating a Copilot-like assistant into a line-of-business app often means wrangling Azure AI Services, orchestrating multiple models, and monitoring spending through Cost Management dashboards. Murakkab could collapse that into a single natural language statement written in a tool like Visual Studio Code, with the resulting agent deployed as a containerized microservice on Azure Kubernetes Service or Azure Container Apps.

Microsoft has not yet confirmed productization timelines, but the involvement of Azure researchers alongside MIT’s Distributed Systems Group suggests Murakkab may first appear as an experimental Azure service or as an open-source project integrated with the existing AI toolchain. One plausible path is a new “Agent Optimizer” blade in the Azure AI Foundry portal (formerly Azure AI Studio), where users toggle between “quick build” (manual) and “optimized build” (Murakkab) modes.

Windows developers specifically could benefit from on-device execution. Murakkab’s energy-aware optimizer could decide to run smaller models directly on a user’s PC via the Windows Copilot Runtime or the on-device Phi Silica model on Copilot+ PCs running Windows 11 24H2, reserving cloud calls for the most demanding reasoning steps. This hybrid execution would keep latency low and sensitive data on the device while still tapping into cloud-scale intelligence when necessary.
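A hybrid policy like this could start as a simple threshold rule. The Python sketch below is hypothetical throughout: the difficulty score, the local capability threshold, and the privacy margin are illustrative parameters, and it does not call any real on-device or cloud model API.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Step:
    name: str
    difficulty: int  # hypothetical 1-10 estimate of reasoning difficulty
    handles_private_data: bool


def route(step: Step, local_max_difficulty: int = 4, privacy_margin: int = 2) -> Target:
    """Keep a step on the PC if the local model can handle it; otherwise use the cloud.

    Steps that touch private data accept slightly harder tasks locally
    (by `privacy_margin`) to avoid sending that data off the device.
    """
    limit = local_max_difficulty + (privacy_margin if step.handles_private_data else 0)
    return Target.ON_DEVICE if step.difficulty <= limit else Target.CLOUD


if __name__ == "__main__":
    steps = [
        Step("summarize_email", 3, False),
        Step("redact_pii", 5, True),
        Step("multi_step_research", 8, False),
        Step("contract_risk_review", 9, True),
    ]
    for s in steps:
        print(f"{s.name}: {route(s).value}")
```

A real optimizer would presumably weigh measured latency, energy, and accuracy rather than a single difficulty score; the threshold form simply makes the trade-off explicit.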

Community Reaction and Open Questions

Although the formal presentation is still months away, reaction in research circles has been cautiously optimistic. On platforms like X and Hacker News, developers have praised the focus on cost and energy, calling it “the missing piece” that could push agentic AI from prototype to production. One recurring question is flexibility: can the system handle novel tools or domain-specific APIs without extensive retraining? Early documentation suggests Murakkab uses a plugin architecture that reads OpenAPI specs and function signatures directly, much as AutoGen and Semantic Kernel do today, which suggests integration could be relatively straightforward.
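Reading function signatures to describe tools can be done with Python’s standard `inspect` and `typing` modules. The sketch below derives a JSON Schema-style tool description from a plain function; the output layout, the `fetch_related_docs` stub, and the fallback to `"string"` for unmapped types are assumptions for illustration, not Murakkab’s plugin format.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn) -> dict:
    """Build a JSON Schema-style description of `fn` from its signature and docstring."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unmapped types fall back to "string" (an arbitrary choice here).
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def fetch_related_docs(query: str, max_results: int = 5) -> list:
    """Search the knowledge base for articles related to a support ticket."""
    return []  # hypothetical stub; a real tool would query a search index


if __name__ == "__main__":
    import json

    print(json.dumps(tool_schema(fetch_related_docs), indent=2))
```

Parameters without defaults become required fields, so the agent runtime knows it must supply `query` but may omit `max_results`.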

Another concern is transparency. Because Murakkab generates the final workflow autonomously, debugging failures could be harder than in a manually coded pipeline. The team is reportedly building a “white-box mode” that produces a human-readable explanation of why each node was chosen, along with sensitivity analyses showing how changes to the plain-language prompt affect cost and performance.
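A white-box report could be as simple as stating, per node, what was chosen and why each alternative lost. The Python sketch below assumes per-step accuracy targets and picks the cheapest option that meets each one; the option data and report wording are hypothetical, and it does not attempt the prompt-level sensitivity analysis the team reportedly plans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    dollars: float   # cost per call
    accuracy: float  # estimated accuracy for this step


def explain_plan(options: dict[str, list[Option]], min_accuracy: dict[str, float]) -> list[str]:
    """Pick the cheapest option meeting each step's target and explain every rejection."""
    report: list[str] = []
    for step, candidates in options.items():
        target = min_accuracy[step]
        feasible = [o for o in candidates if o.accuracy >= target]
        if not feasible:
            report.append(f"{step}: no option reaches accuracy {target:.2f}")
            continue
        chosen = min(feasible, key=lambda o: o.dollars)
        report.append(f"{step}: chose {chosen.name} (${chosen.dollars:.4f}/call, accuracy {chosen.accuracy:.2f})")
        for o in candidates:
            if o is chosen:
                continue
            if o.accuracy < target:
                reason = f"accuracy {o.accuracy:.2f} is below the {target:.2f} target"
            else:
                reason = f"costs ${o.dollars - chosen.dollars:.4f} more per call"
            report.append(f"  rejected {o.name}: {reason}")
    return report


if __name__ == "__main__":
    opts = {
        "classify": [
            Option("rules", 0.0, 0.90),
            Option("small_model", 0.0002, 0.95),
            Option("large_model", 0.003, 0.97),
        ]
    }
    print("\n".join(explain_plan(opts, {"classify": 0.93})))
```

Each rejection names the binding reason (quality or price), which is the kind of trail a developer would need when a generated plan behaves unexpectedly.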

There’s also the question of vendor lock-in. While the prototype targets Azure, the underlying optimization framework is described as cloud-agnostic in the research paper. Nothing prevents an enterprising team from porting it to AWS or GCP, though the tight integration with Azure’s carbon-aware APIs may give Microsoft a head start.

The Bigger Picture: Toward Sustainable AI

Murakkab arrives at a moment when the compute appetite of large AI models is under intense scrutiny. By some estimates, training and serving the largest models consumes as much electricity as a small town, and enterprises face growing pressure to report their carbon footprints. By baking energy efficiency directly into the agent design loop, Microsoft and MIT are not just saving dollars; they are aligning with broader corporate sustainability pledges and regulatory trends such as the EU’s Energy Efficiency Directive.

If successful, Murakkab could set a new baseline for how agentic AI is engineered: no longer a raw frontier of handcrafted prompts and oversized models, but a disciplined practice where plain-language intent yields an optimized, auditable, and green execution plan. That’s a vision likely to resonate far beyond the Windows and Azure ecosystem.

What to Expect at OSDI 2026

Attendees of OSDI 2026 in Vancouver will get a full technical deep dive, including live demos in which Murakkab cuts the per-query cost of a complex multi-agent research assistant from $0.82 to $0.19 without loss of answer quality. The paper will also introduce a new metric, the “Green Agent Score,” which combines accuracy, latency, dollar cost, and grams of carbon per task into a single number and could become an industry benchmark.
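The paper’s definition of the Green Agent Score has not been published. Purely as an illustration of how four such quantities might be combined, the Python sketch below multiplies accuracy by a weighted geometric mean of improvement ratios (baseline divided by candidate) for latency, dollars, and carbon. The formula, weights, and baseline values are hypothetical.

```python
from math import prod


def green_agent_score(
    accuracy: float,
    latency_s: float,
    dollars: float,
    carbon_g: float,
    baseline: dict[str, float],
    weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Hypothetical composite: accuracy scaled by resource improvements over a baseline.

    A ratio above 1 means the candidate uses less of that resource than the baseline,
    so matching the baseline exactly yields a score equal to `accuracy`.
    """
    ratios = (
        baseline["latency_s"] / latency_s,
        baseline["dollars"] / dollars,
        baseline["carbon_g"] / carbon_g,
    )
    return accuracy * prod(r**w for r, w in zip(ratios, weights))


if __name__ == "__main__":
    # Placeholder baseline; only the $0.82 and $0.19 figures come from the article.
    baseline = {"latency_s": 10.0, "dollars": 0.82, "carbon_g": 5.0}
    print(green_agent_score(0.9, 10.0, 0.82, 5.0, baseline))  # baseline scored against itself
    print(green_agent_score(0.9, 10.0, 0.19, 5.0, baseline))  # cheaper, same quality
```

In this toy form, the demo’s drop from $0.82 to $0.19 per query (about 77% cheaper) at unchanged accuracy, with latency and carbon held at the placeholder baseline, lifts the score above the raw accuracy.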

Until then, the Windows and Azure communities can only speculate, but the message is clear: the next generation of AI agents will be described, not coded, and they will be smarter about the resources they consume.