Report: Microsoft Tests DeepSeek V4 for Copilot Cowork, Aiming to Cut Inference Costs

Microsoft is internally evaluating a fine-tuned, Azure-hosted version of DeepSeek V4 as a lower-cost alternative backend for its upcoming Copilot Cowork collaborative AI service, according to sources familiar with the project. The move, still in exploratory stages, reflects a strategic pivot to rein in runaway inference costs as the company prepares to embed generative AI deeper into Microsoft 365 enterprise workflows.

Copilot Cowork, expected to debut later this year as a shared, persistent AI workspace within Microsoft Teams and Loop, is designed to break free from per-user chat limitations. Instead of isolated Q&A sessions, Cowork lets entire teams collaborate with a single AI instance that remembers context, monitors documents, and proactively surfaces insights — a fundamentally different architecture that demands far more compute resources than today’s one-off prompts.

That architectural shift has forced Microsoft’s AI platform team to confront an uncomfortable truth: the economics of large language model inference, particularly with OpenAI’s industry-leading models, become untenable at enterprise scale. Current Copilot for Microsoft 365 plans provide 300 monthly AI credits per user, but heavy Cowork usage could burn through those credits in days, not weeks. To avoid alienating customers with surcharges or throttled performance, Redmond is quietly probing whether open-source models fine-tuned on Azure can deliver adequate quality at a fraction of the price.

The Inference Economics That Drove Microsoft to Open Models

The core of the problem is token cost. GPT-4 class models, even with volume pricing on Azure OpenAI Service, routinely cost tens of dollars per million output tokens once extended context windows and advanced reasoning are factored in. Multiply that across thousands of users in a single Cowork instance that might ingest entire SharePoint libraries and maintain conversation history spanning months, and annual inference bills can easily eclipse the per-seat licensing revenue itself.
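
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every input is an illustrative assumption, not a figure from Microsoft or from the sources cited here: the user count, the daily token volume, and the $30-per-million-token price (picked only to sit in the "tens of dollars" range described above) are hypothetical, as is the $30 monthly seat price used for comparison.

```python
"""Back-of-the-envelope inference cost vs. seat revenue (all inputs hypothetical)."""


def annual_inference_cost(users: int,
                          output_tokens_per_user_per_day: int,
                          usd_per_million_tokens: float,
                          days: int = 365) -> float:
    """Return yearly spend in USD, counting output tokens only."""
    total_tokens = users * output_tokens_per_user_per_day * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


def annual_seat_revenue(users: int, usd_per_seat_per_month: float) -> float:
    """Return yearly licensing revenue in USD."""
    return users * usd_per_seat_per_month * 12


if __name__ == "__main__":
    # Assumed workload: 1,000 users, 50,000 output tokens each per day.
    cost = annual_inference_cost(1_000, 50_000, 30.0)
    revenue = annual_seat_revenue(1_000, 30.0)
    print(f"inference: ${cost:,.0f}  seats: ${revenue:,.0f}")
```

Under these assumed inputs the yearly inference bill ($547,500) exceeds seat revenue ($360,000). Different assumptions about usage or pricing would move the result in either direction.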

Enter DeepSeek, the Hangzhou-based AI lab that shocked the industry in late 2024 by releasing DeepSeek-V3, an open-weight model that matched or exceeded GPT-4 on several benchmarks while reportedly costing 90% less to train and serve. V3 is publicly available; Microsoft’s exploration, however, centers on DeepSeek V4, a yet-to-be-announced successor that insiders describe as targeting frontier reasoning performance with a modular architecture optimized for enterprise deployment. By hosting a fine-tuned fork entirely on Azure, Microsoft would retain full control over data processing, encryption, and regional residency, the same guarantees enterprises demand from OpenAI models today.

“This is not about ditching OpenAI,” a person briefed on the evaluations told WindowsNews. “It’s about giving enterprise customers a choice. Some workloads — like summarizing already-known content or generating routine meeting notes — don’t need state-of-the-art reasoning. A leaner, faster model can handle those tasks at 20% of the cost, and that savings can be passed on as more flexible credit pricing.”

How Credit Pricing Works — and Why It’s Central to the Pivot

Microsoft’s AI credit model, introduced alongside Microsoft 365 Copilot’s commercial launch, allocates a pool of monthly credits per user to fuel generative AI operations. Advanced queries, real-time data pulls, and large context windows consume multiple credits per request. The system was designed primarily for single-user chats, not for persistent, multi-user environments like Cowork, where credit consumption multiplies with every additional participant and integration.

Internal modeling shared under NDA with several Fortune 500 early adopters of Cowork showed that a team of 50 heavy users could exhaust their collective monthly credits within 72 hours if the system relied exclusively on GPT-4. By contrast, a hypothetical DeepSeek V4 pipeline, with quantization, prompt caching, and speculative decoding optimized on Azure’s Maia accelerators, extended that runway to 28 days while maintaining acceptable accuracy on non-critical tasks.
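
The reported runway figures imply a rough ratio between the two pipelines' credit burn. The Python sketch below works that out under two assumptions that are not from the source: that the 50-person team pools the 300 monthly credits per user mentioned earlier, and that credits are consumed at a constant daily rate. The ratio itself does not depend on the pool size.

```python
"""Credit runway arithmetic for the reported 72-hour vs. 28-day figures."""


def daily_burn(pool_credits: float, runway_days: float) -> float:
    """Credits consumed per day if the pool lasts exactly runway_days."""
    return pool_credits / runway_days


def burn_ratio(cheap_runway_days: float, premium_runway_days: float) -> float:
    """Cheap pipeline's daily burn as a fraction of the premium pipeline's."""
    return premium_runway_days / cheap_runway_days


if __name__ == "__main__":
    pool = 50 * 300  # assumed: 50 users x 300 credits each = 15,000 credits
    premium = daily_burn(pool, 3)   # 72 hours
    cheap = daily_burn(pool, 28)
    print(f"premium: {premium:.0f}/day  cheap: {cheap:.0f}/day  "
          f"ratio: {burn_ratio(28, 3):.1%}")
```

On these assumptions, the cheaper pipeline would need to consume roughly 11% of the credits per day that the GPT-4 path does, which is a reduction of a little more than ninefold.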

This math has spurred a dual-track credit overhaul. The first track preserves the existing Copilot credit scheme for classic chat, powered by OpenAI. The second, codenamed “Flex Consumption,” would allow enterprises to route specific Cowork requests to a cheaper model pool, drawing from a separate, significantly lower-cost credit bank. Sources say Microsoft is also evaluating whether to offer a standalone Cowork SKU entirely powered by open models, priced at a 30–40% discount to the OpenAI-backed tier.

Azure Control: Why Self-Hosting DeepSeek Matters

Hosting DeepSeek V4 on Azure is non-negotiable for Microsoft’s compliance architecture. Enterprise customers, particularly in Europe and government sectors, demand ironclad data sovereignty guarantees. Running inference on Azure ensures that prompts, responses, and fine-tuning artifacts never leave the tenant’s chosen region and are protected by the same encryption and access controls as Exchange Online or SharePoint.

Moreover, Azure AI Studio’s model catalog already features several open-source LLMs, including Meta’s Llama family and Mistral’s open models. Adding a DeepSeek offering — even one kept internal to Microsoft 365 — aligns with CEO Satya Nadella’s philosophy of providing a “model garden” where customers can pick the right engine for the right job. The first public sign of this philosophy came earlier this year, when Microsoft added Phi-3.5 as an alternative backend for Copilot Studio custom agents.

Taking control of inference infrastructure also gives Microsoft leverage in its complex partnership with OpenAI. By having a credible alternative ready, Microsoft can negotiate better infrastructure pricing and potentially decouple from premium-priced frontier models when market conditions demand it — all without disrupting the end-user experience.

Quality and Safety: The Fine-Tuning Hurdle

Critics will point out that open models, even fine-tuned, rarely match the polished reasoning and safety guardrails of GPT-4. Microsoft’s Responsible AI team is deeply involved in the Cowork evaluations, subjecting DeepSeek V4 to adversarial testing, hallucination benchmarks, and enterprise document comprehension tasks. Early results are mixed: for routine summarization and drafting, V4 with lightweight fine-tuning reached 92% of GPT-4’s quality score at less than one-tenth the inference cost. But for complex multi-step reasoning involving financial data or legal contracts, gaps widened significantly.

To mitigate this, Microsoft is building a novel orchestration layer that can dynamically classify incoming prompts by complexity and route high-stakes queries back to OpenAI while relegating low-stakes ones to the cheap model. Dubbed “Cost-Aware Router,” this system could be the linchpin that makes Cowork’s economics work without sacrificing trust.
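
Microsoft has not described how such a router would work, so the following Python sketch is purely illustrative of the general pattern: classify a prompt, then pick a model pool. The keyword list, the length threshold, and the pool names "premium" and "economy" are all hypothetical, and a simple substring heuristic stands in for whatever classifier a production system would use.

```python
"""Toy cost-aware router: a keyword heuristic stands in for a real classifier."""
from dataclasses import dataclass

# Hypothetical markers of high-stakes content; a real system would likely
# use a trained classifier rather than substring matching.
HIGH_STAKES_TERMS = ("contract", "legal", "financial", "audit", "compliance")
LONG_PROMPT_WORDS = 200  # assumed threshold for treating a prompt as complex


@dataclass(frozen=True)
class Route:
    pool: str    # "premium" or "economy" (hypothetical pool names)
    reason: str  # human-readable explanation of the decision


def route(prompt: str) -> Route:
    """Send high-stakes or very long prompts to the premium pool."""
    text = prompt.lower()
    for term in HIGH_STAKES_TERMS:
        if term in text:
            return Route("premium", f"matched high-stakes term: {term}")
    if len(text.split()) > LONG_PROMPT_WORDS:
        return Route("premium", "prompt exceeds length threshold")
    return Route("economy", "no high-stakes signal found")


if __name__ == "__main__":
    for p in ("Summarize yesterday's standup notes.",
              "Check this legal contract for indemnity gaps."):
        print(route(p).pool, "<-", p)
```

This sketch defaults to the cheaper pool when it finds no warning signal. A more conservative design could invert that default and send anything unrecognized to the premium model, trading cost savings for a lower risk of misrouting a sensitive query.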

Security researchers, however, have raised red flags about incorporating Chinese-developed models into a critical enterprise pipeline, even if fine-tuned and hosted entirely on Azure. DeepSeek’s training data provenance and potential alignment biases are under scrutiny by Microsoft’s compliance unit. One solution under discussion: starting the DeepSeek-powered Cowork deployment in non-regulated APAC markets to gather real-world telemetry before pursuing FedRAMP or EU Data Boundary certification.

The Community’s Pulse

In developer communities and AI infrastructure forums, the idea of Microsoft adopting DeepSeek has been met with a mix of excitement and cynicism. Enthusiasts point to DeepSeek’s publicly documented training efficiency: the lab achieved breakthroughs in mixture-of-experts routing and FP8 training that drastically reduced hardware requirements. Others note that Microsoft’s exploration is less about technical curiosity and more a direct response to two competitive pressures: Google’s aggressive pricing of Gemini models within Workspace, and Notion’s transparency around AI costs.

“Microsoft is feeling the heat,” wrote one r/MachineLearning commenter. “When Google charges $10 per user per month for Gemini Advanced and throws in unlimited documents, Microsoft knows they can’t keep charging $30 plus credits forever. Open models are their escape hatch.”

Enterprise IT managers, meanwhile, are cautiously optimistic. In a LinkedIn thread discussing the rumors, several CIOs expressed relief that Microsoft is finally addressing the “credit cliff” inherent in persistent AI workspaces. “If I can get 80% of the usefulness for 20% of the cost, I’ll take that deal every time,” one Fortune 500 infrastructure lead commented.

What This Means for the Future of Copilot

The DeepSeek exploration signals a broader maturation of Microsoft’s AI strategy. No longer content to ride exclusively on OpenAI’s coattails, the company is building a genuine multi-model ecosystem that can adapt to wildly different cost, latency, and capability requirements. Copilot Cowork may be the first product to showcase this philosophy, but it won’t be the last.

Already, Microsoft’s Bing team is reportedly testing open models for cheaper summarization snippets, and the Azure ML team is developing tooling to help enterprises swap models in and out of RAG pipelines without code changes. If Cowork succeeds with a dual-model approach, expect the pattern to propagate across the entire Microsoft 365 suite — from Outlook’s email classification to Designer’s image generation.

The biggest unresolved question is whether, and when, Microsoft will publish a list of officially supported “Copilot engines” that includes names like DeepSeek, Llama, or Phi. Doing so would be a seismic shift, potentially angering OpenAI while giving enterprise customers unprecedented transparency. For now, all experimentation is strictly internal, and no timeline has been set for Cowork’s general availability, let alone its final model architecture.

Microsoft declined to comment on the record. DeepSeek did not respond to requests for comment. OpenAI referred us to statements emphasizing the depth of its partnership with Microsoft but did not address the specific reports.

The Bottom Line

Enterprise AI costs are no longer a footnote; they’re the headline. Microsoft’s reported interest in DeepSeek V4 for Copilot Cowork reveals a company that has done the math and concluded that the only sustainable path to AI ubiquity is through a ruthlessly efficient inference stack — one that supplements premium models with cheaper, fine-tuned alternatives. Whether regulated enterprises will accept a China-born model on Azure is an open question, but the economic imperative is undeniable. For now, the industry watches and waits to see if Cowork becomes the first major Microsoft product to prove that open-source AI has a place in the corporate mainstream.