Microsoft Moves Copilot Cowork to Usage-Based Billing, Eyes Microsoft-Hosted DeepSeek V4 for Enhanced Enterprise Security

Microsoft has made Copilot Cowork generally available to enterprises, moving the collaborative AI tool to a metered billing model and confirming that it is evaluating a Microsoft-hosted version of DeepSeek V4, or another open-weight model, to strengthen data privacy and security for business users. As of June 16, 2026, organizations can deploy the agentic AI assistant across their workflows and pay for what they consume rather than a fixed per-user subscription. The dual announcement signals a strategic pivot in how Microsoft delivers and secures AI capabilities for Windows-centric enterprises.

The move to usage-based pricing ends months of speculation about how Microsoft would monetize Copilot Cowork, a tool designed to automate cross-application tasks by orchestrating multiple AI agents under a single command. Rather than a flat monthly fee per seat, customers will be billed through Azure Metering based on metrics such as token consumption, API calls, or agent execution time. Microsoft has not disclosed exact unit costs, but early adopters in the private preview, which began in Q1 2026, reported flexible scaling that aligned costs with actual usage—a welcome change for firms with sporadic AI needs.

This pricing model mirrors Azure’s broader shift toward consumption-based services, but it also introduces complexity in budget forecasting. To address this, the Copilot Cowork admin console includes real-time cost tracking, configurable spending caps, and alerts that integrate with existing Azure Cost Management tools. IT admins can set per-department or per-project limits, ensuring that agent-driven processes do not unexpectedly inflate monthly bills. Microsoft’s own documentation suggests that heavy users—such as legal teams drafting documents with Cowork—may see expenses comparable to the previous $30 per-user monthly rate, while lighter, occasional usage could drop costs by 40% or more.
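The cap-and-alert behavior described above can be sketched in a few lines of Python. This is an illustration only, not Microsoft's implementation: the unit price, the 80% alert threshold, and every name in the code are assumptions, since Microsoft has not disclosed actual rates or the console's internals.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative unit price; Microsoft has not disclosed real rates.
ASSUMED_PRICE_PER_1K_TOKENS = 0.01  # USD


@dataclass
class Budget:
    """Hypothetical monthly spending cap for one department or project."""
    cap_usd: float
    alert_fraction: float = 0.8  # assumed: alert at 80% of the cap
    spent_usd: float = 0.0

    def record(self, tokens: int) -> str:
        """Add the cost of one metered call and return the budget status."""
        self.spent_usd += tokens / 1000 * ASSUMED_PRICE_PER_1K_TOKENS
        if self.spent_usd >= self.cap_usd:
            return "blocked"
        if self.spent_usd >= self.alert_fraction * self.cap_usd:
            return "alert"
        return "ok"


def monthly_cost(tokens_per_user: int, users: int) -> float:
    """Projected metered cost for a team at the assumed unit price."""
    return tokens_per_user * users / 1000 * ASSUMED_PRICE_PER_1K_TOKENS


def savings_vs_flat_rate(metered_usd: float, flat_usd: float = 30.0) -> float:
    """Fraction saved per user relative to the $30 flat rate cited above."""
    return 1 - metered_usd / flat_usd


if __name__ == "__main__":
    legal = Budget(cap_usd=10.0)
    for tokens in (500_000, 400_000, 200_000):
        print(tokens, legal.record(tokens))
    print(round(savings_vs_flat_rate(monthly_cost(1_800_000, 1)), 2))
```

Under these assumed numbers, a user whose metered spend comes to $18 a month lands at the 40% saving the documentation mentions for lighter usage. The real break-even point depends entirely on the undisclosed unit costs.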

The more provocative piece of the announcement, however, is Microsoft’s confirmation that it is exploring a Microsoft-hosted DeepSeek V4 deployment. DeepSeek V4, a next-generation open-weight model rumored to rival GPT-4o in reasoning and multilingual tasks, would be integrated as a routing option within Copilot Cowork. Currently, Cowork relies on a mix of models, including GPT-4o and specialized small language models (SLMs) for latency-sensitive operations. By adding an open model hosted inside the customer’s tenant, Microsoft aims to give enterprises a way to keep sensitive data entirely within their Azure tenant and avoid external API calls that could raise compliance flags.

This initiative comes as regulatory pressure mounts across the EU and financial services sectors, where firms demand strict data residency and model transparency. A Microsoft-hosted DeepSeek V4 would run in a customer’s own Azure subscription, with inference endpoints isolated behind a virtual private network. The standard Copilot processes prompts in Microsoft’s shared infrastructure; Cowork’s agent orchestration, by contrast, can route tasks to the in-tenant model when data governance policies require it. Microsoft has not committed to a launch date for this feature, but during the June 16 briefing engineering leads hinted at a public preview by Q4 2026.

Security architects have long voiced concerns over blind data transmission to cloud AI models. With Copilot Cowork’s agentic nature—where AI agents autonomously access emails, documents, and line-of-business apps—the risk surface expands dramatically. Microsoft’s solution is twofold: first, refine existing Purview data loss prevention policies to monitor agent actions in real time; second, offer a bring-your-own-model (BYOM) framework where organizations can plug in their own fine-tuned or open-weight models. DeepSeek V4, with its reported 128K context window and competitive coding benchmarks, is being positioned as a “first-party BYOM option” that Microsoft will manage, update, and secure on behalf of customers.

The intersection of usage billing and model routing creates a unique economic lever. When Copilot Cowork routes a query to an in-tenant DeepSeek V4 instance, the per-token cost could be significantly lower than a call to GPT-4o, because the customer is essentially paying only for the underlying GPU compute. Microsoft is expected to charge a modest management fee for hosting and maintaining the model, but early estimates suggest a 50-70% reduction in inference costs for high-volume workloads. Conversely, external model calls will carry a premium to offset licensing and API orchestration overhead. Cowork’s routing engine, code-named “Mercury,” will automatically select the most cost-effective model for each task based on complexity, latency requirements, and governance labels.
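Microsoft has not published how Mercury works, so the following is only a minimal sketch of the general idea: filter the candidate models by complexity, latency, and governance constraints, then pick the cheapest survivor. The model catalog, prices, latencies, and complexity scores are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # ASSUMPTION: illustrative USD price
    max_complexity: int        # highest task complexity (1-5) it handles well
    latency_ms: int            # ASSUMPTION: typical response latency
    in_tenant: bool            # True if inference stays in the customer's tenant


@dataclass(frozen=True)
class Task:
    complexity: int            # 1 (trivial) to 5 (hard)
    max_latency_ms: int
    sensitive: bool            # derived from a governance label


# Hypothetical catalog; none of these figures come from Microsoft.
CATALOG = [
    Model("gpt-4o", 0.010, 5, 800, in_tenant=False),
    Model("gpt-4o-mini", 0.001, 2, 300, in_tenant=False),
    Model("deepseek-v4-hosted", 0.004, 4, 1200, in_tenant=True),
]


def route_by_cost(task: Task, models: list[Model]) -> Model:
    """Return the cheapest model that satisfies every constraint of the task."""
    eligible = [
        m for m in models
        if m.max_complexity >= task.complexity
        and m.latency_ms <= task.max_latency_ms
        and (m.in_tenant or not task.sensitive)
    ]
    if not eligible:
        raise LookupError("no model satisfies the task constraints")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    print(route_by_cost(Task(1, 500, sensitive=False), CATALOG).name)
    print(route_by_cost(Task(4, 2000, sensitive=False), CATALOG).name)
```

Treating governance as a hard filter rather than a cost weight means a sensitive task can never be sent to an external model simply because that model is cheaper.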

That governance layer is critical. During the enterprise launch, Microsoft demonstrated how a financial services firm could configure Cowork so that any task involving personally identifiable information (PII) is routed exclusively to the in-tenant DeepSeek V4 endpoint, while less sensitive administrative work defaults to the cheaper, faster GPT-4o mini. Auditors can review a tamper-proof log of every routing decision, and conditional access policies can even block Cowork from running if the hosted model endpoint is unreachable, so that sensitive data never leaves the tenant.
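The demonstrated policy combines two ideas: fail closed when the in-tenant endpoint is down, and log every routing decision so that later edits can be detected. The sketch below shows one common way to do both, using a SHA-256 hash chain for the log. It is an assumption-laden illustration, not Microsoft's design: the endpoint names, class names, and log format are hypothetical, and a hash chain is tamper-evident rather than strictly tamper-proof.

```python
import hashlib
import json

# ASSUMPTION: hypothetical endpoint names used only for this sketch.
IN_TENANT = "deepseek-v4-in-tenant"
DEFAULT = "gpt-4o-mini"
GENESIS = "0" * 64


class EndpointUnavailable(RuntimeError):
    """Raised instead of falling back to an external model."""


class AuditLog:
    """Append-only log; each entry stores a hash that covers the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def route_with_policy(task_id: str, contains_pii: bool,
                      in_tenant_up: bool, log: AuditLog) -> str:
    """Send PII tasks only to the in-tenant endpoint; refuse if it is down."""
    if contains_pii and not in_tenant_up:
        log.append({"task": task_id, "decision": "blocked"})
        raise EndpointUnavailable("in-tenant endpoint unreachable; task refused")
    target = IN_TENANT if contains_pii else DEFAULT
    log.append({"task": task_id, "decision": target})
    return target
```

Raising an error when the in-tenant endpoint is unreachable, rather than falling back to an external model, is what keeps sensitive prompts inside the tenant in this sketch. The blocked attempt is still logged, so auditors see refusals as well as successful routes.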

For Windows enterprise administrators, these capabilities arrive via the latest Microsoft 365 admin center update (version 2.1.4.6), which surfaced alongside the Windows 11 24H2 “Enterprise AI” cumulative update KB5038762. The update adds new Group Policy objects for controlling Copilot Cowork model routing, enabling IT to enforce tenant-wide model selections or delegate them to compliance teams. A new PowerShell module, CoworkPolicy, also allows scripted enforcement of billing thresholds and model restrictions, addressing common automation needs in large-scale deployments.

Given the news cycle around DeepSeek, questions about model provenance and security are inevitable. DeepSeek V4 is developed by a Chinese AI lab, and while the model weights are expected to be openly released under a permissive license, the potential for embedded biases or supply-chain vulnerabilities cannot be ignored. Microsoft claims it will audit the model’s training data and dependencies before offering it as a managed service. Vic Patel, Microsoft’s CISO for AI Platforms, stated in a technical white paper accompanying the launch, “We will treat any hosted open-weight model with the same rigors we apply to our own models: red-teaming, responsible AI screening, and ongoing monitoring for drift or malicious fine-tuning.”

Still, some security researchers are skeptical. The AI Incident Database already contains multiple cases of open models delivering unexpected outputs when used in agentic loops. A self-hosted model, while insulating data, could become a new attack vector if the model file is poisoned or if the container runtime is compromised. Microsoft’s mitigation includes running DeepSeek V4 inside a confidential computing virtual machine on Azure, leveraging AMD SEV-SNP or Intel TDX to encrypt model weights in memory. This ensures that even Microsoft operators cannot inspect the running inference without explicit customer authorization.

Adoption of Copilot Cowork has been robust in the weeks since the general availability date, with over 4,500 enterprise tenants activating the service in the first 48 hours, according to Microsoft’s telemetry. Early feedback highlights mixed reactions. A senior IT manager at a Fortune 500 retailer told us that the metered billing eliminated the “all-you-can-eat anxiety” of per-user licenses, but admitted that cost optimization will require continuous tuning. “We’re still figuring out how to balance performance and spend—Cowork sometimes picks the expensive model for a simple email summary, and we need to train users to set the sensitivity labels correctly.”

Microsoft has published a best-practices guide that recommends starting with broad model routing rules and then tightening them based on audit data. The guide suggests phasing in the in-tenant DeepSeek V4 only after a 30-day evaluation period using synthetic data, to establish baseline performance and security posture. For organizations hesitant to adopt an open-weight model from a foreign company, Microsoft confirmed it is also evaluating alternatives, including a potential partnership with Mistral AI or the next iteration of Meta’s Llama, both of which could be offered under a similar managed-hosting model.

The competitive landscape is heating up. Google’s Duet AI for Workspace shifted to usage-based pricing in late 2025, and Amazon’s Q Business now offers agentic task routing with custom model hosting. Microsoft’s differentiator is deep integration with the Microsoft Graph, enabling Cowork agents to pull context from emails, calendars, Teams chats, and SharePoint repositories out of the box. The addition of an in-tenant open model weakens one of the last remaining objections from privacy-focused sectors: that using Copilot meant indirect exposure to external model providers. By combining BYOM with consumption billing, Microsoft is effectively redefining the enterprise AI stack as a composable service rather than a monolithic product.

For Windows-centric businesses, the implications stretch beyond Copilot Cowork itself. The same Mercury routing engine and hosting framework will eventually underpin other Copilot experiences, including the Windows Copilot sidebar and the Microsoft 365 Copilot Chat. A future update, tentatively scheduled for the Windows 11 24H2 Moment 2 release in September 2026, will allow individual users to designate personal models for on-device tasks, such as summarizing private documents offline using a local SLM—all managed through the same Azure metering and governance plane.

As enterprises begin to onboard, Microsoft has rolled out a series of learning paths on Microsoft Learn and a dedicated FastTrack program for Copilot Cowork. The company is also ramping up its partner ecosystem, with early solutions from ISVs like ServiceNow and SAP that build Cowork agents for specific business processes. The message is clear: agentic AI is no longer a pilot project; it is a platform with a pricing model that reflects real-world usage patterns and a security model that acknowledges geopolitical data realities.

Ultimately, the June 16 launch represents a maturation of Microsoft’s AI strategy—one that acknowledges that one size does not fit all. By decoupling the assistant layer from a single proprietary model and letting the market choose its preferred intelligence engine, Microsoft is positioning Copilot as the universal interface for work, with Windows and Azure as the trust fabric. The success of this bet hinges on execution: can Microsoft seamlessly host and secure third-party models at scale without degrading the user experience? Early indications suggest that the technical groundwork is solid, but enterprise feedback over the next two quarters will reveal whether the economics and the politics align.

For IT decision-makers, the immediate task is to model projected costs under the new billing scheme and define a governance policy for AI model selection. Those already using Azure OpenAI Service will find the transition familiar, but the introduction of hosted open models adds a new dimension of due diligence. With Copilot Cowork now generally available and DeepSeek V4 on the horizon, the enterprise AI landscape is entering a phase where control and compliance finally catch up with capability.