Anthropic's Claude 3.5 Models Now Generally Available on Azure Foundry, Powered by Nvidia Blackwell Ultra GPUs

Anthropic’s Claude models became generally available on Microsoft Azure AI Foundry on June 29, 2026, marking the first time the AI lab’s frontier models run end-to-end on Microsoft’s own accelerated infrastructure. The deployment uses Nvidia’s GB300 Blackwell Ultra systems, bringing a new tier of inference performance to Azure customers and signaling a deepening alliance between Microsoft and one of the world’s leading AI research companies.

The launch delivers the full Claude model family—Claude 3.5 Sonnet, Claude 3.5 Haiku, and Claude 3.5 Opus—directly through Azure AI Foundry’s model catalog. Developers and enterprises can now tap into Claude’s advanced reasoning, coding, and agentic capabilities without leaving Azure’s compliance boundary, using a consumption-based billing model built around Compute Credit Units (CCUs). The move places Microsoft’s cloud squarely in competition with Amazon Bedrock and Google Cloud’s Vertex AI, both of which have long offered Claude as a managed service, but with a key differentiator: Microsoft’s exclusive use of the latest Blackwell Ultra silicon for Claude inference.

Under the Hood: Nvidia GB300 Blackwell Ultra Acceleration

The technical backbone of this offering is Nvidia’s GB300 Blackwell Ultra GPU, the successor to the H100 and a leap forward in large-model inference. The Blackwell architecture introduces a second-generation Transformer Engine, FP4 tensor cores, and a massive 192 GB of HBM3e memory per GPU—double what the H100 offered. For Claude, a model family known for its long-context windows and multi-step reasoning, the extra memory bandwidth and faster low-precision math cut latency by up to 30% compared to previous-generation accelerators, according to early benchmark data shared by Microsoft.

End-to-end Azure hosting means the entire inference pipeline—from the frontend API gateways down to the GPU clusters—runs on Microsoft’s network, with data residing in the customer’s chosen Azure region. This addresses a major concern for regulated industries: data sovereignty. Financial services firms, healthcare providers, and government agencies can now use Claude while keeping prompts and completions within their existing Azure Virtual Network boundaries, subject to the same encryption and security policies they already apply to other workloads.

Microsoft engineers worked closely with Nvidia to optimize model serving on the GB300. They leveraged the GPU’s new NVLink Switch and 800 Gb/s Quantum-3 InfiniBand networking to build large-scale, low-latency inference clusters. For customers, that translates to higher throughput at lower cost per token, especially for traffic patterns that spike unpredictably—a common scenario in agentic workflows where multiple chain-of-thought calls happen in sequence.

AWS Out, Azure In: The Strategic Shift

Until this launch, Amazon Web Services had been the primary hyperscale partner for Anthropic, hosting Claude since its early days. Google also invested in Anthropic and offers Claude on Vertex AI. But Microsoft, which has invested billions in OpenAI and built a deep partnership around the GPT family, is now publicly hedging its bet by welcoming a frontier rival into its ecosystem. Industry analysts see it as a pragmatic move: giving Azure customers choice, preventing lock-in to any single model provider, and capturing revenue from the growing share of enterprises that prefer Claude for tasks like legal analysis, advanced coding, and complex multi-step agent orchestration.

By running Claude entirely on its own infrastructure, Microsoft retains full control over the security and operations stack. Customers also get a unified billing experience: the new CCU system lets them pay for all AI model usage across Azure AI Foundry with a single meter, simplifying budgeting and chargebacks. Microsoft has committed to offering the same SLA for Claude as for its own Azure OpenAI Service, including 99.9% availability for production workloads.

Inside Azure AI Foundry: Access, Tools, and Agent Framework

Azure AI Foundry, the rebranded evolution of Azure Machine Learning and Azure AI Studio, serves as the unified platform for building generative AI applications. With Claude’s GA, the model now appears in the Foundry model catalog alongside GPT-4o, Llama, Mistral, and other popular models. Developers can start using Claude immediately through a serverless API endpoint, avoiding the need to provision dedicated GPU capacity.

The onboarding experience is streamlined:
- Navigate to the Azure AI Foundry portal, select "Model catalog," and filter by provider “Anthropic.”
- Choose a Claude model and deploy it with a single click, selecting the Azure region and the desired capacity (throughput in tokens per minute).
- Use the built-in playground to test prompts, adjust system messages, and compare models side-by-side.
- Integrate the endpoint via SDK (Python, JavaScript, C#) or REST API, with Azure Active Directory authentication.

Foundry’s prompt flow and evaluation tools work natively with Claude. Developers can build complex pipelines that chain Claude calls with other Azure services—Azure Cognitive Search for retrieval-augmented generation, Azure Functions for tool calling, and Azure AI Agent Service for multi-agent orchestration. This is where the “AI agents” tag in the announcement becomes tangible: Claude’s prowess at using tools, maintaining long context, and generating structured JSON makes it a natural fit for autonomous agent scenarios, and Foundry provides the scaffolding to deploy such agents at scale, with built-in tracing, monitoring, and content safety filters.

CCU Billing: A Credit-Based Model for Predictable Costs

One of the headline features of this launch is the introduction of Compute Credit Unit (CCU) billing for third-party models on Azure. Instead of paying per token directly, customers purchase or commit to a certain number of CCUs, and each model consumes CCUs at a different rate depending on its size and complexity. For Claude 3.5 Sonnet, for example, 1,000 input tokens might consume 0.003 CCUs, while 1,000 output tokens consume 0.015 CCUs. The exact rates are published in Azure’s pricing calculator, and a cost-estimation widget is available during deployment.

CCUs bring three advantages:
1. Unified spend management: A single prepaid commitment or monthly draw can cover usage across multiple models, simplifying procurement.
2. Granular control: IT admins can set per-project or per-team CCU budgets, preventing surprise bills.
3. Transparent comparison: Because all models use the same currency, it’s easier to compare the cost-effectiveness of Claude versus other models for a given task.

Microsoft has also introduced a free tier granting the first 100,000 CCUs per month at no charge for evaluation, a move aimed at encouraging experimentation. The company claims this level of billing simplicity and cross-model flexibility is unmatched by competing platforms, though AWS and Google offer similar token-based pricing with reserved discounts.

Performance and Enterprise Readiness

Early enterprise beta testers reported that Claude 3.5 Sonnet on Azure delivers response times comparable to the Anthropic direct API, with p50 latency under 1.2 seconds for prompts up to 4,000 tokens. The GB300’s FP4 inference, combined with Azure’s optimized serving stack, keeps time-to-first-token low even during peak loads. Microsoft has deployed the service in 12 Azure regions initially, including East US, West Europe, and Southeast Asia, with plans for more regions by year-end.

Security and compliance certifications are already in place. Claude on Azure inherits the broader Azure compliance posture, including SOC 2 Type II, HIPAA, GDPR, and ISO 27001. Data is encrypted in transit and at rest, and the platform offers optional features like customer-managed keys and private endpoint support via Azure Private Link. For organizations that require content filtering, Azure AI Content Safety can be layered on top of Claude’s output with tunable severity thresholds, though that does introduce a small latency overhead.

The Agentic AI Push: Claude Meets Azure’s Agent Service

Perhaps the most forward-looking aspect of this release is how Claude integrates with Azure AI Agent Service, a framework for building autonomous software agents that plan, reason, and execute tasks with minimal human intervention. Azure’s agent framework provides a managed runtime where multiple specialized agents (one for research, one for code execution, one for data retrieval) can collaborate using a shared memory store and a scheduler.

Claude’s extended 200K context window and its ability to follow complex system prompts make it a strong candidate for the orchestration agent role. In a typical setup, a Claude agent might delegate a subtask to a smaller, cheaper model like Phi-4, invoke a Python code interpreter, or call an external API via a plugin. Foundry’s tracing tool captures the entire call tree, making debugging and auditing straightforward.

Azure customers in the legal and consulting sectors have already begun building proof-of-concept systems where Claude agents review thousands of documents, summarize key clauses, and draft reports—all within the Azure ecosystem. The combination of Claude’s reasoning and Azure’s governance layer is likely to accelerate adoption in highly regulated verticals.

Pricing Snapshot and Competitive Context

While final per-token costs vary by region and commitment level, indicative pricing places Claude 3.5 Sonnet at roughly $7.50 per million input tokens and $30.00 per million output tokens when using the CCU model with pay-as-you-go rates, before any enterprise discounts. That is on par with Anthropic’s direct API pricing and competitive with OpenAI’s GPT-4o rates on Azure OpenAI Service. Reserved capacity and Provisioned Throughput Units (PTUs) are available for customers needing predictable performance, with discounts of up to 40% for one-year commitments.

Compared to Amazon Bedrock, Azure’s differentiator remains the infrastructure stack: the Blackwell Ultra GPUs are not available on any other competing cloud for Claude inference. AWS currently serves Claude on its Trainium2 and Inferentia3 chips (for smaller footprints) and on H100-based instances for larger workloads. Google leverages its TPU v5p. Microsoft’s early access to Blackwell Ultra gives it a theoretical performance per watt advantage that could translate into better pricing in the future, though for now costs are aligned with the market.

Developer Community Reaction

On the Windows forums and social media, the response has been largely positive. Developers who have grown frustrated with managing multi-cloud deployments welcomed the ability to consolidate their model portfolio on Azure. “Finally I can use GPT for rapid prototyping and Claude for the heavy lifting, all in the same VNet,” one user wrote. Another praised the CCU model for removing the “token math” headache of cross-model cost comparisons.

Some skepticism remains, however. A vocal minority questioned whether Microsoft would eventually deprioritize Claude in favor of its own Copilot-branded models once the partnership’s novelty fades. Others noted that the availability of Claude on the same platform as the Azure OpenAI Service could lead to internal competition, possibly confusing enterprises about which model to standardize on. For now, Microsoft’s message is clear: it plans to be the cloud with the broadest AI model selection, and Claude’s GA proves it.

What’s Next: Fine-tuning, RAG, and Beyond

Microsoft and Anthropic have hinted at upcoming capabilities that go beyond simple inference. Fine-tuning for Claude 3.5 Sonnet is slated for private preview in Q4 2026, allowing enterprises to customize the model on proprietary data while keeping weights within Azure’s security boundary. Retrieval-augmented generation using Azure AI Search is already supported, and the integration will deepen with native vector indexing and hybrid search optimizations specifically tuned for Claude’s embedding style.

Longer-term, both companies are exploring how Claude’s upcoming memory and personalization features could be exposed through Foundry’s identity model, so that enterprise employees get a consistent, context-aware assistant across all Azure-connected applications—Teams, Office, and custom line-of-business apps. This vision of a “personal AI agent” powered by Claude, but deployed and governed centrally by the organization, aligns with Microsoft’s broader Copilot strategy while giving customers a choice of the underlying brain.

Conclusion

The general availability of Anthropic’s Claude models on Azure AI Foundry is more than a feature launch; it’s a strategic inflection point for AI cloud services. By pairing one of the most capable model families with the first public deployment of Nvidia’s GB300 Blackwell Ultra GPUs, Microsoft positions Azure as the premium platform for inference-intensive agentic workloads. The CCU billing model eliminates friction for enterprises managing multiple AI services, and the tight integration with Azure’s security and governance stack makes Claude a viable option for regulated workloads that were previously Azure-only for compliance reasons.

For Windows enthusiasts and IT pros watching the AI landscape, the message is clear: the battle for AI infrastructure is no longer about offering a single flagship model; it’s about providing the best platform to run many models, and Azure just threw down a formidable gauntlet.