Microsoft announced on July 1, 2026, that Azure Chaos Studio Workspaces is entering public preview, bringing scenario-driven resilience testing to Azure workloads. The new capability allows teams to simulate real-world disruptions—like an entire Azure availability zone going offline—with just a few clicks, making it easier to validate that applications can gracefully handle such outages.
Azure Chaos Studio, generally available since 2023, has provided a platform for running chaos experiments on Azure resources. But with Workspaces, Microsoft is shifting from individual fault injections to higher-level, out-of-the-box scenarios that reflect common failure patterns. This change aligns with the industry’s growing emphasis on proactive resilience testing, especially as organizations migrate mission-critical systems to the cloud.
The public preview introduces a curated set of scenarios, starting with zone-down and region-down simulations. These are built on top of the existing chaos experiment engine, but they abstract away much of the manual configuration. Instead of specifying individual VM shutdowns or network latency injections, users simply select a scenario, target a resource group, and define parameters like the scope of impact. The platform then orchestrates the necessary faults in a controlled manner, complete with rollback protections and monitoring hooks.
What’s New in Workspaces
Azure Chaos Studio Workspaces serves as a centralized hub for managing and executing these scenarios. Key features include:
- Scenario catalog: Predefined templates for zone failure and region failure, with scenarios that combine multiple faults being added gradually. Microsoft says additional scenarios, such as throttling database connections or degrading Cosmos DB performance, are on the roadmap.
- Managed execution: The service automatically sequences faults, validates prerequisites, and limits the blast radius to the selected scope. If an experiment begins to cause unexpected impact, it can be stopped manually or automatically based on defined health checks.
- Integration with Azure Monitor and Application Insights: Workspaces can query these services to verify that the application remains responsive during a simulated outage. Results appear in a single dashboard, showing whether the workload met its resiliency targets.
- Microsoft Entra ID governance: Access to workspaces and experiments is controlled via role-based access control (RBAC) and Entra ID, allowing teams to delegate testing duties while maintaining security.
- Cost efficiency: Because the service handles much of the orchestration, teams spend less engineering time building and maintaining custom chaos scripts.
During the preview, Workspaces is available in all Azure public regions, though some scenarios may have regional limitations. Microsoft encourages users to provide feedback through the Azure portal’s feedback tool.
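The health-check-based automatic stop described under managed execution can be pictured with a small sketch. This is not the Workspaces API: `run_with_guard`, `GuardResult`, the sample values, and the 95% threshold are all hypothetical names and numbers chosen for illustration. The sketch only shows the general idea of aborting an experiment after several consecutive unhealthy readings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class GuardResult:
    stopped_early: bool
    ticks_run: int
    reason: str


def run_with_guard(
    health_samples: Iterable[float],
    is_healthy: Callable[[float], bool],
    max_consecutive_failures: int = 3,
) -> GuardResult:
    """Consume one health sample per tick; abort after too many bad ticks in a row."""
    failures = 0
    ticks = 0
    for sample in health_samples:
        ticks += 1
        if is_healthy(sample):
            failures = 0  # a healthy reading resets the failure streak
            continue
        failures += 1
        if failures >= max_consecutive_failures:
            return GuardResult(True, ticks, f"{failures} consecutive unhealthy samples")
    return GuardResult(False, ticks, "experiment ran to completion")


# Hypothetical request success rates, one per monitoring tick.
samples = [0.99, 0.97, 0.80, 0.75, 0.60, 0.98]
result = run_with_guard(samples, is_healthy=lambda rate: rate >= 0.95)
print(result.stopped_early, result.ticks_run)  # True 5
```

Requiring several consecutive failures, rather than stopping on the first bad reading, is one way to avoid aborting on a single noisy sample. The right threshold depends on the workload.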
How Scenario-Driven Testing Works
The traditional chaos engineering workflow demands that engineers identify failure modes, craft hypotheses, and manually design experiments. Workspaces automates the design step. For a zone-down scenario, the platform:
- Identifies all zonal resources within the selected scope, such as virtual machines, managed disks, and load balancers.
- Determines a fault injection strategy that simulates a zone failure without deleting or permanently changing any resources. Often, this means applying network security group rules to block traffic to and from resources in that zone, effectively quarantining them.
- Runs the experiment for a configurable duration (default 10 minutes).
- Monitors the health of dependent services, such as Azure Front Door failover or SQL Database replicas, to confirm that the redundancy mechanisms took over as expected.
- Restores normal connectivity automatically after the experiment ends.
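The first steps above (finding zonal resources in scope and planning a traffic-blocking fault for each) can be sketched in a few lines. This is a simplified model under stated assumptions, not how the service is implemented: `Resource`, `PlannedFault`, `plan_zone_down`, the `block-network-traffic` action name, and the sample inventory are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str
    zone: Optional[str]  # None means the resource is not pinned to a zone


@dataclass(frozen=True)
class PlannedFault:
    target: str
    action: str
    duration_minutes: int


def plan_zone_down(
    resources: List[Resource], zone: str, duration_minutes: int = 10
) -> List[PlannedFault]:
    """Plan one traffic-blocking fault per resource pinned to the given zone."""
    zonal = [r for r in resources if r.zone == zone]
    return [
        PlannedFault(r.name, "block-network-traffic", duration_minutes)
        for r in zonal
    ]


# Hypothetical inventory for one resource group.
inventory = [
    Resource("vm-web-1", "virtualMachine", "1"),
    Resource("vm-web-2", "virtualMachine", "2"),
    Resource("disk-web-1", "managedDisk", "1"),
    Resource("storage-logs", "storageAccount", None),
]

plan = plan_zone_down(inventory, zone="1")
for fault in plan:
    print(fault.target, fault.action, fault.duration_minutes)
```

Resources in other zones and non-zonal resources are left out of the plan, which is what keeps the simulated outage confined to a single zone.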
Users can customize the scenario by adding their own validation steps, such as probing a specific endpoint or checking that a queue drains completely. The workspace dashboard logs every step, making it easy to share results with stakeholders or include them in compliance reports.
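A custom validation step such as an endpoint check could look roughly like the sketch below. How Workspaces actually accepts custom steps is not described here, so this is plain Python using only the standard library. `http_status`, `endpoint_stays_up`, and the example URL are hypothetical, and the usage example substitutes a stub probe so that it runs without network access.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def endpoint_stays_up(
    url: str,
    probe: Callable[[str], int] = http_status,
    attempts: int = 3,
    pause_seconds: float = 1.0,
) -> bool:
    """Pass only if every probe during the experiment returns HTTP 200."""
    for attempt in range(attempts):
        if probe(url) != 200:
            return False
        if attempt < attempts - 1:
            time.sleep(pause_seconds)
    return True


# Stub probe standing in for a real request; the URL is a placeholder.
def always_ok(url: str) -> int:
    return 200


print(endpoint_stays_up("https://example.invalid/health", probe=always_ok, pause_seconds=0))
```

Passing the probe in as a parameter keeps the check testable offline. In a real run the default `http_status` probe would be used against the application's own health endpoint.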
Why This Matters for Azure Customers
Cloud outages are inevitable. In the past year alone, several major cloud providers experienced zone-level disruptions that took down customer workloads for hours. Most well-architected Azure applications are designed to withstand such events, but without testing, assumptions remain unverified. Azure Chaos Studio Workspaces lowers the barrier to entry for chaos engineering—teams no longer need deep expertise in infrastructure or failure modes to start testing.
For enterprises running applications that span multiple regions, the region-down scenario is even more powerful. It simulates a complete regional outage, forcing applications to fail over to a secondary region. This helps validate that DNS configurations, data replication, and networking are set up correctly. Many Azure customers build disaster recovery plans but rarely test them; Workspaces provides a safe, repeatable way to do so.
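One concrete way to judge a region-down test is to measure how long the application was unavailable and compare that with a recovery time objective. The sketch below assumes a list of timestamped health probes collected during the experiment. The function name `failover_seconds`, the probe data, and the 60-second target are hypothetical and only illustrate the arithmetic.

```python
from typing import List, Optional, Tuple


def failover_seconds(probes: List[Tuple[float, bool]]) -> Optional[float]:
    """Measure the first outage in a time-ordered list of (seconds, healthy) probes.

    Returns 0.0 if no probe failed, the seconds between the first failed probe
    and the next healthy one, or None if the service never recovered.
    """
    outage_start: Optional[float] = None
    for timestamp, healthy in probes:
        if not healthy and outage_start is None:
            outage_start = timestamp
        elif healthy and outage_start is not None:
            return timestamp - outage_start
    return 0.0 if outage_start is None else None


# Hypothetical probe results: seconds since fault injection, and probe outcome.
timeline = [
    (0.0, True),
    (5.0, False),
    (10.0, False),
    (15.0, False),
    (20.0, True),
    (25.0, True),
]

rto_target_seconds = 60.0  # assumed target for illustration
measured = failover_seconds(timeline)
print(measured)  # 15.0
print(measured is not None and measured <= rto_target_seconds)  # True
```

The measurement is only as precise as the probe interval, so a 5-second interval can understate or overstate the outage by up to that amount.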
Pricing and Availability
Microsoft has not yet released general availability pricing for Workspaces. During the public preview, usage of the feature itself is free, but customers still pay for the underlying resources involved in experiments, such as VMs that are stopped and restarted and any network traffic generated. General availability pricing will likely follow the existing Chaos Studio model, which bills per action-minute, that is, for each minute a fault action runs against a target.
To get started, navigate to the Azure Chaos Studio blade in the Azure portal and look for the “Workspaces” tab. The service requires no additional registration for preview, but users should review the preview terms and limitations. Microsoft’s documentation includes step-by-step tutorials for setting up your first zone-down experiment.
Looking Ahead
The public preview of Workspaces signals Microsoft’s commitment to making resilience testing a standard part of the Azure development lifecycle. As the service evolves, expect deeper integration with Azure Policy, GitHub Actions, and Azure DevOps pipelines, enabling teams to run chaos experiments as part of their CI/CD workflows. Microsoft also hinted at AI-powered insights that could analyze experiment results and suggest architectural improvements.
For now, the preview offers a pragmatic on-ramp for teams that know they should be doing chaos engineering but haven’t had the time to build it themselves. It’s a welcome addition to Azure’s reliability toolkit, and it will be interesting to see how the community adopts and extends these scenarios.