Microsoft on May 20, 2026, released RAMPART and Clarity—two open-source tools designed to integrate AI safety testing into continuous integration pipelines for agentic AI systems. The tools tackle a growing challenge in software development: ensuring autonomous AI agents behave safely and as intended before they reach production. With agentic AI—systems that can independently plan, execute multi-step tasks, and interact with tools—becoming mainstream, traditional testing methods fall short. These new tools aim to bridge that gap.
Agentic AI introduces unpredictable behavior. Unlike deterministic code, agents may hallucinate, take harmful actions, or violate constraints in subtle ways. Catching such issues late in the development cycle can cause reputational damage, security breaches, or worse. By shifting safety checks left into the CI pipeline, RAMPART and Clarity help developers identify and mitigate risks early, similar to how static analysis and unit tests prevent bugs in traditional software.
What RAMPART and Clarity Are
RAMPART (Risk-Aware Multi-Perspective Agentic Red-Teaming) automates adversarial testing of AI agents. It runs suites of behavioral tests that probe an agent’s decision-making under various scenarios, checking for safety, alignment, and compliance with defined policies. Clarity, on the other hand, is a specification and validation framework. It lets teams formally describe their product assumptions and safety requirements before writing agent code, then continuously verifies that the implemented agent adheres to those specifications throughout development.
Both tools are open source under the MIT license and are available on GitHub. Microsoft positions them as part of a broader push toward "safe by design" AI, where safety isn’t an afterthought but a first-class concern in the developer workflow. By releasing them openly, the company hopes to accelerate adoption of robust safety practices across the industry and foster community contributions.
Why CI Safety for Agentic AI Matters Now
Continuous integration has been a cornerstone of modern software engineering for over a decade. Developers commit code, automated builds run, tests execute, and feedback loops keep quality high. However, AI agents—especially those based on large language models—challenge traditional CI assumptions. An agent that works correctly today might produce unexpected output tomorrow due to model nondeterminism, prompt drift, or changes in the underlying API.
Agentic AI also amplifies risk by granting autonomy. An agent might be instructed to "book a flight," but without proper guardrails, it could overspend, expose personal data, or misinterpret urgency. Safety incidents in production, such as Bing Chat’s early unhinged conversations or autonomous trading bot misfires, highlight the need for rigorous pre-deployment testing. RAMPART and Clarity give teams a systematic way to bake safety into every commit.
Microsoft’s announcement reflects a shift in the software supply chain. Just as DevSecOps integrated security scanning into CI, "DevSafeAI" must weave AI safety checks into the existing pipeline. RAMPART plugs into standard CI tools like GitHub Actions and Azure Pipelines, running headless agent tests on each pull request. Clarity integrates with project planning phases, allowing safety specs to be version-controlled alongside code.
RAMPART: Automating Agent Behavior Tests
RAMPART is designed to perform multi-dimensional red-teaming on agentic systems. It generates adversarial inputs—prompts, tool calls, environment changes—and evaluates the agent’s responses against a configurable set of safety criteria. The framework supports defining custom safety rules, such as “the agent must not disclose PII,” “must refuse illegal requests,” or “must stay within a budget.”
Teams can write test scenarios in YAML or Python, describing initial agent state, the adversarial input, and expected behavior. RAMPART then executes these as part of the CI run. A failed test halts the pipeline, preventing unsafe code from merging. The tool comes with a growing library of default red-team scenarios drawn from Microsoft’s own experience with Copilot and other agentic products. Scenarios include prompt injection attacks, goal hijacking, and tool misuse.
A key differentiator is RAMPART’s risk-aware approach. Not all failures are equal; the tool allows prioritization based on severity. High-risk failures block merges, while lower-risk findings generate warnings for review. This prevents alert fatigue and keeps velocity high. The open-source nature means organizations can tailor the risk taxonomy to their own domain, be it healthcare, finance, or entertainment.
Clarity: Examining Assumptions Before Implementation
Clarity addresses a more conceptual phase of safety. Often, unsafe agent behavior stems from flawed assumptions: developers assume a certain API will always return valid data, that an LLM will never output a malicious payload, or that users will follow expected interaction patterns. Clarity lets teams codify these assumptions as formal specifications and then automatically validate them against the agent’s code and runtime behavior.
Specifications are written in a human-readable declarative language, similar to policy-as-code tools like Open Policy Agent. For example, a team building a financial assistant might specify: “The agent shall never initiate a transfer exceeding $500 without multi-factor confirmation.” Clarity can then inject probes during CI to verify the rule holds across many simulated conversations. It also includes static analysis components that inspect prompt flows and tool definitions for logical gaps.
By bringing assumption-checking into the CI loop, Clarity complements RAMPART. Where RAMPART dynamically hammers the agent with attacks, Clarity statically ensures the design itself is sound. Together they provide a comprehensive safety net. Microsoft suggests that teams can start with Clarity during project inception, using it to create a “safety contract” that evolves with the codebase.
Open Source and Community-Driven Evolution
Releasing RAMPART and Clarity as open source is a strategic move. Microsoft has increasingly embraced open source for AI tools (think DeepSpeed, ONNX Runtime, and Semantic Kernel). By making these safety tools freely available and modifiable, the company hopes to build a community around AI safety engineering—a discipline still in its infancy.
The repositories include detailed documentation, example workflows, and integration guides for popular CI platforms. The project is under the Microsoft Open Source Code of Conduct, and external contributions are welcomed. Early collaborators from academia and industry are already extending the red-teaming scenarios and specification libraries. Microsoft plans to offer managed versions of both tools as part of its Azure AI platform, but the core will remain open.
This model mirrors the success of tools like Terraform or Kubernetes, where open-source foundations spur broad adoption and commercial offerings provide enterprise support. For organizations hesitant to run agentic AI without guardrails, RAMPART and Clarity lower the barrier to safe experimentation.
Real-World Use and Early Feedback
While the tools are new, Microsoft has been dogfooding them internally for months. Teams working on Copilot for Microsoft 365 and Azure AI agents integrated RAMPART into their CI pipelines. According to a blog post by the company’s AI safety group, this caught hundreds of high-severity safety issues before they reached staging. False positive rates remain a challenge, but iterative tuning of the rulesets has improved precision.
Clarity is reportedly being piloted by several Microsoft partners building autonomous agents for supply chain management and customer service. These pilots revealed that explicit assumption documents often uncover misunderstandings between product managers and engineers. By forcing teams to write down assumptions in a machine-checkable format, Clarity reduces the gap between intent and implementation.
Community reactions on Hacker News and Twitter have been largely positive, though some developers question whether AI safety testing can ever be fully automated. Critics argue that red-teaming is an inherently creative task that resists scripted approaches. RAMPART’s team acknowledges this and sees the tool as augmenting, not replacing, human oversight. The open-source model allows the community to push the tool’s capabilities further.
Integrating with Existing DevOps Practices
Adding new tools to a CI pipeline can be disruptive. Microsoft designed RAMPART and Clarity to fit into existing workflows. Both provide native plugins for GitHub Actions and Azure DevOps, with community adapters for Jenkins and GitLab expected soon. Configuration is minimal: a few lines of YAML to define the test set and specification files. Results appear directly in the pull request view, alongside unit test results.
For teams already using LLM gateways or observability platforms, RAMPART’s output can be forwarded to monitoring systems for trend analysis. Clarity’s assumption specs can be exported to business intelligence tools, giving non-technical stakeholders visibility into safety guarantees. This aligns with the broader “shift left” philosophy—safety becomes everyone’s responsibility, not just QA or risk teams.
The Road Ahead
Microsoft has outlined a rough roadmap for both tools over the remainder of 2026. Upcoming features include support for multi-agent scenarios, where RAMPART will test interactions between collaborating agents, and Clarity will check cross-agent contracts. A graphical editor for Clarity specifications is also in the works to make the tool accessible to product managers. Additionally, integration with Azure AI Content Safety and other policy engines will enable real-time enforcement of the specifications in production.
The launch of RAMPART and Clarity signals a maturing AI toolchain. As agentic AI becomes the backbone of automation everywhere, the industry can’t afford to skimp on safety. Open-source, CI-native tools like these could become as essential as linters or security scanners. Developers who ignore them may find that their agents, however brilliant, become liabilities. Microsoft is betting that safe agents win—and giving the community the tools to build them.