Cybercriminals have launched a sustained campaign targeting exposed enterprise AI backends, hijacking unsecured Ollama and LiteLLM endpoints to run autonomous penetration-testing agents and offensive tooling. Between March and May 2026, threat actors actively scanned the internet for open inference servers and API gateways, leveraging the stolen compute power and AI capabilities to streamline their attacks.

The operation marks a significant escalation in the malicious use of AI infrastructure, turning organizations' own machine learning tools against them. Security researchers observed that attackers were not just stealing data or cryptojacking. They were using the compromised endpoints to host and execute fully autonomous attack agents: software that can probe networks, identify vulnerabilities, and even exploit them without human intervention.

What Are Ollama and LiteLLM?

Ollama is an open-source platform designed to simplify the deployment of large language models (LLMs) on local machines. It allows developers to run models like Llama, Mistral, and Phi on their own hardware, providing a REST API for inference. By default, Ollama binds to localhost (127.0.0.1) on port 11434 to prevent unauthorized remote access. However, many users expose the API by binding it to 0.0.0.0 (all network interfaces), typically through the OLLAMA_HOST environment variable, without firewall rules to restrict who can connect, making it accessible over the internet.
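To make the difference between a loopback bind and an exposed bind concrete, the following is a minimal sketch of a check an administrator might run against a configured bind value. The function name is_loopback_bind is hypothetical, and the parsing rules are an assumption about common "host:port" forms rather than a reproduction of how Ollama itself parses its settings.

```python
import ipaddress


def is_loopback_bind(host_value: str) -> bool:
    """Return True only if the bind value provably points at loopback.

    Hypothetical helper: accepts forms such as "127.0.0.1",
    "0.0.0.0:11434", "http://127.0.0.1:11434" or "[::1]:11434".
    """
    value = host_value.strip()
    # Drop an optional scheme such as "http://".
    if "://" in value:
        value = value.split("://", 1)[1]
    # Drop any trailing path.
    value = value.split("/", 1)[0]
    if value.startswith("["):
        # Bracketed IPv6 literal, optionally followed by ":port".
        host = value[1:].split("]", 1)[0]
    elif value.count(":") == 1:
        # "host:port" with an IPv4 address or a hostname.
        host = value.split(":", 1)[0]
    else:
        # Bare IPv4 address, bare IPv6 address, or bare hostname.
        host = value
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Empty or unresolved hostnames are treated as not provably local.
        return False
```

A value such as "0.0.0.0:11434" fails this check, which is the signal that the service is listening on every interface and depends entirely on firewalling for protection.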

LiteLLM, on the other hand, is an open-source proxy that aggregates multiple LLM APIs into a single, OpenAI-compatible interface. It acts as a central gateway, managing authentication, load balancing, and cost tracking across services like Azure OpenAI, Anthropic, and Hugging Face. LiteLLM is often deployed in enterprise environments to unify access to various models. Like Ollama, if misconfigured, its API can be left exposed without authentication, granting anyone the ability to invoke powerful language models.

Both tools have seen rapid adoption among developers and enterprises integrating AI into their workflows. But their increasing prevalence has also made them an attractive target for attackers seeking to abuse AI capabilities for reconnaissance, social engineering, and automated exploitation.

How the Endpoints Were Exposed

The attacks observed between March and May 2026 capitalized on common misconfigurations. In many cases, organizations deploying Ollama or LiteLLM failed to implement authentication, mistakenly assuming that internal network access was sufficient. Others exposed the services directly to the internet for remote development or third-party integrations without realizing the security implications. Container images and default configurations often prioritize ease of use over security, leaving the APIs wide open.

Attackers used mass internet scanning tools to locate these endpoints. Once identified, they could freely query the APIs: listing available models, sending prompts, and integrating the endpoints into their own toolchains. The hijacked backends provided free access to powerful LLMs that could be used to generate phishing emails, write malware scripts, or, as the recent campaign revealed, orchestrate autonomous penetration tests.
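Defenders can run the same kind of check against hosts they own to see what an unauthenticated caller would learn. The sketch below requests Ollama's model-listing route, /api/tags, without credentials and reports any model names returned. The function names are hypothetical, the response shape (a "models" array of objects with a "name" field) is an assumption based on Ollama's documented API, and the check should only be pointed at infrastructure you are authorized to test.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

OLLAMA_TAGS_PATH = "/api/tags"  # Ollama route that lists locally available models


def parse_model_names(body: str) -> List[str]:
    """Extract model names from an /api/tags-style JSON body."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return []
    models = data.get("models", [])
    if not isinstance(models, list):
        return []
    return [m["name"] for m in models if isinstance(m, dict) and "name" in m]


def check_unauthenticated(base_url: str, timeout: float = 3.0) -> Optional[List[str]]:
    """Return model names if the endpoint answers without credentials.

    Returns None when the host is unreachable, rejects the request,
    or responds with something that is not valid JSON.
    """
    url = base_url.rstrip("/") + OLLAMA_TAGS_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            return parse_model_names(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors (such as 401), connection failures, and bad JSON land here.
        return None
```

If check_unauthenticated returns a list for an address reachable from outside the network, that endpoint would have appeared in the scans described above.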

The Autonomous Agent Campaign

Cryptojacking, the theft of compute for cryptocurrency mining, has long been a problem on exposed AI hardware, but this campaign stood out for its use of autonomous agents. These agents, often built on frameworks like AutoGPT, BabyAGI, or custom variants, are capable of iteratively pursuing complex goals. By plugging a compromised LLM backend into such frameworks, attackers gained a tireless, AI-driven workforce for offensive operations.

Researchers noted that the agents performed a wide range of tasks:
- Reconnaissance: Scanning the victim's network, enumerating services, and mapping topology.
- Vulnerability assessment: Correlating software versions with known CVEs and validating exploitability.
- Exploitation: Attempting SQL injection, cross-site scripting, and command injection attacks.
- Persistence: Installing web shells, creating user accounts, and setting up command-and-control channels.
- Lateral movement: Using stolen credentials to access other systems within the network.

Because the agents operated on compromised AI infrastructure rather than the attackers' own machines, attribution was harder. The malicious traffic appeared to originate from the victim's own network, blending in with legitimate AI usage. Moreover, the elastic nature of LLM-based agents allowed attackers to scale their operations rapidly, targeting multiple organizations simultaneously.

LiteLLM's Gateway Abused as an Attack Orchestrator

In several observed incidents, LiteLLM proxies were abused not just for their model access but for their gateway functionality. Attackers used the proxy's ability to route requests to multiple LLM providers to optimize their tooling. For example, they could send reconnaissance results to one model for summarization, while another model generated exploit code. The proxy's key management features were also exploited to cycle through stolen API keys, obfuscating the origin of the attacks.

One particularly concerning technique involved prompt injection within the LiteLLM pipeline. By crafting special prompts that mimicked system messages, attackers could manipulate the routing logic, directing certain queries to less secure or less monitored models. This allowed them to bypass content filters and usage limits that organizations had put in place.

Ollama Servers as Stepping Stones

Exposed Ollama instances were primarily used for two purposes: running uncensored versions of models that could produce malicious content normally blocked by commercial APIs, and serving as a staging ground for local data exfiltration. Because Ollama servers often run inside corporate networks with access to internal resources, compromising them gave attackers a foothold for deeper penetration.

In one documented case, an autonomous agent used an Ollama endpoint to analyze internal documents found on shared drives, extracting credentials, API keys, and intellectual property. The extracted data was then exfiltrated via the same Ollama API, disguised as routine inference traffic.

Why Enterprises Should Be Concerned

The abuse of AI gateways represents a blind spot for many security teams. Traditional network monitoring focuses on north-south traffic and known attack signatures, but AI inference traffic can be voluminous and complex. Malicious prompts and responses often look like normal usage, making anomaly detection difficult. Additionally, the use of autonomous agents means that attacks can proceed at machine speed, often completing full kill chains in minutes rather than hours or days.

Complicating matters is the widespread assumption that AI infrastructure is isolated or low-risk. Many organizations rush to deploy AI capabilities without involving security teams, leading to misconfigurations and overly permissive access. The campaign between March and May 2026 served as a wake-up call: any exposed AI backend is a potential launchpad for sophisticated, self-directed attacks.

Mitigation Strategies

Security experts recommend immediate audits of all AI infrastructure to ensure that API endpoints are not publicly accessible. Specific steps include:

- Enforce authentication: Never expose Ollama, LiteLLM, or similar services without strong authentication. Use API keys, OAuth, or mTLS to restrict access.
- Network segmentation: Place AI servers behind firewalls and VPNs. Avoid binding services to 0.0.0.0 unless absolutely necessary.
- Monitoring and logging: Implement detailed logging for AI API calls, looking for unusual patterns such as repetitive scanning commands, code generation, or requests from unexpected IPs.
- Regular configuration reviews: Ensure default credentials are changed, and disable features like "allow-all" CORS policies.
- Least privilege: Limit the permissions of the LLM service account to only what is needed. Prevent it from accessing sensitive file systems or internal networks.
- Update and patch: Stay current with security patches for all AI tools, as vulnerabilities are frequently discovered.
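The monitoring step in the list above can be illustrated with a small triage function. This is a sketch under stated assumptions, not a detection product: the record format (a dict with "client_ip" and "prompt" keys), the ALLOWED_NETWORKS allowlist, and the keyword pattern are all hypothetical and would need tuning against real logs.

```python
import ipaddress
import re
from typing import Dict, List

# Hypothetical allowlist: networks that are expected to call the AI API.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]

# Illustrative keywords only; real deployments would maintain their own list.
SUSPICIOUS = re.compile(
    r"\b(nmap|sqlmap|reverse shell|web ?shell|privilege escalation)\b",
    re.IGNORECASE,
)


def flag_record(record: Dict[str, str]) -> List[str]:
    """Return the reasons a logged API call deserves a closer look."""
    reasons: List[str] = []
    ip = ipaddress.ip_address(record["client_ip"])
    # Requests from outside the expected networks are flagged first.
    if not any(ip in net for net in ALLOWED_NETWORKS):
        reasons.append("unexpected_ip")
    # Prompts that mention offensive tooling are flagged second.
    if SUSPICIOUS.search(record.get("prompt", "")):
        reasons.append("offensive_keywords")
    return reasons
```

Keyword matching of this kind is easy to evade and will produce false positives for legitimate security teams, so it works best as one input to review alongside source address and request volume.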

For LiteLLM specifically, administrators should review all configured virtual keys and remove any that are unused or overly permissive. Enabling rate limiting and request validation can also prevent abuse.
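As an illustration of what per-key rate limiting does, here is a minimal sliding-window limiter. It is a generic sketch and not LiteLLM's own implementation or configuration. The class name SlidingWindowLimiter and its parameters are hypothetical, and a production gateway would normally rely on the proxy's built-in limits or a shared store rather than in-process state.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of accepted requests, kept separately for each key.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Record and permit the request, or refuse it if the key is over budget."""
        current = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Discard timestamps that have aged out of the window.
        while hits and current - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(current)
        return True
```

An agent working at machine speed exhausts such a budget quickly, which both slows the attack and leaves a clear trail of refused requests tied to a single key.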

Forward-Looking Analysis

The campaign exploiting Ollama and LiteLLM is unlikely to be an isolated event. As companies embed AI deeper into their operations, the attack surface grows. The next evolution may see attackers weaponizing retrieval-augmented generation (RAG) pipelines to exfiltrate data from knowledge bases, or poisoning model outputs to manipulate business decisions. Defenders must move quickly to treat AI endpoints with the same scrutiny as any other critical infrastructure, because adversaries already do.

With autonomous agents demonstrating their capability to chain together enterprise-grade exploits using stolen AI cycles, the time for assuming AI backends are safe by default is over. Organizations that fail to lock down their Ollama and LiteLLM instances risk becoming the next stepping stone in an automated attack chain.