Microsoft has quietly lit the fuse on a new era of local AI processing with the release of KB5096573. This Windows Update package, rolling out now to select machines, delivers the Phi Silica AI component—version 1.2604.515.0—directly onto Copilot+ PCs running Windows 11, version 26H1. The target? Systems built around Qualcomm’s Snapdragon X Series processors, where the integrated Neural Processing Unit (NPU) can finally stretch its legs with a purpose-built, on-device language model.

In plain terms, KB5096573 is the delivery mechanism for Phi Silica, a slimmed-down sibling of Microsoft’s Phi family of small language models. Unlike the cloud-guzzling giants behind Copilot chat or ChatGPT, Phi Silica is designed to run entirely on the local NPU. No internet required. No data leaves the device. The model’s weights and inference engine land on the machine through a standard Windows Update, installing automatically with no user intervention beyond a possible reboot.

The update landed on Microsoft’s official update catalog and support documentation on March 21, 2025, but its existence only recently bubbled up through community channels. For weeks, early adopters on forums speculated about mysterious background downloads on their Snapdragon-powered Surface Pro 11 and other Qualcomm-based Copilot+ PCs. Now the pieces fit together: Microsoft is seeding the foundation for a new wave of AI experiences that tap the NPU for real-time, low-latency computation.

What exactly is Phi Silica?

Phi Silica is not a chatbot. It’s not a consumer-facing app you launch from the Start menu. Instead, it’s a system-level AI component—a local inference engine optimized for Qualcomm’s Hexagon NPU. Microsoft first teased Phi Silica during the Copilot+ PC launch in May 2024, promising a “game-changing” local AI that could power features like real-time translation, photo analysis, and nuanced natural language understanding without pinging a remote server.

The model underlying Phi Silica belongs to the Phi family, a series of small language models that have evolved rapidly since June 2023. The original Phi-1 was a mere 1.3 billion parameters, yet it punched above its weight in coding tasks. Later iterations—Phi-2, Phi-3, and now Phi-3.5—have scaled up in capability while staying compact enough to run on mobile hardware. Phi Silica is a specifically quantized and optimized variant, trimmed to squeeze every drop of performance from the Hexagon NPU’s tensor accelerators. Version 1.2604.515.0, delivered by KB5096573, is the first production-grade release to hit consumers.

Why “Silica”? The name nods to silicon, as the model is not just about software—it’s a co-design effort between Microsoft’s AI researchers and Qualcomm’s hardware engineers. The result is a neural engine that can handle natural language prompts, summarize text, extract entities, and even reason over local documents, all while sipping less than a watt of power. In demos at Qualcomm’s 2024 Snapdragon Summit, Phi Silica generated a response to a complex prompt faster than you could blink, with the entire stack—context, model, output—staying securely on the device.

Hardware and software prerequisites

KB5096573 won’t show up on every PC. The update’s metadata explicitly targets Windows 11, version 26H1, which is the formal name for what many have been calling the Windows 11 2025 Update (or 24H2’s successor). At the time of writing, 26H1 is in gradual rollout to Copilot+ PCs, and only certain configurations receive the Phi Silica component.

Specifically, you need:
- A Qualcomm Snapdragon X Elite or X Plus processor with Hexagon NPU capable of at least 40 TOPS (trillion operations per second).
- 16 GB of RAM or more (the model itself occupies around 1.8 GB in memory when loaded, but system overhead pushes the practical floor to 16 GB).
- Windows 11 build 26100 or higher, with the 26H1 feature update fully applied.
- The Copilot+ PC designation, which implies a secure-core PC with Pluton security processor and select AI-capable hardware.
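
The checklist above can be expressed as a small eligibility check. This is only a sketch: the thresholds come from the list, while the `DeviceSpec` fields and the `unmet_prerequisites` helper are hypothetical names chosen for illustration and say nothing about how Windows Update actually evaluates a device.

```python
from dataclasses import dataclass

# Thresholds taken from the prerequisites list; all names here are illustrative.
MIN_NPU_TOPS = 40
MIN_RAM_GB = 16
MIN_BUILD = 26100
SUPPORTED_CHIPS = ("Snapdragon X Elite", "Snapdragon X Plus")


@dataclass
class DeviceSpec:
    processor: str
    npu_tops: float
    ram_gb: int
    windows_build: int
    has_26h1: bool
    is_copilot_plus: bool


def unmet_prerequisites(spec: DeviceSpec) -> list[str]:
    """Return the requirements the device fails; an empty list means eligible."""
    problems = []
    if not any(chip in spec.processor for chip in SUPPORTED_CHIPS):
        problems.append("processor is not a Snapdragon X Elite or X Plus")
    if spec.npu_tops < MIN_NPU_TOPS:
        problems.append(f"NPU below {MIN_NPU_TOPS} TOPS")
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"less than {MIN_RAM_GB} GB of RAM")
    if spec.windows_build < MIN_BUILD or not spec.has_26h1:
        problems.append("Windows 11 26H1 not fully applied")
    if not spec.is_copilot_plus:
        problems.append("not a Copilot+ PC")
    return problems
```

Returning the list of failures, rather than a single boolean, makes it easier to see which requirement blocks a given machine.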

Systems based on Intel Core Ultra or AMD Ryzen AI 300 processors do not yet receive KB5096573, even if they meet the 40 TOPS NPU threshold. Microsoft has been clear that Phi Silica’s initial rollout is tightly coupled with Qualcomm’s silicon, thanks to deep integration with the Hexagon DSP and Qualcomm’s AI Engine Direct SDK. Intel and AMD variants are “in development,” according to Microsoft’s engineering blog, but no timeline has been shared.

The update downloads and installs automatically if your device is enrolled in the standard Windows Update channel. It appears under “Driver updates” or “Other updates” in some cases, labeled as “Microsoft – SoftwareComponent – 1.2604.515.0”. Once installed, the Phi Silica model file resides in a protected system directory (C:\Windows\System32\PhiSilica), and a corresponding Windows service, “Phi Silica Inference Service,” runs in the background, ready to handle API calls from first-party apps.

How the update installs and what it changes

When KB5096573 lands, it’s a relatively modest download—about 1.2 GB compressed, unpacking to roughly 2.5 GB on disk. The package includes the model weights in a custom ONNX-based format, the Qualcomm QNN execution provider, and a set of Windows Runtime (WinRT) APIs that allow any packaged or sandboxed app to send inference requests. The installation itself is standard: Windows Update downloads it in the background, and the system either installs it during idle time or asks for a restart.

Post-installation, users might notice subtle changes. Task Manager shows a new background process, PhiSilicaSvc.exe, which typically sits idle at 0% CPU and negligible memory until called. A new folder appears in System32, but no desktop shortcut or Start menu entry materializes. The component is essentially invisible until an application explicitly invokes it.

So far, only a handful of inbox apps use Phi Silica. Paint Cocreator, which already used a diffusion model for AI image generation, gets a text-prompt refinement feature that runs locally. The Photos app gains “spot fix” with natural language instructions. Snipping Tool can copy text from an image and then summarize it via Phi Silica, all on-device. Microsoft Teams, in preview, uses the local model for real-time meeting transcription and translation without streaming audio to the cloud.

Developers can tap into Phi Silica through the Windows.AI.MachineLearning.Preview namespace, with support for C#, C++, and Rust. The APIs accommodate familiar retrieval-augmented generation (RAG) patterns, allowing apps to feed in context from local files or databases and get precise, grounded answers. Documentation on Microsoft Learn has been updated to reflect version 1.2604.515.0, and early benchmark numbers suggest response latency as low as 6 milliseconds for short prompts and around 200 milliseconds for a 200-token summary on a Snapdragon X Elite X1E-80-100.
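
To make the RAG pattern concrete, here is a minimal, language-agnostic sketch in Python of the retrieval half: split a document into chunks, rank them against the question, and assemble a grounded prompt. It does not call any Phi Silica or Windows API. The bag-of-words similarity is a stand-in for a real embedding model, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a toy substitute for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the grounded prompt that would be handed to a local model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

In a real app, the string returned by `build_prompt` would be passed to whatever inference call the platform exposes; that step is deliberately left out here.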

Why local AI matters: privacy, latency, and independence

The shift from cloud AI to local AI is more than a technical flex. It tackles three perennial headaches: privacy, latency, and connectivity. With Phi Silica, a query like “Summarize this 50-page PDF I just opened” never leaves your hard drive. The document is chunked and vectorized locally, the model processes it on the NPU, and the summary appears in seconds. For regulated industries, healthcare, or anyone twitchy about data sovereignty, that’s a game-changer.

Latency drops from the hundreds of milliseconds (or more) of a cloud round-trip to near-instant. The NPU can generate tokens at around 40 tokens per second on a typical Snapdragon X Elite, which is fast enough for fluid conversation. That speed also enables new real-time interactions: imagine a live camera filter that describes objects as you pan the device’s camera, or an assistive reader that translates signs on the fly.
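
A quick back-of-the-envelope calculation shows what 40 tokens per second means in practice. The sketch below assumes a constant decode rate and ignores prompt processing; the function names are illustrative.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a constant rate, ignoring prompt processing."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def ms_per_token(tokens_per_second: float) -> float:
    """Average gap between consecutive tokens, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


print(generation_seconds(200, 40))  # 5.0 seconds for a 200-token reply
print(ms_per_token(40))             # 25.0 ms between tokens
```

At that rate a 200-token reply takes about five seconds to finish, although text starts streaming almost immediately. That suggests the 200-millisecond figure quoted for a 200-token summary measures something narrower than full decode, such as prompt processing or time to first token; that reading is an assumption, not something the benchmarks state.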

Offline capability is the third pillar. With Phi Silica, your Copilot+ PC doesn’t need Wi-Fi to be smart. Microsoft has demoed full Copilot functionality running on-device in airplane mode: drafting emails, summarizing threads, even generating complex Excel formulas, all without a single bit reaching Azure. That turns a Windows laptop into a truly autonomous AI companion, something MacBooks with Apple Intelligence can also claim but with a different architecture.

The broader AI landscape on Windows

KB5096573 isn’t arriving in a vacuum. Microsoft has spent the last two years weaving AI into every corner of Windows, from the Copilot key on keyboards to the Copilot+ PC specs. The Copilot app, however, still leans heavily on cloud models (GPT-4 and GPT-4o). Phi Silica represents a parallel track: a lightweight, always-available co-pilot for quick tasks that don’t require the full might of a cloud supercomputer.

Apple’s Apple Intelligence, unveiled in June 2024, takes a similar hybrid approach, mixing on-device models with private cloud compute. Google’s ChromeOS and Android are also embedding Gemini Nano into select devices. The industry consensus is crystallizing: a useful AI must be ambient, responsive, and respectful of data boundaries. Microsoft’s answer is to own the hardware-software stack from the silicon up, with Qualcomm as a launch partner and Intel and AMD in the wings.

For Windows enthusiasts, the update signals that 26H1 is more than a maintenance release. Under the hood, Microsoft is laying the plumbing for an AI operating system. The NPU abstraction layer, the WinRT APIs, and the distributed execution engine (which can route tasks to NPU, GPU, or CPU depending on load) are all maturing. KB5096573 is the first tangible evidence that this foundation is ready for consumers—or at least the earliest adopters.

Community reaction and early issues

Early discussion on Windows forums and Reddit threads reveals a mix of excitement and confusion. Many users noticed the update in their history but didn’t know what it did until Microsoft published the KB article. Some reported that after installing KB5096573, their available disk space dropped by 2–3 GB as expected, but others saw negligible change, suggesting the component might not have activated on unsupported hardware.

A small but vocal group of enthusiasts dug into the model file and confirmed it’s a 3.8-billion-parameter transformer with 4-bit quantization, running on the QNN FP16 execution provider. That’s consistent with the Phi-3-mini architecture. Benchmarks posted on community forums show that inference performance on a Snapdragon X Elite X1E-84-100 reaches roughly 48 tokens per second during text generation, using about 2.3 watts of NPU power. That efficiency is unmatched by x86 platforms, which often draw 15–20 watts for comparable GPU-based inference.
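
Those community figures can be sanity-checked with simple arithmetic. The sketch below estimates the size of the quantized weights and the energy spent per generated token. It counts weights only (no activations, KV cache, or quantization scales), and the x86 comparison assumes the same 48 tokens per second, which the forum posts do not confirm.

```python
def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: power divided by throughput."""
    return watts / tokens_per_second


size = weight_bytes(3.8e9, 4)
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")       # 1.90 GB (1.77 GiB)
print(f"{joules_per_token(2.3, 48) * 1000:.0f} mJ/token")    # about 48 mJ on the NPU
print(f"{joules_per_token(15.0, 48) * 1000:.0f} mJ/token")   # about 312 mJ at 15 W, same speed assumed
```

The weight estimate of roughly 1.9 GB sits in the same range as the roughly 1.8 GB in-memory figure quoted in the prerequisites, which supports the 3.8-billion-parameter, 4-bit reading.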

Not everything is rosy. Some users on the Qualcomm-based Surface Pro 9 with 5G (which uses the older SQ3 chip and a much weaker NPU) reported that KB5096573 fails to install with error 0x800f0922, which in this case appears to reflect a failed hardware compatibility check. That is consistent with the 40 TOPS floor, since the SQ3’s NPU only delivers 26 TOPS. A few Windows Insiders also noted that after installing the update, the Phi Silica service would occasionally spike CPU usage while idle, a bug that Microsoft acknowledged and promised to fix in a subsequent patch.

What’s next for Phi Silica

KB5096573 is the starting pistol, not the finish line. Microsoft’s roadmap for Phi Silica includes continual model updates through the same Windows Update channel, so version 1.2604.515.0 will be succeeded by newer fine-tunes over time. The company has also committed to expanding language support: the current model handles English only, but multilingual checkpoints for Spanish, French, German, and Chinese are in development for a mid-2025 release.

More significantly, Microsoft is building a system-level orchestration layer that will allow Windows to decide whether to use Phi Silica locally or Copilot in the cloud, based on the task’s complexity and connectivity status. Dubbed “Adaptive Intelligence,” this feature is slated for a 26H1 Moment update later this year. It promises to make AI access seamless: type a query in any text box, and Windows quietly picks the best model.
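
Microsoft has not published how this orchestration decides, so the following is purely a thought experiment: a routing rule that uses only the two signals named above, task complexity and connectivity. The `Task` type, the complexity score, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "phi-silica"      # on-device model
    CLOUD = "copilot-cloud"   # cloud-hosted model


@dataclass
class Task:
    prompt: str
    estimated_complexity: float  # hypothetical score: 0.0 (trivial) to 1.0 (hard)


def choose_target(task: Task, online: bool, complexity_threshold: float = 0.6) -> Target:
    """Pick where to run a task from connectivity and estimated complexity."""
    if not online:
        # Without a connection, the local model is the only option.
        return Target.LOCAL
    if task.estimated_complexity > complexity_threshold:
        # Hard tasks go to the cloud when it is reachable.
        return Target.CLOUD
    # Everything else stays on-device for latency and privacy.
    return Target.LOCAL
```

A production router would presumably weigh more signals, such as battery state, NPU load, or data sensitivity, but the basic shape is a cheap decision made before any model runs.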

For Intel- and AMD-based Copilot+ PCs, Microsoft has confirmed that Phi Silica will eventually run on their NPUs, but the journey isn’t straightforward. Qualcomm’s Hexagon uses a scalar-vector-tensor architecture that maps well to transformer models out of the box, while Intel’s AI Boost and AMD’s XDNA 2 need a different execution provider (likely OpenVINO for Intel, ONNX Runtime with Vitis AI for AMD). The engineering work is non-trivial, but the end goal is a unified AI platform regardless of CPU vendor—a goal that aligns with the broader Windows ecosystem philosophy.

Should you install KB5096573?

If you own a Qualcomm-powered Copilot+ PC, the answer is yes. The update installs automatically, and it’s a critical piece of the AI puzzle that will become indispensable as more apps light up. The footprint is modest, the privacy benefits are tangible, and early performance reports are encouraging.

For everyone else, patience is required. The update won’t appear for Intel or AMD systems, and forcing the install through third-party tools is likely to result in a non-functional service due to missing NPU drivers and optimized libraries. Microsoft’s phased approach makes sense: start with the most capable hardware to gather performance telemetry and user feedback, then expand gradually.

KB5096573 is a landmark update—not because it changes the wallpaper or adds a flashy new feature, but because it bakes a genuinely capable AI directly into the OS. It turns the Copilot+ PC promise from marketing speak into something you can benchmark, develop against, and rely on. For Windows enthusiasts, that’s the kind of update worth paying attention to.