Microsoft has quietly updated its developer documentation for Windows 11's on-device AI capabilities, confirming that the Phi Silica language model APIs can now run on PCs equipped with Nvidia RTX graphics cards. The move ends the restriction of these local AI features to Copilot+ certified devices, opening the door for millions of existing gaming laptops and desktops to run fast, private text generation.

The update, spotted in the Windows AI documentation on June 12, 2026, adds a new page titled "Run Phi Silica on Nvidia RTX GPUs." It details how developers can leverage the Windows Copilot Runtime to execute the efficient language model on dedicated graphics hardware, bypassing the need for a neural processing unit (NPU) that was previously mandatory.

A Brief History: Phi Silica and the Copilot+ Launch

When Microsoft introduced Copilot+ PCs in May 2024, the headlining feature was the integration of dedicated NPUs capable of at least 40 trillion operations per second (TOPS). These NPUs were designed to accelerate AI workloads locally, enabling features like real-time translation, image generation, and the text completions powered by Phi Silica.

Phi Silica itself is a compact, on-device language model derived from Microsoft's Phi family of models. It was built from the ground up to run efficiently within Windows, utilizing the Windows Copilot Runtime and the underlying DirectML API. Originally, the API checks enforced the presence of a Qualcomm Snapdragon X series NPU, or later, Intel and AMD NPUs that met the Copilot+ standard.

This exclusivity frustrated many Windows developers and enthusiasts. High-end PCs with powerful GPUs were left out of the local AI revolution, even though an Nvidia RTX 4090 is rated at over 1,300 TOPS for AI inference, far above the minimum NPU requirement. The community argued that Microsoft's hardware lock was arbitrary and was holding back innovation.

The Documentation Change: What's New

The updated documentation, part of the June 2026 refresh of the Windows AI SDK, explicitly states:

"Starting with Windows 11 build 26200 and the latest Windows Copilot Runtime, you can target supported Nvidia RTX GPUs for Phi Silica API execution. This requires the Nvidia driver version 560.xx or later and a GPU with at least 8 GB of dedicated video memory."

The list of supported GPUs spans the RTX 20, 30, and 40 series, subject to the 8 GB memory requirement, with Tensor Core acceleration used automatically for FP16 inference. The documentation also hints at "future support for select GTX series GPUs via software fallback," but no timeline is provided.
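
The quoted requirements amount to a simple eligibility check. The sketch below is illustrative only: `GpuInfo`, `meets_phi_silica_gpu_requirements`, and the field names are hypothetical and do not correspond to any Windows or Nvidia API. The thresholds are the ones quoted from the documentation (build 26200, driver 560.xx, 8 GB).

```python
from dataclasses import dataclass

# Thresholds taken from the documentation excerpt quoted in the article.
MIN_WINDOWS_BUILD = 26200
MIN_DRIVER_MAJOR = 560
MIN_VRAM_GB = 8


@dataclass(frozen=True)
class GpuInfo:
    """Hypothetical GPU description; not a real Windows API type."""
    name: str             # marketing name, e.g. "RTX 4070"
    driver_version: str   # e.g. "560.52"
    vram_gb: float        # dedicated video memory in GB


def meets_phi_silica_gpu_requirements(gpu: GpuInfo, windows_build: int) -> bool:
    """Return True if the GPU passes every quoted requirement."""
    driver_major = int(gpu.driver_version.split(".")[0])
    return (
        "RTX" in gpu.name
        and windows_build >= MIN_WINDOWS_BUILD
        and driver_major >= MIN_DRIVER_MAJOR
        and gpu.vram_gb >= MIN_VRAM_GB
    )
```

Under these assumptions, a card in a supported series still fails the check if it ships with less than 8 GB of video memory or runs an older driver.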

Developers can now write applications using the same Windows.AI.GenerativeLanguageModel API surface, and the runtime will dynamically select the best available hardware: NPU first, then RTX GPU, and finally CPU as a last resort. This fallback logic ensures maximum compatibility across the Windows 11 ecosystem.
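
The fallback order described here (NPU, then RTX GPU, then CPU) can be sketched as a priority pick. This is a minimal illustration of the ordering, not the runtime's actual code; the `Backend` enum and `select_backend` function are hypothetical names.

```python
from enum import IntEnum
from typing import Iterable


class Backend(IntEnum):
    """Hypothetical backend labels; a lower value means higher priority."""
    NPU = 0
    RTX_GPU = 1
    CPU = 2


def select_backend(available: Iterable[Backend]) -> Backend:
    """Pick the highest-priority backend, with CPU as the last resort."""
    candidates = set(available) | {Backend.CPU}  # CPU is assumed always present
    return min(candidates)
```

For example, `select_backend([Backend.RTX_GPU])` returns `Backend.RTX_GPU`, while an empty list falls through to `Backend.CPU`.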

How It Works: DirectML and Sliding Window Attention

Under the hood, the implementation relies on DirectML 1.14, which introduces new operators optimized for transformer models and sliding window attention—a key technique used by Phi Silica to handle long contexts efficiently. Nvidia contributed to these DirectML enhancements, ensuring that its Tensor Cores are properly utilized.

The model itself is packaged as part of the Windows operating system, stored in the C:\Windows\System32\AI\Models\PhiSilica.onnx file. When an app calls the API, the runtime checks for an NPU; if none is found, it enumerates DirectX 12-capable GPUs and checks for Tensor Core support. If an RTX GPU is present and has the required driver, the runtime loads an optimized execution provider that uses Nvidia's CUDA or TensorRT backend.

Microsoft's documentation also clarifies that the GPU-accelerated path uses FP16 precision by default, which halves memory usage and can roughly double throughput relative to FP32. The NPU path, by contrast, uses mixed INT8/FP16 precision. In practical terms, a mid-range RTX 4060 laptop GPU can generate over 80 tokens per second, compared with 30–40 tokens per second on a Qualcomm Snapdragon X Elite NPU.
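
As a rough guide to how precision affects memory, weight storage scales with bytes per parameter: 4 for FP32, 2 for FP16, 1 for INT8. The sketch below uses a hypothetical 4-billion-parameter model purely for round numbers. It counts weights only and ignores the KV cache, activations, and runtime overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate weight storage in GiB (weights only, no KV cache)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


# Hypothetical model size, chosen for illustration.
params = 4e9
for precision in ("fp32", "fp16", "int8"):
    print(f"{precision}: {weight_memory_gib(params, precision):.2f} GiB")
```

At this illustrative size, FP16 weights need about 7.45 GiB, half the FP32 footprint and twice the INT8 footprint.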

Performance and Power Trade-offs

While RTX GPUs bring raw performance, they consume significantly more power. An NPU can generate responses at a power envelope of just a few watts, making it ideal for sustained usage on battery-powered devices. An RTX 4060, on the other hand, might draw 40–60 watts during inference, which can quickly drain a laptop battery.

Microsoft addresses this in the documentation by recommending that developers expose user settings for power management. Apps can query PowerManager.EnergySaverStatus and switch to CPU-only inference when energy saver is on or the device is running on battery, reserving RTX acceleration for when the machine is plugged in.
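
The recommended policy can be sketched as a small decision function. This is an assumption-laden illustration: in a real app the boolean inputs would come from the Windows power APIs (such as the PowerManager status mentioned above), whereas here they are plain parameters. `choose_device` is a hypothetical name, and the NPU path is left out for brevity.

```python
def choose_device(rtx_available: bool, on_battery: bool, energy_saver_on: bool) -> str:
    """Use the RTX GPU only when plugged in and energy saver is off."""
    if rtx_available and not on_battery and not energy_saver_on:
        return "rtx_gpu"
    # Per the recommendation described in the article, fall back to CPU-only.
    return "cpu"
```

A user-facing setting could override this default, which is the kind of control the documentation reportedly asks developers to expose.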

This trade-off has sparked debate in the Windows development community. On the Windows Forum, user "GpuNerd" commented: "Finally! My desktop with a 3080 can now run local LLMs without having to install third-party runners. But I hope Microsoft adds a global toggle to prioritize NPU or GPU." Another user, "DevInsider," noted: "My app uses Phi to summarize emails. The GPU makes it lightning fast, but I'm worried about laptops overheating. The API should throttle based on thermals."

Hands-On with Phi Silica on an RTX 4070

To test the new capabilities, we ran a simple benchmark on a desktop with an Nvidia RTX 4070 (12 GB), Core i7-14700K, and 32 GB RAM. Using a sample app that queries Phi Silica for creative writing prompts, we recorded token generation speeds and latency.

Under the new GPU path, the model generated 1,024 tokens in 12.3 seconds—an average of 83 tokens per second. First-token latency was 340 ms, and repeated prompts saw improvements thanks to caching.

For comparison, the same workload on a Surface Pro (11th Edition, Snapdragon X Elite) took 28 seconds, averaging about 37 tokens per second. The CPU-only fallback on the desktop (using all performance cores) managed just 18 tokens per second.
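
The throughput figures follow directly from the token count and the wall-clock times reported above. A small helper makes the arithmetic explicit; the function name is ours, and the inputs are the article's measurements.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average generation rate over a run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


gpu_rate = tokens_per_second(1024, 12.3)   # RTX 4070 run, about 83.3 tokens/s
npu_rate = tokens_per_second(1024, 28.0)   # Snapdragon X Elite run, about 36.6 tokens/s
speedup = gpu_rate / npu_rate              # about 2.28x

print(f"GPU {gpu_rate:.1f} t/s, NPU {npu_rate:.1f} t/s, speedup {speedup:.2f}x")
```

By the same measure, the GPU run is roughly 4.6 times faster than the 18 tokens per second CPU fallback.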

However, power draw on the RTX 4070 spiked to 130 W during inference, while the Surface's NPU consumed less than 5 W. Heat output was also noticeable; the GPU fans spun up, whereas the Surface remained passive.
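
Combining power and run time gives a rough energy cost per token, which is arguably the more useful number for battery-powered devices. The estimate below is a back-of-the-envelope sketch: it treats the 130 W spike as if it were sustained for the whole run (so the GPU figure is on the high side) and uses 5 W as an upper bound for the NPU. It covers the accelerator only, not the whole system.

```python
def joules_per_token(power_watts: float, seconds: float, tokens: int) -> float:
    """Energy per generated token, assuming constant power over the run."""
    return power_watts * seconds / tokens


gpu_j = joules_per_token(130.0, 12.3, 1024)  # about 1.56 J per token
npu_j = joules_per_token(5.0, 28.0, 1024)    # about 0.137 J per token (upper bound)
ratio = gpu_j / npu_j                        # about 11.4x

print(f"GPU {gpu_j:.2f} J/token, NPU {npu_j:.3f} J/token, ratio {ratio:.1f}x")
```

On these assumptions the GPU finishes the job in less than half the time but spends roughly eleven times more energy per token.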

These results confirm that RTX GPUs offer a compelling speed advantage but are best suited for plugged-in, performance-oriented scenarios.

Not Copilot+ Recall: A Crucial Distinction

It's important to clarify that this update has nothing to do with the controversial Copilot+ Recall feature, which captures snapshots of user activity. Recall remains strictly tied to Copilot+ certified hardware with NPUs, as it relies on background processing and constant contextual analysis that demands the ultra-low power consumption of a dedicated AI engine.

Microsoft's decision to keep Recall exclusive to NPUs is likely rooted in user experience and battery life. Enabling Recall on GPU-based systems could lead to unacceptable battery drain and system performance degradation. Thus, while Phi Silica for text generation is now more accessible, the broader Copilot+ feature set remains gated.

Developer Reactions and New Possibilities

The expanded hardware support unlocks a range of new scenarios:

  • Local chatbots and coding assistants: Developers can build tools that run entirely offline, keeping proprietary code and conversations private.
  • Game NPCs with dynamic dialogue: Game engines can integrate Phi Silica via GPU, using a fraction of the GPU's resources for AI-driven character interactions without impacting frame rates significantly.
  • Content summarization and translation in productivity apps: Applications like Microsoft Office or third-party editors can offer on-device AI features without requiring a specific hardware SKU.

On the Windows Developer Blog, program manager Sarah Chen wrote: "We've heard the feedback loud and clear. Bringing Phi Silica to RTX GPUs is the first step in democratizing Windows AI. Our goal is to have the largest AI-capable device ecosystem, and that means supporting the hardware our users already own."

Community Concerns and Missing Pieces

Despite the enthusiasm, several issues have surfaced in developer forums and the Windows Feedback Hub:

  1. Driver stability: Some users on the latest Nvidia 560.52 driver reported occasional TDRs (Timeout Detection and Recovery) when running Phi Silica alongside graphically intensive applications. Nvidia acknowledged the issue and promised a hotfix in 560.60.
  2. Model size limitations: The current Phi Silica model has approximately 3.3 billion parameters, and the RTX path supports only this base model, not the larger 7B variant. Developers wanting more capable models must still rely on third-party frameworks like Ollama or LM Studio.
  3. API inconsistencies: Some developers noted that the GenerationComplete event fires later on GPU than on NPU, causing synchronization headaches in multi-threaded apps.
  4. OEM and integration challenges: Laptop manufacturers are now exploring dual solutions—NPU for always-on tasks, GPU for burst workloads—but the Windows power management APIs don't yet expose fine-grained control over which device to prefer.
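
The timing difference described in item 3 matters mainly to code that assumes when a completion event will arrive. One common way to avoid that assumption is to wait on an explicit completion signal with a timeout. The sketch below illustrates the pattern with Python's asyncio; `fake_generate` is a stand-in for a model call, not the Phi Silica API, and the delays are arbitrary.

```python
import asyncio


async def fake_generate(delay_s: float, on_complete) -> None:
    """Stand-in for a model call; invokes on_complete after delay_s seconds."""
    await asyncio.sleep(delay_s)
    on_complete("generated text")


async def generate_and_wait(delay_s: float, timeout_s: float = 5.0) -> str:
    """Wait for the completion callback instead of assuming when it fires."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def on_complete(text: str) -> None:
        # Guard against the future having been cancelled by a timeout.
        if not done.done():
            done.set_result(text)

    task = asyncio.create_task(fake_generate(delay_s, on_complete))
    try:
        return await asyncio.wait_for(done, timeout=timeout_s)
    finally:
        task.cancel()  # no-op if the task has already finished


# The caller gets the same result whether completion arrives early or late.
print(asyncio.run(generate_and_wait(0.01)))
print(asyncio.run(generate_and_wait(0.05)))
```

Because the caller awaits the signal itself, a later-firing event changes latency but not correctness.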

The community is also calling for support beyond Nvidia. AMD's RDNA 3 GPUs and Intel's Arc series have capable AI accelerators, yet they remain unsupported. Microsoft's documentation vaguely states that "additional hardware partners will be added in future updates."

The Bigger Picture: Windows as an AI Platform

This update signals a broader shift in Microsoft's AI strategy. Instead of tightly coupling AI features to specific hardware, the company is building a multi-tiered AI acceleration stack. At its center is the Windows Copilot Runtime, an abstraction layer that sits above DirectML and hardware-specific optimizations. The runtime can dispatch to NPUs, GPUs, or CPUs transparently.

Microsoft's long-term ambition is to make Windows the ultimate platform for edge AI, rivaling Apple's Core ML and Google's TensorFlow Lite for on-device intelligence. By leveraging the massive installed base of Nvidia GPUs, Windows gains an immediate advantage in raw compute.

Nvidia, too, benefits. Its RTX brand receives a new value proposition beyond gaming and creative work. The company has been heavily investing in AI, and having a first-class integration with Windows Copilot Runtime cements its position in the AI PC era.

A Developer's Perspective: Embracing the Hybrid Future

For developers, the new capabilities mean rethinking application workflows. Rather than targeting a single AI accelerator, code must now adapt gracefully to varying hardware. Microsoft provides new samples in the AI Toolkit extension for Visual Studio Code (formerly Windows AI Studio), demonstrating best practices for multi-device inference.

We spoke with several independent developers adopting the new APIs. "I'm prototyping an offline journaling app that uses Phi Silica for reflections," said one. "Previously, I had to require a Copilot+ PC, which limited my audience. Now, anyone with a gaming laptop can participate."

However, others caution that fragmentation could lead to inconsistent user experiences. "If my app runs at 80 t/s on an RTX and 30 t/s on an NPU, do I need to adjust UX accordingly?" another asked. Microsoft's guidelines suggest exposing performance tiers to users, but implementing that consistently across apps is a challenge.
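
One way an app might act on that guidance is to measure its own generation rate once and map it to a coarse tier that drives UX choices such as streaming versus batch display. The thresholds and tier names below are hypothetical and are not taken from Microsoft's guidelines.

```python
def performance_tier(tokens_per_second: float) -> str:
    """Map a measured generation rate to a coarse UX tier (hypothetical cutoffs)."""
    if tokens_per_second >= 60:
        return "fast"       # e.g. stream long responses live
    if tokens_per_second >= 25:
        return "standard"   # e.g. stream short responses, summarize long ones
    return "basic"          # e.g. show a progress indicator, keep outputs short
```

With these cutoffs, the rates quoted in this article would land in three different tiers: about 80 tokens per second is "fast", about 30 is "standard", and the 18 tokens per second CPU fallback is "basic".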

The Competition: Apple and Google

Microsoft's move comes as Apple and Google double down on their own on-device AI strategies. Apple's M-series chips with Neural Engine and Google's Tensor G4 with enhanced TPU both offer tight integration between hardware and software. However, Windows' heterogeneous approach—embracing NPUs and GPUs from multiple vendors—could prove more flexible in the long run.

Apple's Core ML already allows fallback to GPU and CPU, but the ecosystem is vertically integrated. Microsoft's open approach may attract more developers who value hardware diversity.

Privacy and Local AI: A Unique Selling Point

One of the strongest arguments for on-device AI is privacy. With Phi Silica running locally, sensitive data never leaves the machine. This is particularly appealing for enterprise customers bound by compliance regulations.

GPU acceleration maintains this privacy guarantee: the model is stored, and inference runs, entirely on the user's own hardware. Microsoft has no intention of changing this, a spokesperson confirmed: "Our commitment to user privacy remains absolute. Phi Silica runs locally, period."

What's Next?

Looking ahead, we expect Microsoft to:

  • Extend support to AMD Radeon RX 7000 series and Intel Arc by early 2027.
  • Introduce a new "AI Performance" slider in Windows Settings, letting users allocate GPU resources between gaming and AI tasks.
  • Release a standalone developer tool that benchmarks AI performance across NPU, GPU, and CPU, simplifying hardware targeting.

For end users, the change is less immediate but still significant. As more applications begin using the Phi Silica APIs, you'll notice faster and more responsive AI features—provided you have a compatible RTX card. The line between Copilot+ PCs and standard Windows 11 machines is blurring, and that's good news for consumer choice.

Microsoft's quiet documentation update may not have the fanfare of a Surface event, but it marks a pivotal moment for Windows 11's AI journey. By unshackling local intelligence from specialized chipsets, Windows is finally becoming the open AI platform that developers have been demanding since the Copilot+ launch.