In a practical head-to-head test, a Windows enthusiast attempted to swap Microsoft Copilot’s polished web summarization for a fully local AI stack using Ollama and the Page Assist browser extension. The result? Copilot remains the faster, more intuitive tool for everyday browsing, while the local alternative offers tantalizing privacy and customization benefits but falls short on speed and conversational polish. The experiment, detailed in a recent hands-on report, highlights the growing tension between cloud-based AI convenience and the appeal of on-device large language models (LLMs) for Windows users.

For many Windows power users, Copilot’s ability to instantly summarize any webpage—complete with follow-up questions and contextual insights—has become an indispensable part of the browser workflow. But with the rapid maturation of open-source models and local LLM runners like Ollama, the dream of an equally capable, privacy-respecting alternative is closer than ever. The test put that dream to the sword, and the verdict is nuanced: local AI summarization works, but it’s not yet a Copilot replacement for the masses.

The Experiment: Replacing Copilot’s Core Feature with Local AI

The goal was simple: use only local software to replicate Copilot’s most valuable function—generating quick, fluent summaries of the web pages you’re reading. The chosen stack combined three components:

  • Page Assist: An open-source browser extension that adds a sidebar and web UI for interacting with local AI models. It can extract page content, run OCR on images, and send everything to a model provider via a local API.
  • Ollama: The popular tool for running LLMs locally, exposing an OpenAI-compatible endpoint on localhost:11434. Ollama manages model downloads, quantization, and inference.
  • Nomic Embed Text: A retrieval embedding model specifically designed for retrieval-augmented generation (RAG) tasks. When loaded into Ollama, it creates dense vector representations of web page chunks, enabling the local LLM to “see” the full content without holding it all in memory.

On top of this, various local LLMs were tested, including the gpt-oss family from OpenAI—open-weight models built for on-device use. The 20-billion-parameter variant (gpt-oss:20b) proved capable of decent summaries, but required significant GPU resources. Smaller models ran faster but produced less coherent output.

How RAG Powers Local Summarization

The secret sauce behind making a relatively small local model behave like a context-aware assistant is retrieval-augmented generation. The workflow looks like this:

  1. Page Assist extracts the visible text and, if needed, OCR-processed images from the active browser tab.
  2. The content is split into manageable chunks and sent to the Nomic embedding model running inside Ollama.
  3. Each chunk is converted into a vector embedding and stored temporarily.
  4. When the user asks “Summarize this page,” the system retrieves the most relevant chunks and injects them into the LLM’s context window, along with the prompt.
  5. The LLM then generates a summary based on that retrieved context.

This setup is critical because it allows even a 7B or 20B model to “read” a long article without hitting context-length limits. Without RAG, the model would either truncate the page or lose coherence. The test confirmed that with a strong embedding model, the RAG layer worked well—provided the hardware could handle the additional latency.

What the Local Stack Got Right

Despite the ultimate verdict favoring Copilot, the local solution showed several bright spots:

  • Seamless Ollama integration: Page Assist automatically detected the local Ollama instance and made model management—pulling, switching, pausing—accessible straight from the extension’s UI. No terminal fiddling required.
  • Coherent summaries with larger models: When paired with a 20B parameter LLM and the Nomic embedder, the system produced factually accurate, paragraph-length summaries that captured the core arguments of complex articles.
  • Vision and OCR support: Page Assist’s sidebar can handle images and screenshots, using vision-enabled local models to describe visual elements or extract text from graphics. This mirror’s Copilot’s nascent vision features.
  • Complete data sovereignty: Because everything ran locally, not a single byte of page content ever left the machine. For users in regulated industries or those simply paranoid about telemetry, this is a non-negotiable advantage.

These positives demonstrate that the local AI ecosystem is maturing fast. The barriers to entry—once daunting for non-developers—are crumbling thanks to tools like Page Assist. But the gaps remain glaring when you actually try to use it day in, day out.

Where the Local Stack Fell Short

Three show-stopping issues emerged during the test:

1. Context Bleed and Chat State Chaos

The sidebar kept conversation context across page navigations unless the user manually reset it. That meant summarizing a news article after visiting a cooking blog often produced a bizarre mashup of politics and recipes. Page Assist offers a “temporary chat mode” to mitigate this, but it’s not on by default and adds friction to a speed-reading workflow. Copilot, by contrast, intelligently detects page changes and asks if you want to refocus the conversation.

2. Latency: Cloud Speed vs. Local Grind

Even with a powerful consumer GPU, the end-to-end pipeline—extract text, chunk it, generate embeddings, retrieve, and generate the summary—took noticeably longer than Copilot’s near-instant response. When you triage dozens of articles a day, waiting several seconds per page becomes a productivity killer. Microsoft’s server-scale inference and model orchestration are simply hard to beat on consumer hardware.

3. Missing Conversational Polish

Copilot doesn’t just summarize; it suggests follow-up questions, offers alternative angles, and sometimes surfaces counter-arguments. This interactive layer transforms a summary from a static block into a springboard for deeper research. Local models, while factually accurate, tended to produce drier, more matter-of-fact output. They lacked the editorial framing that makes Copilot’s summaries immediately actionable.

Head-to-Head: Why Copilot Still Wins

The test reinforced three concrete advantages that keep Microsoft’s assistant ahead:

Factor Copilot Local Stack (Page Assist + Ollama)
Integration Deeply embedded in Edge; one-click or keyboard shortcut to invoke, page-aware out of the box. Requires extension setup, manual model configuration, and conscious context management.
Speed Near-real-time; leverages Azure AI infra and optimized inference pipelines. Depends on local GPU/RAM; end-to-end latency can be 2-5x slower for the same task.
Conversational Intelligence Suggests follow-ups, rephrasing, and cross-references; feels like a thinking partner. Generates correct but static summaries; no built-in suggestion mechanism.
Privacy Page content sent to Microsoft servers (permission required). Entirely on-device; zero data leakage.
Offline Capability Requires internet connection. Fully functional without connectivity.

For the average user who needs to quickly grasp the essence of an article and decide whether to dive deeper, Copilot’s flow is simply more efficient. The integration with Edge is seamless—you press a button, grant one-time permission, and get a summary with follow-up prompts in seconds. Microsoft’s documentation and independent tests consistently praise this conversational layer as the secret behind Copilot’s productivity boost.

The Local Advantage: Privacy, Customization, and Offline Power

None of this means the local approach is pointless. For specific audiences, it’s not just viable—it’s essential:

  • Privacy absolutists: Journalists handling sensitive sources, lawyers reviewing confidential documents, or anyone bound by GDPR/NDA restrictions can summarize with zero external exposure.
  • Power users and researchers: The ability to fine-tune models on proprietary data, tweak RAG templates, or chain multiple LLMs for different tasks opens possibilities that cloud assistants can’t match.
  • Offline / edge scenarios: Field workers, travelers, or secure facilities without internet can still leverage summarization—a feature Copilot can’t provide.
  • Experimenters and hobbyists: With Ollama and Page Assist, switching between models like gpt-oss:20b or adding a new embedding engine takes minutes and costs nothing, fostering rapid innovation.

The open-source ecosystem moves fast. Nomic’s v2 embedding model already improves retrieval quality, and OpenAI’s gpt-oss releases shrink the capability gap. Page Assist itself is evolving with features like OCR language selection and integrated model pulling from Hugging Face.

Practical Recommendations: Should You Try a Local Summarizer?

If you’re curious about ditching the cloud, here’s a pragmatic checklist:

  1. Start small: Pull nomic-embed-text:latest into Ollama and pair it with a 7B–20B model. Avoid the 120B monsters unless you have a server-class GPU.
  2. Enable temporary chat mode: In Page Assist settings, activate temporary chat so each page gets a fresh context. Learn the keyboard shortcuts to speed things up.
  3. Benchmark latency: Measure end-to-end time for a typical long-form article. If it exceeds 5–7 seconds, consider a smaller model or optimized quantization.
  4. Combine with Copilot: Use Copilot for routine browsing and switch to the local stack when privacy matters or when you need a custom RAG query over a specific document set.
  5. Validate output: Local models can hallucinate. Cross-check critical summaries with the original text or run a second model as a sanity check.
  6. Keep the stack updated: Both Ollama and Page Assist receive frequent updates. Subscribe to their release notes to avoid security pitfalls and performance regressions.

The Road Ahead: Closing the Gap

The experiment is a snapshot of a rapidly changing landscape. Two trends are on a collision course:

  • Microsoft will continue embedding Copilot deeper into Windows and Edge, leveraging cloud scale and product polish to make it the default AI assistant for millions.
  • Local AI toolchains will get smoother, models will get smaller and smarter, and UX friction points (like context management) will be ironed out by frameworks like Page Assist and others.

Within a year, the performance gap may narrow enough that a local stack becomes a genuine replacement for many. Until then, the pragmatic choice is a hybrid workflow that extracts the best from both worlds. You don’t have to pick sides—you can use Copilot when speed and conversational insight matter, and switch to Ollama + Page Assist when privacy or custom logic is non-negotiable.

For Windows users, this is ultimately a vote for more choice and less vendor lock-in. The local AI movement isn’t yet winning on convenience, but it’s winning on control. And for a growing number of professionals, that’s the metric that counts.