In a head-to-head comparison of Microsoft 365 Copilot versus native Outlook search, the AI assistant proved remarkably adept at surfacing emails and contacts—but stumbled when asked to retrieve and organize tasks, according to a new hands-on evaluation by Cambridge Network. The findings underscore both the rapid advances in AI-assisted productivity and the persistent gaps that leave everyday users frustrated.
Conducted by a seasoned Microsoft trainer and productivity consultant, the test pitted Copilot’s natural language queries against traditional keyword searches and manual task management in Outlook across four common scenarios: finding specific emails, locating calendar events, identifying recent contacts, and managing to-do items. What emerged was a clear pattern: Copilot delivered speed and precision for unstructured data like emails, but its performance degraded sharply when faced with the structured yet often neglected world of Outlook tasks.
The Cambridge Network Evaluation: A Real-World Test
The evaluation, detailed in an article published on the Cambridge Network website earlier this month, was designed to mimic typical knowledge-worker workflows. The trainer used a live Microsoft 365 environment populated with years of emails, meetings, contacts, and tasks, the kind of digital clutter that builds up in any busy professional’s account. For each scenario, the trainer first attempted to accomplish the goal using standard Outlook features (search bar, filters, task pane) and then repeated the same exercise using Copilot’s conversational interface, accessible via the Copilot pane in Outlook for Microsoft 365.
The scenarios were:
- Email Search: “Find the last email from Jane about the Q3 budget revision.”
- Calendar Lookup: “When is my next meeting with the marketing team?”
- Contact Retrieval: “Who are the people I’ve been in touch with most in the last month?”
- Task Management: “What are my overdue tasks, and can you group them by project?”
In each case, the trainer measured time to completion, accuracy, and the number of steps required. Copilot’s responses were graded on relevance and whether they fully answered the query without manual intervention.
Email Search: Copilot’s Killer Feature
When it came to finding emails, Copilot shone. Traditional Outlook search, while robust, often requires users to remember exact phrases, sender names, or subject lines, and even then results can be hit-or-miss. Copilot’s semantic search capabilities, powered by the Microsoft Graph and large language models, allowed it to understand context and intent.
For the query about the Q3 budget revision, Copilot returned the correct email thread in under three seconds, complete with a summary of the most relevant message. The trainer noted that doing the same with standard search required refining the search terms twice and scrolling through multiple results. Copilot also surfaced related emails—the initial request, a follow-up with attachments, and the final approval—all in a neatly organized narrative.
This aligns with Microsoft’s positioning of Copilot as a “reasoning engine” that can traverse the web of relationships in your Microsoft 365 data. The Graph already maps users, files, and emails, and Copilot applies natural language understanding to pinpoint what you need. Early beta testers and enterprise customers have reported similar efficiency gains, with some citing a 30-50% reduction in time spent searching for information.
But the Cambridge test highlighted a nuance: Copilot was most effective when queries were specific. Ambiguous prompts like “find emails about budgets” still produced results, but the lack of specificity sometimes returned a mix of relevant and irrelevant messages. The trainer concluded that while Copilot lowers the skill bar for search, it still rewards those who phrase queries precisely—a skill that may take some practice.
Calendar and Contacts: Solid but Not Flawless
For calendar searches, Copilot performed well but showed some quirks. The query “When is my next meeting with the marketing team?” correctly identified a Teams meeting scheduled for the following Tuesday. However, when asked “What meetings do I have this week about the product launch?” Copilot listed three meetings but missed a fourth because its invite used a different naming convention (it was titled “Project X Sync” instead of explicitly mentioning the launch). This reveals a current limitation: Copilot appears to rely on the text it matches against, and it can miss meetings whose titles lack the query’s keywords, even if they are contextually related.
In contrast, the native Outlook calendar search found all four meetings because the trainer manually searched for “product launch” in the meeting descriptions. This suggests that for highly structured data with explicit metadata, keyword search still has an edge. Copilot’s strength lies in its ability to understand natural language, but it doesn’t yet fully grasp organizational context or implicit associations.
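The gap between the two results can be illustrated with a small Python sketch. It is not how Copilot or Outlook search is implemented; it only shows, under the assumption that the missed meeting mentioned the launch in its description but not its title, why a match limited to titles returns fewer hits than a match across all text fields. The `Meeting` class, both helper functions, and every sample entry other than the “Project X Sync” title are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Meeting:
    title: str
    description: str


def matches_title(meeting: Meeting, phrase: str) -> bool:
    """Case-insensitive phrase match against the title only."""
    return phrase.lower() in meeting.title.lower()


def matches_any_field(meeting: Meeting, phrase: str) -> bool:
    """Case-insensitive phrase match against the title or the description."""
    text = f"{meeting.title}\n{meeting.description}".lower()
    return phrase.lower() in text


# Hypothetical sample data; only the "Project X Sync" title comes from the article.
meetings = [
    Meeting("Product launch kickoff", "Agenda and owners"),
    Meeting("Project X Sync", "Weekly sync for the product launch"),
]

title_hits = [m.title for m in meetings if matches_title(m, "product launch")]
all_hits = [m.title for m in meetings if matches_any_field(m, "product launch")]
```

Here `title_hits` contains only the kickoff meeting, while `all_hits` also picks up “Project X Sync” through its description.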
When it came to contact retrieval, Copilot excelled. “Who have I been in touch with most in the last month?” prompted Copilot to scan sent and received emails, as well as Teams chats (where integrated), and produce a ranked list. It even provided a brief summary of interactions—for example, “You’ve exchanged 15 emails with Anika about the onboarding project.” The trainer found this feature particularly useful for rekindling professional relationships or preparing for meetings. Standard Outlook has no equivalent without manually sifting through the People hub or email history, making Copilot a clear winner here.
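For readers curious what such a ranking involves, the idea can be sketched in a few lines of Python. This is an illustrative assumption about the general approach (count interactions per contact inside a time window, then sort), not a description of Copilot’s internals. The `rank_contacts` function, the dates, and the contacts other than Anika are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_contacts(messages, now, days=30):
    """Return (contact, count) pairs for messages sent within the last
    `days` days, most frequent contact first."""
    cutoff = now - timedelta(days=days)
    counts = Counter(contact for contact, sent_at in messages if sent_at >= cutoff)
    return counts.most_common()


# Hypothetical sample data: (contact, timestamp) pairs.
now = datetime(2024, 6, 30)
messages = [
    ("Anika", datetime(2024, 6, 20)),
    ("Anika", datetime(2024, 6, 25)),
    ("Ben", datetime(2024, 6, 28)),
    ("Cara", datetime(2024, 4, 1)),  # older than the 30-day window, so excluded
]

ranking = rank_contacts(messages, now)
```

A real implementation would also need to merge email and chat sources and resolve multiple addresses for the same person, which this sketch leaves out.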
Task Management: The AI’s Blind Spot
If emails and contacts were Copilot’s grand entrance, task management was its stumble. The trainer asked Copilot: “What are my overdue tasks, and can you group them by project?” The AI’s response was, in the trainer’s words, “underwhelming and often incomplete.” Copilot failed to recognize several overdue tasks that were clearly visible in the Outlook Tasks pane. It also struggled to group tasks by project, instead offering an unorganized list with some items missing entirely.
Further prompts revealed deeper problems. When asked “Create a new task to review the contract by Friday,” Copilot dutifully created a task—but with no due date and no association with the relevant email thread. The trainer had to manually set the deadline and link it to the original request. In another test, “Show me tasks related to the customer support overhaul,” Copilot returned zero results, even though the trainer had multiple tasks with the word “support” in the subject line. The search wasn’t simply failing; it appeared to be ignoring the task database almost entirely.
This echoes a broader frustration in the Microsoft 365 community. On forums, users have noted that Copilot’s integration with Outlook tasks and Microsoft To Do is, at best, surface-level. While Copilot can technically access the Microsoft Graph to read tasks, its ability to comprehend and manipulate them is inconsistent. Some users report success with simple commands like “list my tasks,” but any attempt at filtering, sorting, or grouping often falls flat.
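Part of what makes the result striking is that the trainer’s request, overdue tasks grouped by project, is simple to express once tasks are available as structured records. The Python sketch below shows the filtering and grouping logic on hypothetical data. The `Task` fields, the `overdue_by_project` helper, and the “Unassigned” bucket are assumptions for illustration and do not correspond to any Outlook or Copilot API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Task:
    subject: str
    due: Optional[date]
    project: Optional[str] = None
    completed: bool = False


def overdue_by_project(tasks, today):
    """Group incomplete tasks whose due date has passed by project name."""
    groups = defaultdict(list)
    for task in tasks:
        # Skip finished tasks, tasks with no deadline, and tasks not yet due.
        if task.completed or task.due is None or task.due >= today:
            continue
        groups[task.project or "Unassigned"].append(task.subject)
    return dict(groups)


# Hypothetical sample data.
today = date(2024, 6, 30)
tasks = [
    Task("Review the contract", date(2024, 6, 28), "Legal"),
    Task("Call vendor", date(2024, 6, 20)),
    Task("Draft support FAQ", date(2024, 7, 5), "Customer support overhaul"),
    Task("Archive old tickets", date(2024, 6, 1), "Customer support overhaul", completed=True),
]

result = overdue_by_project(tasks, today)
```

The hard part in practice is upstream of this logic: the grouping only works if every task carries a due date and a project label, which the tests above suggest is often not the case.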
Why Copilot Struggles with Tasks
The root cause likely lies in how tasks are stored and prioritized in Microsoft’s ecosystem. Unlike emails, which are rich in conversational context, tasks are often short, structured items devoid of the semantic cues that large language models thrive on. A task subject like “Call vendor” contains little inherent meaning beyond the words themselves. There’s no thread, no dialogue, and often no relationship to other data points unless the user manually links them.
Additionally, Microsoft’s task landscape is fragmented. Outlook tasks, Microsoft To Do, Planner, and even Loop task lists all exist under the Microsoft 365 umbrella, but they don’t always synchronize seamlessly. Copilot may not have a unified view across these endpoints, leading to missed or duplicate entries. During the Cambridge test, the trainer noted that some tasks were created in Outlook but synced to To Do, while others remained local to Outlook. Copilot seemed to query only one set, resulting in incomplete results.
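What a unified view would require can be sketched in Python: merge the task lists from each store and drop synced duplicates. The example is a minimal illustration on made-up dictionaries, not a call against any Microsoft service. The `unified_tasks` helper, the `store` field, and the rule that two tasks are duplicates when their normalized subject and due date match are all assumptions.

```python
def unified_tasks(*sources):
    """Merge task lists from several stores, keeping the first copy of each task.

    Two tasks count as duplicates when their normalized subject and due date
    match; that rule is an assumption made for this illustration.
    """
    seen = set()
    merged = []
    for source in sources:
        for task in source:
            key = (task["subject"].strip().lower(), task.get("due"))
            if key not in seen:
                seen.add(key)
                merged.append(task)
    return merged


# Hypothetical sample data from two stores.
outlook_tasks = [
    {"subject": "Call vendor", "due": "2024-06-20", "store": "outlook"},
    {"subject": "Review the contract", "due": "2024-06-28", "store": "outlook"},
]
todo_tasks = [
    {"subject": "call vendor ", "due": "2024-06-20", "store": "todo"},  # synced copy
    {"subject": "Book travel", "due": None, "store": "todo"},
]

merged = unified_tasks(outlook_tasks, todo_tasks)
subjects = [t["subject"] for t in merged]
```

Querying only one of the two lists, as Copilot appeared to do in the test, would miss either “Review the contract” or “Book travel”.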
Another factor is the training data. LLMs are typically trained on vast corpora of web text, where task-oriented language is sparse compared to narrative content. While Microsoft has fine-tuned Copilot for productivity scenarios, the underlying model may inherently be less adept at understanding command-like, imperative phrasing. When you ask for “overdue tasks grouped by project,” you’re expecting a level of executive function—planning, categorization, prioritization—that current AI still finds challenging.
User Reactions and Real-World Implications
The Cambridge Network article has sparked discussion among IT professionals and productivity enthusiasts. Many echoed the trainer’s findings, sharing anecdotes of Copilot’s brilliance in email summarization but expressing disappointment in its task handling. “I love using Copilot to draft replies or find that lost attachment, but I’ve given up on it for tasks. I just open To Do manually,” wrote one user on a popular Microsoft 365 subreddit.
Others see the glass half-full. “It’s a version 1.0 product. Email and calendar integration is already saving me hours each week. I’ll wait for tasks to improve,” commented another. This pragmatic view reflects Microsoft’s iterative approach: Copilot is updated monthly, and task management enhancements are widely expected.
Yet for organizations banking on AI to streamline operations, the task gap is more than a minor annoyance. Tasks are the granular units of execution that drive projects forward. If Copilot can’t reliably surface them, it undermines the promise of an intelligent personal assistant that stays on top of your commitments. Some companies may delay Copilot deployments until the feature set matures.
Microsoft’s Roadmap and the Task Problem
Microsoft has acknowledged that Copilot’s integration with tasks is a work in progress. In a recent Microsoft 365 roadmap update, the company hinted at “enhanced Copilot experiences across Planner, To Do, and Outlook tasks” slated for later this year. The goal, according to Microsoft, is to “unify task management under a single AI-powered experience where Copilot can intelligently create, assign, and track tasks regardless of the endpoint.”
Analysts see this as a necessary move. “The current fragmentation of tasks in Microsoft 365 is a historical artefact,” says Dana Riddle, a productivity analyst at Enterprise Strategy Group. “It’s a tall order for Copilot to navigate that maze. Once Microsoft rationalizes the backend, AI-powered task management could become a killer feature.”
In the interim, power users are developing workarounds. Some use Power Automate to sync tasks across apps before querying Copilot; others structure task subjects with consistent keywords and categories to improve searchability. These manual optimizations, while helpful, underscore how far Copilot has to go before it can handle tasks as seamlessly as it handles email.
Conclusions: A Sharper Tool with a Dull Edge
The Cambridge Network evaluation paints a balanced picture of Microsoft 365 Copilot: it is a game-changing tool for navigating the chaotic sea of emails and contacts, but it still fails to tame the structured world of tasks. For busy professionals, this means Copilot can dramatically reduce email overload while providing limited help with personal organization. The best strategy is to lean on Copilot for what it does well and maintain a parallel, manual system for task tracking—at least for now.
Copilot’s journey mirrors the broader AI wave: extraordinary in some dimensions, bafflingly inept in others. As Microsoft continues to refine its models and unify its task platforms, the line between assistant and extension will continue to blur. Until then, users should temper their expectations when asking their AI colleague, “What’s next on my to-do list?”