UK Trial Reveals Copilot's Productivity Promise Fails to Materialize, While ChatGPT Projects Goes Free

A major UK government evaluation of Microsoft Copilot has concluded that, despite high user satisfaction and some time savings on routine tasks, the AI assistant does not deliver clear productivity gains at an organizational level. Released alongside a broader cross-departmental study, the findings inject a dose of reality into the generative AI hype just as OpenAI makes its ChatGPT Projects feature available to free users, and as AI reshapes call centers, payments, and insurance workflows. Together, these developments paint a messy but consistent picture for small and medium businesses: AI is delivering real value in pockets, but the headline claim of “instant productivity” remains conditional and situational.

The UK Copilot Evaluations: Two Studies, One Complicated Reality

Two official UK government evaluations provide the best public data yet on how a large group of knowledge workers fared when Microsoft Copilot was introduced. The Government Digital Service (GDS) ran a cross-government experiment covering roughly 20,000 employees across 12 organizations. It found an average user-reported time saving of 26 minutes per day, with over 70% of users saying Copilot cut time spent on routine searches and mundane tasks, and 82% saying they would not want to return to pre-Copilot working patterns.

A separate, smaller evaluation by the Department for Business and Trade (DBT) gave the AI a harder look. Its 1,000-license pilot combined diary studies, observed tasks, and an adjusted methodology that penalized novel work created only because Copilot made it possible. The DBT reported high satisfaction—72% of users were satisfied or very satisfied—and some time savings on writing tasks. Yet the evaluators concluded that the evidence did not show improved productivity at the department level. After adjusting for unused outputs and extra verification time, net productivity gains vanished.

These divergent results are not contradictory when you examine the details. Differences in scale, measurement approaches, and task types explain the gap. The GDS relied on self-reported surveys, which can overstate savings. The DBT’s rigorous adjustments subtracted time spent checking AI outputs and discarded outputs that were created but never used. The DBT also found that Copilot sometimes prompted staff to attempt extra work they would not have done otherwise, creating a new category of effort that ate into any time freed up.
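
The effect of these adjustments can be illustrated with simple arithmetic. The sketch below is not the DBT's actual formula, which is not spelled out here; the function name, its parameters, and every figure are hypothetical and only show how verification time, unused outputs, and newly created work can erode a self-reported saving.

```python
def net_minutes_saved(gross_saved, verification, unused_share, new_work):
    """Adjust a self-reported daily time saving (all times in minutes).

    gross_saved:  time the user believes the tool saved
    verification: time spent checking and correcting AI output
    unused_share: fraction (0 to 1) of AI output that was never used
    new_work:     time spent on extra tasks attempted only because
                  the tool made them possible
    """
    if not 0.0 <= unused_share <= 1.0:
        raise ValueError("unused_share must be between 0 and 1")
    # Savings tied to output nobody used do not count.
    usable_saving = gross_saved * (1.0 - unused_share)
    return usable_saving - verification - new_work


# Hypothetical figures, chosen only to illustrate the effect.
example = net_minutes_saved(
    gross_saved=30, verification=10, unused_share=0.25, new_work=8
)
print(f"Net saving: {example:.1f} minutes per day")
```

With these made-up inputs, a claimed 30 minutes shrinks to 4.5 once the adjustments are applied, which is the kind of gap that can separate a survey result from an adjusted one.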

Task type drove much of the benefit. Both reports show Copilot helps most with document drafting, summarization, and other clearly bounded text tasks. It is less helpful for highly nuanced policy work or complex analytical tasks. Gains were largest where work is formulaic or repetitive in structure. Human factors also mattered: self-led training raised satisfaction more than formal sessions, and neurodiverse staff and non-native English speakers saw disproportionate benefits. Adoption depended heavily on whether peers and managers accepted the tool.

For small businesses, the takeaway is clear. Expect task-specific wins, not an instantaneous across-the-board productivity leap. If your work includes routine drafting, standard reporting, or template-driven workflows, AI can deliver measurable time reductions—but only with the right measurement and governance. Design pilots that mirror best practices: use representative sampling, an honest control group, and measurement that accounts for verification and newly enabled work. And remember: user enthusiasm alone does not prove ROI.

OpenAI’s ChatGPT Projects Goes Free: A Strategic Freemium Play

While Microsoft’s Copilot faces scrutiny, OpenAI has expanded access to ChatGPT Projects, making the organizational feature available to free-tier users. Projects are folder-like groupings of chats, files, and custom instructions that let teams keep related work together and reduce context switching. The rollout, which began in early September, brings tiered file-upload limits: free users can attach up to 5 files per project; Plus, Go, and Edu users get 25; and Pro, Business, and Enterprise users can upload up to 40 files. Customization options like colors and icons, plus project-only memory controls to limit cross-project data bleed, are also included. The feature is live on the web and Android, with iOS support arriving shortly.

This is a clever freemium nudge. By giving free users a taste of organized collaboration, OpenAI encourages teams to test workflows without paying. For light-use teams, the free tier may suffice; for heavier collaborative or data-rich projects, the value of paid plans rises quickly. However, small businesses must not overlook data governance. Free tiers often have different data handling and review policies. Before uploading sensitive client files, verify whether project data may be used for training or human review under your account terms.

A recommended playbook for SMBs using ChatGPT Projects: start with a non-sensitive test project, such as a marketing calendar or onboarding checklist. Document project membership and data categories, restrict who can upload files, and test what persists with project-only memory. If your team needs stable, auditable data controls, evaluate paid tiers or enterprise contracts. Projects reduce accidental context bleeding but do not eliminate data governance concerns.

AI in Call Centers: Augmentation, Not Replacement

Contact-center AI is no longer theoretical. Providers and large enterprises are using AI for first-pass triage, agent assistance, conversation summarization, and real-time prompts. Bank of America’s virtual assistant Erica, in operation since 2018, handles billions of interactions but still routes unresolved queries to human agents. This hybrid model—automating routine inquiries while preserving human judgment for complex or sensitive issues—is becoming the industry benchmark.

For small businesses, the low-hanging fruit is clear: billing inquiries, balance checks, appointment scheduling, and other repetitive tasks can be automated with inexpensive chatbot frameworks. But the human fallback must be easy. A near-universal complaint about chatbots is “no human available,” which erodes satisfaction. Design flows so customers can escalate without friction. Even in small support teams, AI that surfaces relevant customer history and suggests replies will boost first-contact resolution and speed up onboarding for junior agents. Regulatory and privacy constraints also matter; ensure any vendor provides contractual covenants for data handling, private endpoints, and explicit training-exclusion options if you handle PII or regulated data.
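
A frictionless escalation path can be expressed as a small routing rule. The following is a minimal sketch, not any vendor's API: the phrase list, the intent labels, and the failure threshold are all hypothetical, and it assumes an upstream step has already classified the customer's intent.

```python
# Hypothetical phrases that signal the customer wants a person.
ESCALATION_PHRASES = ("human", "agent", "representative", "speak to someone")
# Hypothetical labels for routine intents that are safe to automate.
AUTOMATABLE_INTENTS = {"billing_inquiry", "balance_check", "appointment_scheduling"}
MAX_FAILED_BOT_TURNS = 2  # hypothetical threshold


def route(message, intent, failed_bot_turns):
    """Return "human" or "bot" for the next turn of a support chat."""
    text = message.lower()
    # An explicit request for a person always wins.
    if any(phrase in text for phrase in ESCALATION_PHRASES):
        return "human"
    # Stop looping once the bot has failed repeatedly.
    if failed_bot_turns >= MAX_FAILED_BOT_TURNS:
        return "human"
    # Only routine, well-understood intents stay automated.
    if intent in AUTOMATABLE_INTENTS:
        return "bot"
    # Anything unrecognized defaults to a person.
    return "human"


print(route("What is my balance?", "balance_check", 0))
print(route("I want to talk to a human", "balance_check", 0))
```

The design choice worth noting is the default: when the rule is unsure, it hands off to a person rather than keeping the customer in the automated flow.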

Visa’s Research: Stored Credentials Fix Friction Fast

While AI grabs headlines, operational basics still drive revenue. Visa’s recent research highlights that checkout friction costs sales, especially for small sellers. Around 41% of SMBs reported payment processing errors during recent transactions, and shoppers are more than twice as likely to face payment issues at SMBs as at large retailers. Visa’s recommended solutions: stored payment credentials (card-on-file), biometric authentication, and Buy Now, Pay Later options to reduce friction and encourage repeat purchases.

For commerce SMBs, these are incremental, measurable improvements that often produce faster ROI than speculative AI investments. Implement tokenization and card-on-file capability through a reputable payments partner. Provide preferred-payment options, use robust retry logic with clear error messaging, and partner with a modern PSP that offers PCI-compliant vaulting. Fixing checkout friction today can lift revenue without the complexity of an AI pilot.
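
Retry logic with clear error messaging might look like the sketch below. It does not use any real PSP's SDK: the two exception classes and the charge callable are hypothetical stand-ins, and the code assumes the underlying charge is idempotent (for example, through an idempotency key offered by the payments partner) so that a retry cannot bill the customer twice.

```python
import time


class TransientPaymentError(Exception):
    """Hypothetical error for timeouts or temporary gateway failures."""


class DeclinedPaymentError(Exception):
    """Hypothetical error for a definitive decline; retrying will not help."""


def charge_with_retry(charge, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call charge() and return (succeeded, message_for_customer).

    charge is a zero-argument callable supplied by the caller and is
    assumed to be idempotent, so repeating it is safe.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            charge()
            return True, "Payment received. Thank you!"
        except DeclinedPaymentError:
            # A hard decline is final: ask for another payment method.
            return False, "Your card was declined. Please try another payment method."
        except TransientPaymentError:
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return False, "We could not complete your payment. Please try again in a few minutes."
```

Separating hard declines from temporary failures keeps the customer-facing message specific: one asks for a different card, the other asks the shopper to wait and retry.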

Agent CRM’s Data Bridge: Targeted Automation for Insurance

In a sector where compliance and paperwork are heavy, Agent CRM launched Data Bridge, a browser extension that copies client data from its CRM into major Medicare enrollment platforms with two clicks. It supports Sunfire, Connecture, MyMedicareBot, MedicareCENTER, HealthSherpa, and others, and is free to Agent CRM users via the Chrome Web Store. For small insurance agencies, this eliminates manual re-typing during hectic enrollment periods, reducing errors and accelerating submissions. However, any automation that transfers client data must be assessed for security, auditability, and compliance with health-insurance regulations. Test such extensions in a staging environment first.

Recurring Risks: Hallucinations, Privacy, and Operational Drag

Across all these stories, the same risks recur. AI systems can hallucinate or invent facts, which is why the DBT evaluation flagged the need for output verification. Treat AI outputs as drafts and build review steps for anything that affects customers, compliance, or finances. Free tools and consumer tiers often have different data-handling policies; for business use with sensitive client data, prefer enterprise contracts or vendors that guarantee training opt-outs. Automations that create new work, as Copilot sometimes did, can increase load, so monitor for this effect. Financial services and healthcare require stricter oversight; test every integration before rolling it into production.

How to Run a High-Signal AI Pilot: A Practical Checklist

Start with a narrow, measurable use case. Pick one repeatable task, such as drafting contract templates, summarizing support tickets, or auto-filling enrollment forms.
Track time per task, error rate, and rework time over 30–90 days.
Use a control group or pre/post baseline whenever possible.
Record how long it takes to check and correct AI outputs, and subtract that from claimed savings.
Lock down data flows with project isolation, memory controls, and vendor contracts that cover data retention and human review.
Train users with practical, self-led exercises, and create a feedback loop for sharing prompts and good practices.
Scale only after governance is in place: DLP rules, audit logs, role-based access, and an incident playbook for hallucinations or leaks.
Plan role evolution: re-skill staff into oversight, prompt engineering, and higher-value customer interactions.
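
The measurement steps above can be sketched as a comparison between a control group and a pilot group. This is one simple way to do it, offered as an assumption rather than a prescribed method: the record fields (work, check, rework), the function names, and the sample time logs are all hypothetical.

```python
from statistics import mean


def effective_minutes(task):
    """Total time for one task: doing it, checking it, and reworking it."""
    return task["work"] + task["check"] + task["rework"]


def pilot_summary(control_tasks, pilot_tasks):
    """Compare mean effective minutes per task between the two groups."""
    control_mean = mean(effective_minutes(t) for t in control_tasks)
    pilot_mean = mean(effective_minutes(t) for t in pilot_tasks)
    return {
        "control_mean": control_mean,
        "pilot_mean": pilot_mean,
        # Positive means the pilot group was faster once checking
        # and rework are included.
        "net_saving_per_task": control_mean - pilot_mean,
    }


# Hypothetical time logs in minutes; a real pilot needs far more observations.
control = [
    {"work": 40, "check": 0, "rework": 5},
    {"work": 44, "check": 0, "rework": 3},
]
pilot = [
    {"work": 20, "check": 10, "rework": 6},
    {"work": 24, "check": 12, "rework": 4},
]
print(pilot_summary(control, pilot))
```

Counting checking and rework inside the pilot group's task time is what keeps the comparison honest: the raw drafting time alone would overstate the saving.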

Short-Term Recommendations for Small Businesses

Pilot, measure, and be conservative about ROI claims. High satisfaction does not automatically equal productivity gains. Use diaries, time logs, and observed tasks to confirm savings.
Use ChatGPT Projects (free tier) as a low-cost team organizer. Don’t move regulated customer data there without contractual guarantees. Evaluate paid tiers if you need stronger data controls.
Combine automated customer-service triage with an always-available human escalation path. Use AI to augment agents, not replace them.
Fix checkout friction before chasing AI miracles. Tokenization and card-on-file often deliver the fastest revenue lift.
Where industry-specific automation exists (like Agent CRM’s Data Bridge), pilot it in a non-critical window. Validate security and compliance first.

The near future will bring improved models and deeper integrations. For the prudent SMB, generative AI is a potent tool that demands discipline, not a plug-and-play productivity switch. Real-world benefit depends on deployment design, trust, governance, and the patient integration of AI into existing workflows. The UK evaluations and the wave of practical updates make one thing plain: realistic optimism, backed by rigorous measurement, is the only safe path forward.