Leading GenAI Browser Assistants Found Hoovering Up Social Security Numbers, Medical Data Without Consent

A sweeping new audit of the most popular generative AI browser assistants reveals that several widely used extensions silently capture and transmit highly sensitive information from ordinary web sessions—including social security numbers, bank details, and medical records—to remote servers, where it can be tracked, profiled, and reused in ways users never expected or accepted.

Researchers from UC Davis, University College London, and Mediterranea University of Reggio Calabria systematically tested ten top GenAI browser assistants, including ChatGPT integrations, Merlin, Copilot variants, and Perplexity. Their study, titled "Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants," combined a novel prompting framework with live network traffic analysis to expose exactly what data leaves the browser and where it ends up. The paper, published on arXiv and presented at the USENIX Security Symposium, paints an alarming picture: most assistants transmit far more browsing context than users realize, and several enable cross-site tracking and outright leakage of protected information.

The Audit: How Researchers Caught GenAI Assistants in the Act

The team built a repeatable, two-pronged audit. First, they crafted a prompting framework that simulates realistic user queries and follow-ups, including probes designed to see whether an assistant retained and reused leaked personal attributes. This allowed them to test profiling and personalization behavior across browsing actions. Second, they performed deep network traffic analysis by intercepting and decrypting communications between the browser assistants, their own servers, and third-party trackers while exercising a set of real-world browsing scenarios.

The test persona traversed 20 representative online spaces: ten public (news, shopping, social video) and ten private or authenticated (university health portals, banking pages, tax portals, dating sites, and learning management systems). By asking targeted follow-ups—for instance, "what was the purpose of the current medical visit?" after visiting a health portal—researchers probed whether the assistants had captured and processed sensitive page content. The audit covered architecture (local vs server-side inference), implicit and explicit data collection, sharing with third parties, and profiling/personalization behaviors.
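The traffic-analysis half of such an audit can be illustrated with a short Python sketch. It is not the researchers' tooling: the CapturedRequest record, the host names, and the planted "canary" values are all hypothetical, and the sketch assumes the outbound requests have already been decrypted by an intercepting proxy. The idea is simply to plant known fake values on test pages and then search every outbound request body for them, in plain, URL-encoded, and Base64 form.

```python
import base64
import urllib.parse
from dataclasses import dataclass


@dataclass
class CapturedRequest:
    """One decrypted outbound request, as a proxy might log it (hypothetical shape)."""
    host: str
    body: str


def encodings(value: str) -> set[str]:
    """Return the plain, URL-encoded, and Base64 forms of a planted value."""
    return {
        value,
        urllib.parse.quote(value, safe=""),
        base64.b64encode(value.encode()).decode(),
    }


def find_leaks(requests, canaries):
    """Yield (host, label) for every request body that contains a planted value."""
    for req in requests:
        for label, value in canaries.items():
            if any(form in req.body for form in encodings(value)):
                yield req.host, label


# Made-up persona data planted on test pages; these are not real identifiers.
CANARIES = {"ssn": "000-12-3456", "visit_reason": "knee MRI follow-up"}

# Made-up captured traffic for illustration only.
traffic = [
    CapturedRequest("api.assistant.example", '{"page_text": "SSN 000-12-3456"}'),
    CapturedRequest("stats.tracker.example", "q=knee%20MRI%20follow-up"),
    CapturedRequest("cdn.example", "GET /logo.png"),
]

leaks = list(find_leaks(traffic, CANARIES))
print(leaks)
```

A substring search like this only catches values sent verbatim or in the three encodings checked; compressed or otherwise transformed payloads would need additional decoding steps.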

Key Findings: Five Privacy Nightmares Laid Bare

1. Server‑side models dominate; local inference is almost nonexistent

Nearly all tested assistants rely on server-side API calls for inference rather than performing model work locally on the device. That architecture makes it routine for page content or derived context to be transmitted off-device whenever the assistant is invoked. Only one assistant in the study operated without obvious server-side profiling in the researchers’ tests.

2. Full page content and form inputs are being transmitted

Several assistants uploaded full webpage content (HTML DOMs) or large extracts of visible content to their first-party servers, even from authenticated pages. In the most shocking instance, the Merlin extension was caught capturing form input values, including social security numbers entered on a U.S. tax portal and details from banking and health forms. This represents a practical and alarming vector for exfiltration of protected data.

3. Third‑party analytics and cross‑site linkage

Multiple assistants shared user prompts, identifiers (chat IDs, conversation IDs), and even IP addresses with third-party trackers and analytics platforms like Google Analytics and Mixpanel. When an assistant attaches stable identifiers or chat tokens to analytics calls, it creates the technical possibility of cross-site tracking and retargeting that far exceeds a single extension’s telemetry. Assistants named in the study for these behaviors include Sider, TinaMind, and others.
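Why a stable identifier matters can be shown in a few lines of Python. The sketch below is illustrative only: the analytics endpoint, the "uid" and "page" parameter names, and the sample URLs are assumptions, not the format of any named analytics platform. It groups the pages reported in analytics calls by the identifier attached to each call, which is all a tracker needs to do to connect one user's visits across unrelated sites.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def link_sites_by_identifier(analytics_urls, id_param="uid", page_param="page"):
    """Group the sites reported in analytics calls by the identifier sent with them."""
    sites_by_id = defaultdict(set)
    for url in analytics_urls:
        params = parse_qs(urlparse(url).query)
        identifier = params.get(id_param, [None])[0]
        page = params.get(page_param, [None])[0]
        if identifier and page:
            sites_by_id[identifier].add(urlparse(page).hostname)
    return dict(sites_by_id)


# Hypothetical analytics calls; "chat-42" plays the role of a stable chat ID.
calls = [
    "https://analytics.example/collect?uid=chat-42&page=https%3A%2F%2Fnews.example%2Fstory",
    "https://analytics.example/collect?uid=chat-42&page=https%3A%2F%2Fbank.example%2Faccounts",
    "https://analytics.example/collect?uid=chat-77&page=https%3A%2F%2Fshop.example%2Fcart",
]

linked = link_sites_by_identifier(calls)
print(linked)
```

Here the same identifier appears on a news site and a banking site, so whoever receives both calls can tie the two visits to one profile.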

4. Persistent profiling and personalization

Some assistants inferred demographic attributes—age, gender, income, interests—and used those inferred attributes to personalize responses across sessions and browsing contexts. A subset of tested tools preserved context across navigations, enabling profiles to persist even when users moved to new sites or private pages. Perplexity’s assistant stood out as comparatively privacy-friendly; other mainstream integrations showed extensive profiling traces.

5. Private spaces are not reliably protected

Assistants that were expected to limit data collection in private, authenticated spaces sometimes continued recording or sent collected content upstream. The study shows that users’ expectation of privacy while interacting with health portals, university systems, or banking pages can be violated simply by having an assistant active in their browser. These kinds of leaks raise possible compliance issues under sectoral laws.
In the U.S., the sectoral laws most directly in play are HIPAA for health records and FERPA for education records.

Why This Matters Legally and Operationally

The findings have immediate legal and operational repercussions. In the U.S., transmission of health and education records without appropriate safeguards could implicate HIPAA and FERPA. The researchers caution that, depending on context and contractual arrangements, those data flows might constitute unlawful disclosures. In the European and UK contexts, profiling without a clear lawful basis, cross‑border transfers, and lack of transparency likely trigger GDPR concerns around data minimization, purpose limitation, and automated profiling.

Operationally, the combination of server‑side inference, third‑party analytics, and persistent identifiers dramatically widens the attack surface. A breach at a vendor or analytics partner could expose vast swathes of browsed content that users assumed remained private. The researchers highlight the poor visibility users have into what happens to browsing data after it leaves the device.

How GenAI Assistants Technically Leak Your Data

Three mechanisms underpin most of the leakage:

Content scripts and DOM access: Browser assistants often inject content scripts into pages. Those scripts have access to the page DOM and visible text. If the extension forwards DOM text or serialized HTML upstream for server processing, everything visible on the page can be exfiltrated.
Background service workers and auto‑invocation: Some assistants use background workers that can auto‑trigger on navigation or search events. Auto‑invocation enables context retention but also means data can be sent without an obvious user action. The audit observed auto‑triggered calls in multiple assistants.
Server‑side vs local inference trade‑off: Running models server‑side simplifies engineering and reduces client resource needs, but requires transmitting user content. Local inference is privacy‑preserving by design, but is still rare among the assistants tested.
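To make the first mechanism concrete, the following Python sketch stands in for what a content script with DOM access could collect from a page. It is an assumption-laden illustration: real content scripts are written in JavaScript and read live DOM properties, whereas this example parses a static, made-up HTML snippet with the standard library and bakes the form value into the markup for simplicity.

```python
from html.parser import HTMLParser


class PagePayload(HTMLParser):
    """Collect what a DOM-forwarding script could see: visible text and input values."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.inputs = {}
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "input" and attrs.get("value"):
            self.inputs[attrs.get("name", "?")] = attrs["value"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


# Made-up authenticated page; the SSN is a deliberately invalid placeholder.
page = """
<html><body>
<h1>Patient portal</h1>
<p>Reason for visit: knee MRI follow-up</p>
<form><input name="ssn" value="000-12-3456"></form>
<script>var t = 1;</script>
</body></html>
"""

parser = PagePayload()
parser.feed(page)
print(parser.text)
print(parser.inputs)
```

Everything the parser collects here, including the form value, is what would travel upstream if an extension forwarded the page wholesale for server-side processing.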

Immediate Steps for Windows Users and IT Administrators

The risk is not theoretical. If you run these assistants as browser extensions or enable similar in‑browser AI features, sensitive data may already be leaving your endpoint. Practical, immediate steps include:

Audit installed extensions: Remove or disable any generative AI browser assistant you don’t actively use. For those you keep, check extension permissions and disable “read and change all data on the websites you visit” unless strictly necessary.
Block assistants on high-risk domains: Treat banking, tax, health portals, student systems, and other authenticated services as sensitive. Disable AI assistants before visiting these sites, or visit them in a separate browser profile that has no extensions installed.
Enforce strict extension‑permission policies in enterprise environments: Use group policy or endpoint management to restrict installation of unsanctioned extensions and require review for any assistant that needs full‑page access.
Prefer assistants with local or explicit consent models: Where possible, use tools that only operate locally or that explicitly fetch pages server‑side only after explicit user consent for each site and action.
Monitor network traffic and telemetry: Security teams should instrument outbound filtering and inspect calls from extension processes to detect uploads of page DOMs or form posts to unknown endpoints.
Principle of least exposure: Log out of non‑essential accounts when not needed, run sensitive tasks in a hardened browser or VM, and avoid entering sensitive data on a machine where untrusted assistants are installed.

These immediate mitigations align with the researchers’ recommendations and mirror pragmatic IT hygiene for extension management.
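The permission check in the first step can be partly automated. The sketch below, in Python, inspects a browser extension manifest for patterns that grant access to every site. The sample manifest is made up, and the list of "broad" patterns is an assumption about what an administrator would want to flag rather than an exhaustive rule; the manifest keys themselves (permissions, host_permissions, content_scripts and its matches list) are the standard ones used by Chromium-style extensions.

```python
import json

# Match patterns that effectively mean "every website" (illustrative, not exhaustive).
BROAD_PATTERNS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}


def broad_access(manifest: dict) -> list[str]:
    """List the places where an extension manifest requests access to every site."""
    findings = []
    # Manifest V3 uses "host_permissions"; Manifest V2 put host patterns in "permissions".
    for key in ("host_permissions", "permissions"):
        for pattern in manifest.get(key, []):
            if pattern in BROAD_PATTERNS:
                findings.append(f"{key}: {pattern}")
    # Content scripts declare the pages they are injected into via "matches".
    for script in manifest.get("content_scripts", []):
        for pattern in script.get("matches", []):
            if pattern in BROAD_PATTERNS:
                findings.append(f"content_scripts: {pattern}")
    return findings


# Hypothetical manifest for an assistant extension.
manifest = json.loads("""
{
  "manifest_version": 3,
  "name": "Example Assistant",
  "permissions": ["storage", "tabs"],
  "host_permissions": ["<all_urls>"],
  "content_scripts": [{"matches": ["https://*/*"], "js": ["content.js"]}]
}
""")

findings = broad_access(manifest)
print(findings)
```

An empty result does not prove an extension is safe, since narrower patterns can still cover sensitive domains, but a non-empty one is a quick signal that the extension deserves review before it is allowed near authenticated pages.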

How Vendors and Browser Platforms Should Respond

The audit provides a blueprint for safer design. Prioritized engineering and policy changes include:

Privacy-by-Design: Move privacy-sensitive features to local processing where feasible, and make a clear "local-only" mode the default for sensitive actions.
Explicit, machine‑readable disclosures: When a feature will transmit page content, show a one‑click banner that explains exactly what will be sent, where it will be stored, and for how long. Make consent granular and revocable.
Opt‑in profiling & deletion controls: Profiling should be opt‑in, paired with an easy deletion API that erases profiles and associated logs on demand.
Segregate analytics from content flows: Avoid wiring raw prompts, chat IDs, or page content into general analytics pipelines that enable cross‑site tracking. Use anonymized, aggregated telemetry if analytics are necessary.
Independent audits and certifications: Commission third‑party audits, publish methodologies and logs (redacted as necessary), and subject assistants to recognized privacy certifications to rebuild public trust.

These are technically achievable changes that materially reduce regulatory and reputational risk while preserving product value.
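Segregating analytics from content flows can be as simple as an allowlist applied before any telemetry leaves the extension. The Python sketch below shows one way to do that; the field names and the sample event are hypothetical, and a real pipeline would also need to consider IP addresses and other metadata that travel with the request itself.

```python
# Only these coarse, non-identifying fields may reach the analytics pipeline.
# The field names are illustrative assumptions, not any vendor's schema.
ALLOWED_FIELDS = {"event", "feature", "latency_ms", "extension_version"}


def scrub_event(raw_event: dict) -> dict:
    """Drop every field that is not explicitly allowlisted for analytics."""
    return {key: value for key, value in raw_event.items() if key in ALLOWED_FIELDS}


# Hypothetical raw event as an assistant might assemble it internally.
raw = {
    "event": "summary_requested",
    "feature": "side_panel",
    "latency_ms": 840,
    "chat_id": "chat-42",
    "prompt": "what was the purpose of the current medical visit?",
    "page_url": "https://health.example/visits/123",
}

clean = scrub_event(raw)
print(clean)
```

An allowlist fails closed: a new field added to the raw event later stays out of analytics until someone deliberately approves it, which is the opposite of what a blocklist of known-sensitive fields would do.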

Broader Threats: Prompt Injection and Agent Hijacking

The privacy risk of browser assistants sits beside a parallel security problem: agents with broad data access are novel attack surfaces for prompt‑injection and agent‑hijacking exploits. Recent security research has demonstrated “zero‑click” exploit chains that can subvert agents, extract secrets, and implant persistent malicious instructions without direct user interaction. The combination of privileged extension access plus server‑side processing magnifies that threat. Security teams should treat assistants as first‑class attack surfaces and apply the same threat model used for connectors, bots, and API‑enabled services.

Critical Analysis: Convenience versus Privacy

Users flock to GenAI browser assistants for good reasons: productivity gains from summaries, cross‑tab reasoning, and on‑page Q&A speed up research; for users with visual or cognitive impairments, instant summaries can be transformative; and low‑friction side‑panel assistants reduce context switching. But the design trade‑offs that favor convenience over privacy are systemic. Server‑side processing simplifies development but concentrates sensitive data in vendor clouds and analytics stacks. Opaque consent models bury the real implications in legal text, so most users never understand that an assistant may capture authenticated page content.

The audit ran under controlled lab conditions that simulated realistic browsing. That is excellent for reproducibility, but real-world variance (different extension settings, versions, and server-side configurations) might alter precise behavior. Vendors may point to configuration options or enterprise settings that mitigate observed behavior; such claims should be evaluated against telemetry and code audits. The study documents data transmissions and plausible policy and legal exposure, but a finding of actual violation requires formal enforcement action. The audit's legal claims should therefore be understood as substantiated concerns requiring regulator and vendor follow-up, not as final legal judgments.

Practical Recommendations for Windows Users

Disable AI browser assistants before visiting any medical, banking, tax, or education portals.
Use a second browser profile or a dedicated “research” browser for any assistant workflows; keep your primary profile minimal and extension‑free.
Check extension permissions and audit background processes; remove assistants that require “read all data” permission unless you explicitly consent to the trade‑off.
Prefer vendors that offer explicit per‑site consent or local‑model options. Perplexity showed the least evidence of profiling behavior in this audit, but vendors change quickly; prioritize architectural guarantees (local inference, per‑site consent) over vendor reputations alone.
For enterprise IT: implement extension allowlisting, monitor outbound traffic from extension subprocesses, and include generative-AI assistants in threat modeling and incident response plans.

The Road Ahead: Regulation, Transparency, and Engineering

This audit should be a clarifying moment for the industry. The convenience of integrated, context‑aware AI is real and compelling, but the deployment model must change if the technology is to scale without undermining user privacy and legal obligations. Regulators in the EU, UK, and U.S. are increasingly scrutinizing AI systems and data processing practices. The researchers explicitly call for regulatory oversight and stronger vendor accountability; mainstream coverage and security research have amplified that call. Policymakers and platform owners should require clearer disclosures, enforceable consent mechanisms, and technical controls that minimize the data surface sent to third parties.

From an engineering perspective, the priorities are straightforward: shift sensitive processing to the client where possible, adopt machine‑readable, user‑facing consent flows, decouple analytics pipelines from raw prompt and content flows, and provide programmatic data‑deletion endpoints and audit logs for users to exercise their rights.
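A per-site consent flow of the kind described above reduces, at its core, to a default-deny check before any page content is uploaded. The Python sketch below is a minimal illustration under stated assumptions: the ConsentStore class and its method names are hypothetical, consent is keyed by hostname only, and persistence, user interface, and per-action granularity are left out.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ConsentStore:
    """Track which hosts the user has explicitly approved for content upload."""
    granted: set = field(default_factory=set)

    def grant(self, host: str) -> None:
        self.granted.add(host)

    def revoke(self, host: str) -> None:
        self.granted.discard(host)

    def may_upload(self, page_url: str) -> bool:
        """Default deny: upload is allowed only for hosts with recorded consent."""
        return urlparse(page_url).hostname in self.granted


# Hypothetical usage: the user approves a news site but never a bank.
store = ConsentStore()
store.grant("news.example")

print(store.may_upload("https://news.example/story"))     # consent recorded
print(store.may_upload("https://bank.example/accounts"))  # no consent recorded

store.revoke("news.example")
print(store.may_upload("https://news.example/story"))     # consent withdrawn
```

Because the check runs on the client before anything is sent, revoking consent takes effect immediately and does not depend on the vendor's servers honoring a later deletion request.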

Generative‑AI browser assistants deliver real productivity value, but the current dominant architectures create a predictable and preventable privacy problem: assistants routinely have the technical ability to see everything a user does in a tab, and many suppliers forward that content—sometimes including form inputs from authenticated pages—to remote servers and analytics pipelines. For Windows users and administrators, the responsible posture is immediate and precautionary: audit and limit extension use, treat assistants as potentially privileged software with the same operational scrutiny as any connector or enterprise bot, and press vendors for transparent, privacy‑first designs. Privacy‑by‑design, visible consent, and independent audits are no longer optional if users’ medical, financial, and educational data are to remain private in the age of in‑browser AI.