OpenAI is reportedly testing an unannounced voice model, codenamed GPT-Bidi-1, that could transform ChatGPT into a truly conversational AI—one that listens and speaks at the same time, just like a human. App code references and a handful of user reports point to internal trials planned for late June 2026, signaling a seismic shift in how we interact with voice assistants.
If successful, GPT-Bidi-1 would replace the rigid turn-taking of today’s chatbots with a fluid, full-duplex experience. You could interrupt, ask follow-ups mid-sentence, or even pause to think without confusing the model. For Windows users, it might herald a native voice assistant that finally rivals Cortana’s original ambition—only smarter, faster, and deeply integrated with the OS.
What is GPT-Bidi-1?
GPT-Bidi-1 first surfaced in decompiled code from the ChatGPT app, where strings referencing a “bidi” mode hinted at bidirectional voice processing. The name is a clipped form of “bidirectional,” a term borrowed from networking to describe two-way communication. Unlike the current ChatGPT Voice Mode, which operates in half-duplex, meaning only one party can “talk” at a time, Bidi-1 appears designed for overlapping speech.
Early testers, who gained access through what appears to be a limited alpha ring, describe an experience where the AI responds to interruptions gracefully, adjusts its volume when the user speaks over it, and even picks up on verbal cues like “um” or “wait” without breaking flow. These reports remain unverified, but they align with a growing push toward more naturalistic AI interaction.
OpenAI has not publicly acknowledged GPT-Bidi-1. When reached for comment, a spokesperson said the company “doesn’t discuss unreleased products.” Still, the timing makes sense: competitors like Google and Anthropic have teased always-listening assistants, and Microsoft has been quietly rebuilding its voice stack under the Windows AI banner.
Full-Duplex vs. Half-Duplex: Why It Matters
To grasp the leap GPT-Bidi-1 represents, you need to understand the chasm between half-duplex and full-duplex voice systems.
Half-duplex communication is a walkie-talkie. One side transmits, the other listens, and you signal when you’re done with a beep or a long pause. ChatGPT Voice, Alexa, Siri, and even Cortana run on this paradigm. You speak, they process, they respond. Interrupting usually requires a wake word or a tap. The lag between turns can stack up to a second or more, breaking the illusion of conversation.
Full-duplex lets both sides talk and hear simultaneously, like a phone call. This demands real-time audio streaming, echo cancellation, and the AI equivalent of selective hearing: parsing incoming speech while generating output, then deciding on the fly whether to pause, continue, or adjust. For a language model, it’s a non-trivial engineering feat.
GPT-Bidi-1 presumably tackles this with a combination of low-latency transformers, echo cancellation that removes the model’s own output from the microphone signal, and a voice activity detection layer that flags when the user is speaking. The model must track two audio streams, one incoming and one outgoing, and align their semantic content so responses remain coherent even when interrupted mid-word.
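To make the "pause, continue, or adjust" decision concrete, here is a minimal Python sketch of a turn controller that watches both streams frame by frame. Nothing here reflects how GPT-Bidi-1 actually works: the class name, the three actions, and the 100 ms and 300 ms thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"  # no change: keep doing whatever we were doing
    DUCK = "duck"          # keep speaking, but lower the output volume
    YIELD = "yield"        # stop speaking and listen


@dataclass
class DuplexController:
    """Hypothetical controller; every threshold below is an assumption."""
    frame_ms: int = 20         # duration of one audio frame
    duck_after_ms: int = 100   # brief overlap: lower the volume
    yield_after_ms: int = 300  # sustained overlap: stop and listen
    overlap_ms: int = 0        # how long the user has talked over the output

    def step(self, user_speaking: bool, assistant_speaking: bool) -> Action:
        """Consume one frame from each stream and pick an action."""
        if not (user_speaking and assistant_speaking):
            self.overlap_ms = 0  # no overlap in this frame, reset the counter
            return Action.CONTINUE
        self.overlap_ms += self.frame_ms
        if self.overlap_ms >= self.yield_after_ms:
            return Action.YIELD
        if self.overlap_ms >= self.duck_after_ms:
            return Action.DUCK
        return Action.CONTINUE


def simulate(user_frames, assistant_frames):
    """Run the controller over paired per-frame speech flags."""
    controller = DuplexController()
    return [controller.step(u, a).value
            for u, a in zip(user_frames, assistant_frames)]
```

Feeding it a stretch of continuous overlap shows the escalation: the first few frames are ignored as noise, then the assistant ducks, then it yields. A short "um" that ends before the first threshold never changes the output at all.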
The State of ChatGPT Voice Today
ChatGPT Voice, rolled out broadly in late 2023, was a breakthrough in synthetic speech quality and latency. It reduced response times to around 2.5 seconds on average, making conversations feel more fluid than with Alexa or Google Assistant. But it never escaped the half-duplex curse.
Users quickly discovered quirks: the model would stop listening the moment it started speaking, forcing them to wait out long-winded answers. Background noise could trick it into thinking a turn was over. And if you uttered a quick “actually no” while the AI droned on, your correction vanished into the ether—only to be handled as a fresh turn after an awkward silence.
These limitations kept voice mode more a novelty than a productivity tool. For anything beyond casual Q&A, the text interface remained superior simply because typing allowed for seamless back-and-forth editing. GPT-Bidi-1 aims to erase that gap.
How Full-Duplex Could Transform AI Conversations
Imagine sketching out a presentation deck with ChatGPT. You start describing a slide, the AI begins suggesting visuals, and you interrupt: "No, make that a bar chart instead." Mid-sentence, it swaps the suggestion without missing a beat. You pause to think, the AI stays silent. You add a new data point, and it seamlessly updates the chart while you speak.
This kind of fluid collaboration is the holy grail of voice-first computing. It would unlock use cases currently impossible:
- Real-time translation where both speakers talk naturally, with the AI whispering corrections in one ear.
- Live coding assistance where you talk through a bug while the AI suggests fixes inline, adapting as you change your mind.
- Hands-free enterprise workflows where field workers narrate inspections and receive safety alerts without stopping work.
- In-car assistants that can break in with urgent traffic updates while you describe a destination.
For Windows users, Microsoft’s deep partnership with OpenAI makes desktop integration almost inevitable. A full-duplex ChatGPT could become the default voice layer for Windows 12 (or future 11 updates), replacing the defunct Cortana and tying into Microsoft 365, Edge, and the new Windows Copilot sidebar.
Evidence from App References and User Reports
The strongest clues about GPT-Bidi-1 come from teardowns of the ChatGPT Android app version 1.2026.158 (beta). Developers found multiple references to a “voice mode bidi” flag, along with new audio session APIs that hint at duplex streaming. One string reads: “enable_bidi_model”: true; another references “audio_duplex_threshold_ms”: 150.
A handful of users on Reddit and the OpenAI community forums claim they temporarily saw a toggle labeled “Real-time conversation (Experimental)” in their iOS app. Screenshots, since deleted, showed a dialog warning: “This model processes your microphone continuously. Do not enable in sensitive environments.” One user described a 15-minute chat where interruptions worked “eerily well,” but voice quality degraded when both sides spoke at once.
Security researcher Jane Wong, known for reverse-engineering app code, tweeted about the discovery: “ChatGPT v1.2026.158 includes an unreleased voice model called bidi. The audio pipeline suggests simultaneous listen+speak. No launch date yet, but tests in late June.”
These breadcrumbs paint a picture of active internal development, possibly under the “Red Team” phase where model behavior is stress-tested for safety and bias. Late June 2026 aligns with typical OpenAI product cycles, which often debut features in alpha before staged rollouts.
Privacy and Security Implications
An always-listening AI raises a thicket of privacy concerns, especially on Windows, where microphones are often left open for dictation and voice commands. GPT-Bidi-1 would need to process audio locally or stream it to OpenAI’s cloud with near-zero latency, creating a persistent audio pipeline that could capture sensitive conversations.
OpenAI’s privacy policy currently states that voice conversations are not used for training without consent, but the shift to full-duplex blurs the boundary between active and passive listening. Regulators in the EU and UK have already probed always-on assistants for GDPR compliance. Microsoft will face similar scrutiny if it integrates Bidi-1 into Windows.
On the technical side, full-duplex requires sophisticated voice activity detection to filter out background chatter, TV audio, and third-party speakers. If the model accidentally eavesdrops on a meeting and incorporates that data into a response, it could leak confidential information. OpenAI will likely enforce strict segmentation, ensuring the model only listens for the user’s voice when the app is in focus and a clear intent to speak is detected.
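As a rough illustration of that kind of gating, the Python sketch below forwards an audio frame only when the app is in focus and the frame is loud enough to plausibly contain speech. It is a deliberately simple energy threshold, not a description of OpenAI’s pipeline; the function names and the 0.05 threshold are assumptions.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def should_forward(frame, app_in_focus, threshold=0.05):
    """Forward a frame only if the app is focused and the frame is loud enough.

    The threshold is a hypothetical value; a real system would calibrate it
    against the ambient noise floor.
    """
    return app_in_focus and rms(frame) >= threshold
```

An energy gate like this cannot tell the user apart from a television or a colleague at the next desk. Filtering by who is speaking would need something stronger, such as a speaker verification model, which is beyond this sketch.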
Windows might add a microphone-in-use indicator, such as a hardware LED or an on-screen overlay, that lights up whenever the duplex stream is active, similar to the camera indicator. Windows Hello integration could also tie voice profiles to biometric authentication, preventing others from hijacking an active session.
Integration with Windows and the Ghost of Cortana
Microsoft has been quietly rebuilding its voice stack. Windows Copilot, introduced in 2023, already uses OpenAI models for text-based queries, but voice integration remains limited to dictation. Internal job postings for a “Voice UX Architect” at Microsoft mention “designing for bidirectional conversation models” and “partnering with OpenAI on next-gen audio interfaces.”
A full-duplex assistant would let Windows users control the OS entirely by voice: “Hey Windows, split this window into two and open my presentation on the right.” While you are still speaking, the AI could interject: “You also have a new Teams message from Sarah. Want me to read it?” All without a wake word after the initial trigger.
This could position Windows as the front-runner for ambient computing, a space where Apple’s Siri has stagnated and Amazon’s Alexa has retreated to smart home niches. With GPT-Bidi-1, Microsoft might finally deliver on the vision it teased with Cortana back in 2014: a proactive, conversational assistant that lives in the OS.
Competitive Landscape
OpenAI isn’t alone in chasing full-duplex AI. Anthropic’s Claude has demonstrated limited simultaneous listening in internal demos, and Google’s latest Assistant has “conversational” modes that reduce latency but still rely on turn-taking. Amazon is reportedly testing “flow” mode for Alexa with real-time interruption handling.
Apple has filed patents for a Siri that can “attend to a user utterance while providing a response,” but no public beta has materialized. Samsung’s Bixby 3.0 promised duplex voice but delivered only faster half-duplex with better wake-word accuracy.
The race to true conversational AI mirrors the early smartphone wars: the first to get it right will set user expectations for a generation. OpenAI’s aggressive timeline suggests it wants to be that first mover, leveraging its existing GPT ecosystem.
Challenges and Skepticism
Despite the promise, full-duplex AI faces steep technical and UX hurdles. Overlapping speech causes audio crosstalk—literal confusion—that even humans struggle with. In a 2025 paper, researchers at MIT showed that ASR error rates spike by 27% when speaker and assistant overlap by more than 300ms. GPT-Bidi-1 would need breakthroughs in source separation or user-specific voice profiles to stay accurate.
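What “overlap” means here can be pinned down with simple interval arithmetic. The Python sketch below takes speech segments as (start, end) pairs in milliseconds and flags the pairs where user and assistant talk over each other for longer than a limit. The function names and segment format are assumptions, and the 300 ms default simply mirrors the figure quoted above.

```python
def overlap_ms(a_start, a_end, b_start, b_end):
    """Length in ms of the overlap between two [start, end) intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def risky_overlaps(user_segments, assistant_segments, limit_ms=300):
    """Return (user, assistant, overlap) triples whose overlap exceeds limit_ms."""
    flagged = []
    for u in user_segments:
        for a in assistant_segments:
            o = overlap_ms(u[0], u[1], a[0], a[1])
            if o > limit_ms:
                flagged.append((u, a, o))
    return flagged
```

For example, a user segment from 500 ms to 1500 ms against an assistant segment from 1000 ms to 3000 ms overlaps for 500 ms and would be flagged, while a 100 ms “mm-hm” in the middle of an answer would not.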
There’s also the “rude interruption” problem. An AI that yields the instant you cut in might encourage bad conversational habits; one that never yields could dominate the interaction. Designers must craft social cues, perhaps a gentle chime, that signal when the AI intends to pause or yield the floor.
Battery drain on mobile devices is another concern. Continuous audio streaming and processing could sap phone batteries quickly, even with on-device neural processing. Windows laptops with dedicated AI chips (NPUs) from Qualcomm, Intel, and AMD would handle it better, possibly limiting the feature to Copilot+ PCs initially.
Finally, hallucination risks compound when the model must respond in real time without a backspace. A misheard word might trigger an incorrect command that propagates before the user can correct it. Robust confirmation protocols and easy “undo by voice” will be essential.
What to Expect in Late June 2026
Based on the code references, the late June timeframe likely points to a controlled alpha test—possibly for paid subscribers or developers. OpenAI often uses “research preview” labels for experimental features, gating access behind a waitlist. If the test goes well, a broader rollout to ChatGPT Plus, Team, and Enterprise tiers could follow by fall 2026.
Microsoft’s involvement will be the wildcard. Windows 11’s 2026 update, rumored as “version 24H3,” might ship with an optional “Real-Time Voice” component that offloads the Bidi-1 model to local NPU hardware. This would keep latency ultra-low and address privacy by processing audio on-device.
Until then, Windows users can prepare by updating their ChatGPT apps, ensuring microphone permissions are set correctly, and keeping an eye on the Microsoft Store for a beta version of Copilot Voice. As always, take early reports with a grain of salt—features this ambitious often ship months later than leaks suggest.
The Future of Voice on Windows
GPT-Bidi-1 isn’t just a ChatGPT upgrade; it’s a blueprint for the next decade of human-computer interaction. Full-duplex voice will eventually be as expected as a cursor is today—invisible, immediate, and always on when you need it. For Windows, it’s a chance to shed the baggage of Cortana’s failed promises and lead with something genuinely new.
Skeptics will note that voice interfaces have perennially overpromised. But the underlying AI has never been this capable. With GPT-Bidi-1, we’re on the cusp of assistants that don’t just understand words, but the rhythm of conversation itself. And Microsoft, hand in hand with OpenAI, seems poised to bring that rhythm straight to the Windows desktop.