Voice gets sharper with gpt-realtime-2 (v0.5.2)

Upgraded the voice pipeline to OpenAI's gpt-realtime-2 model. Voice input starts transcribing sooner, and the transcription gets food words right more often.

May 11, 2026


Short post for a worthwhile upgrade.

Voice input has been running on the first generation of the OpenAI Realtime API since December. It worked. It was sometimes a little slow to start, and it occasionally fumbled food words like “kefir” or “halloumi” that rarely come up in everyday conversation.

The new gpt-realtime-2 model fixes both of those. Time to first transcript is noticeably shorter. The text that comes back is more accurate on food vocabulary, which matters more than it sounds: if the transcript misspells what you ate, the AI does not know what to look up.

The mic button's thinking animation also got a polish pass. The little circle around the mic is smoother now and behaves the way you would expect when you pause for a beat mid-sentence. Subtle, but if you log meals daily you will see the difference.

Nothing else changes for you. Same flow, same nutrition card, same accept or edit. It is just better at the part where it listens.

by Boyd Thomson