Voice messages have become a dominant communication mode: WhatsApp reports over 7 billion voice messages sent daily, and Telegram, iMessage, and most major messaging apps have made audio messages a core feature. The convenience — speaking is faster than typing; voice conveys tone and emotion that text cannot — is clear. The technology behind recording, encoding, storing, and delivering these messages is less visible but genuinely interesting, particularly for anonymous platforms where media must be handled ephemerally.
Recording in the Browser
Browser-based voice recording uses the MediaRecorder API, which captures the device microphone through getUserMedia (part of the Media Capture and Streams API, commonly grouped with WebRTC). While recording is active, MediaRecorder delivers audio in chunks; the application collects these chunks and assembles them into a Blob (binary large object) when recording stops. The default encoding varies by browser: Chrome uses WebM with the Opus codec; Safari uses MP4 with the AAC codec; Firefox also encodes with Opus, typically in an Ogg container for audio-only recordings. For cross-browser compatibility, platforms typically normalize to a single format on the server side.
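The recording flow described above can be sketched in TypeScript as follows. This is a minimal illustration, not a production recorder: the candidate format list, the helper names pickMimeType and recordVoiceNote, and the 250 ms chunk interval are all assumptions made for the example. Only getUserMedia, MediaRecorder, and MediaRecorder.isTypeSupported are real browser APIs.

```typescript
// Candidate container/codec pairs in order of preference (an assumption for this sketch).
const CANDIDATE_MIME_TYPES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// Pure helper: return the first candidate that the support check accepts.
function pickMimeType(
  candidates: readonly string[],
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  for (const candidate of candidates) {
    if (isSupported(candidate)) return candidate;
  }
  return undefined;
}

// Record from the microphone for up to maxMs milliseconds and resolve with a Blob.
async function recordVoiceNote(maxMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType(CANDIDATE_MIME_TYPES, (t) =>
    MediaRecorder.isTypeSupported(t),
  );
  // Fall back to the browser default when none of the candidates is supported.
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: Blob[] = [];

  return new Promise<Blob>((resolve, reject) => {
    recorder.ondataavailable = (event) => {
      if (event.data.size > 0) chunks.push(event.data);
    };
    recorder.onerror = () => {
      stream.getTracks().forEach((track) => track.stop());
      reject(new Error("recording failed"));
    };
    recorder.onstop = () => {
      // Release the microphone, then assemble the chunks into a single Blob.
      stream.getTracks().forEach((track) => track.stop());
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.start(250); // emit a chunk roughly every 250 ms
    setTimeout(() => {
      if (recorder.state !== "inactive") recorder.stop();
    }, maxMs);
  });
}
```

In a page, a call such as recordVoiceNote(30000) would prompt for microphone permission and resolve with a Blob whose type reflects whatever format the browser actually chose, which is why the server-side normalization step is still needed.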
Codec Selection: Why Opus Wins
The Opus codec, developed within the IETF with contributions from Xiph.Org, Mozilla, and Skype and standardized as RFC 6716 in 2012, has become the de facto standard for internet voice applications. In the 8–32 kbps range typical of voice messages, Opus consistently outperforms MP3 and AAC at equivalent bitrates for voice content, and it also handles music well at higher bitrates. At 16 kbps, Opus delivers voice quality comparable to or better than a traditional telephone call, with barely audible compression artifacts. WebRTC requires implementations to support Opus, making it the natural choice for browser-based voice note applications.
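To make these bitrates concrete, the arithmetic below estimates how large a voice message is at a given bitrate. It is a rough sketch: the function name estimateAudioBytes is hypothetical, and the figure covers only the encoded audio payload, ignoring container overhead from WebM or Ogg headers.

```typescript
// Estimate the encoded payload size in bytes, ignoring container overhead.
function estimateAudioBytes(bitrateKbps: number, durationSeconds: number): number {
  const bitsPerSecond = bitrateKbps * 1000;
  return Math.ceil((bitsPerSecond * durationSeconds) / 8);
}

console.log(estimateAudioBytes(16, 60)); // 120000 bytes, roughly 117 KiB
console.log(estimateAudioBytes(32, 60)); // 240000 bytes
```

A one-minute message at 16 kbps is therefore on the order of 120 KB. In the browser, MediaRecorder accepts an audioBitsPerSecond option for requesting a target bitrate, though browsers treat it as a target rather than a guarantee.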
Ephemeral Voice Note Delivery
For anonymous chat platforms, voice notes raise privacy concerns beyond those of text. An audio file is a more durable record of a conversation than text, and it captures vocal characteristics that can identify a speaker even when no identity is explicitly disclosed. Truly ephemeral voice note delivery uses short-lived signed URLs (expiring within the session window), serves audio from memory rather than persistent storage, and ensures that the audio is unrecoverable once the session ends. The platform should never write audio content to a persistent database; voice notes should get the same ephemeral architecture applied to text messages.
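The delivery scheme described above can be sketched as an in-memory store that hands out signed, expiring paths. Everything here is an assumption made for illustration: the class name EphemeralVoiceStore, the /voice/ path layout, HMAC-SHA256 as the signing scheme, a Node.js server runtime, and the clock passed in as nowMs (which keeps the logic easy to test). It is not a description of how any particular platform implements this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface StoredNote {
  sessionId: string;
  audio: Uint8Array;
  expiresAtMs: number;
}

// Hypothetical in-memory store: nothing is written to disk or to a database.
class EphemeralVoiceStore {
  private readonly secret: string;
  private readonly notes = new Map<string, StoredNote>();

  constructor(secret: string) {
    this.secret = secret;
  }

  // Sign "noteId.expiry" so a client cannot alter or extend the expiry.
  private sign(noteId: string, expiresAtMs: number): string {
    return createHmac("sha256", this.secret)
      .update(`${noteId}.${expiresAtMs}`)
      .digest("hex");
  }

  // Keep the audio in memory and return a signed path valid for ttlMs.
  put(noteId: string, sessionId: string, audio: Uint8Array, ttlMs: number, nowMs: number): string {
    const expiresAtMs = nowMs + ttlMs;
    this.notes.set(noteId, { sessionId, audio, expiresAtMs });
    return `/voice/${noteId}?exp=${expiresAtMs}&sig=${this.sign(noteId, expiresAtMs)}`;
  }

  // Return the audio only if the signature is valid and the note has not expired.
  fetch(signedPath: string, nowMs: number): Uint8Array | undefined {
    const url = new URL(signedPath, "https://placeholder.invalid");
    const noteId = url.pathname.split("/").pop() ?? "";
    const expiresAtMs = Number(url.searchParams.get("exp"));
    const given = Buffer.from(url.searchParams.get("sig") ?? "", "hex");
    const expected = Buffer.from(this.sign(noteId, expiresAtMs), "hex");
    if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
      return undefined;
    }
    const note = this.notes.get(noteId);
    if (!note) return undefined;
    if (nowMs >= expiresAtMs || nowMs >= note.expiresAtMs) {
      this.notes.delete(noteId); // drop expired audio eagerly
      return undefined;
    }
    return note.audio;
  }

  // Drop every note belonging to a session once that session ends.
  endSession(sessionId: string): void {
    this.notes.forEach((note, noteId) => {
      if (note.sessionId === sessionId) {
        note.audio.fill(0); // best-effort zeroing before release
        this.notes.delete(noteId);
      }
    });
  }
}
```

A server would call put when a voice note is uploaded, send the returned path to the recipient, and call endSession when the chat closes. Because the expiry is part of the signed string, editing the exp parameter invalidates the signature rather than extending the link.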