OpenAI, led by Sam Altman, has unveiled a major upgrade to ChatGPT, introducing voice and image capabilities.
Voice will be available on iOS and Android as an opt-in setting, while image capabilities will be available on all platforms.
The voice feature is powered by a text-to-speech model that generates human-like audio from text input; its voices were created with input from professional voice actors.
Whisper, OpenAI's open-source speech recognition system, transcribes the user's spoken words into text, supplying the input side of the voice capability.
Image understanding is enabled by multimodal versions of GPT-3.5 and GPT-4, which apply their language reasoning abilities to interpret a wide range of images.