ChatGPT expands voice and vision abilities

OpenAI, led by Sam Altman, has unveiled a major upgrade to ChatGPT, introducing voice and image capabilities.

This upgrade empowers ChatGPT to hear, see, and speak, enhancing its interactivity and usability.

Sam Altman expressed his excitement about the new features and encouraged users to explore the voice mode and vision capabilities.

These features will roll out to Plus and Enterprise users over the next two weeks.

Voice will be available on iOS and Android (via an opt-in setting), while image capabilities will be supported on all platforms.

The voice feature is powered by a new text-to-speech model, developed with input from professional voice actors, that generates highly human-like audio from text.

Whisper, OpenAI's open-source speech recognition system, transcribes spoken words into text, powering the listening side of the voice capability.

Image understanding is powered by multimodal GPT-3.5 and GPT-4, which apply their language-reasoning abilities to interpret a wide range of images.