New Voice Features in ChatGPT
Voice Interaction:
- Conversation: Users can now have voice conversations with ChatGPT. Speaking and listening in turn makes the back-and-forth with the assistant more intuitive and engaging. Users can talk to the assistant on the go, request a bedtime story, or settle a dinner table debate by voice.
- Platform Availability: The voice feature is being rolled out to Plus and Enterprise users on iOS and Android platforms. To enable voice conversations, users should navigate to Settings → New Features on the mobile app and opt into voice conversations.
- Voice Selection: Users can choose from five different voices for their assistant. After opting into voice conversations, tap the headphone button in the top-right corner of the home screen and pick the preferred voice.
Technology Behind the Voice Feature:
- Text-to-Speech Model: The voice capability is powered by a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech. Each of the five voices was created in collaboration with professional voice actors.
- Speech Recognition: OpenAI uses Whisper, its open-source speech recognition system, to transcribe the user's spoken words into text, which ChatGPT then processes like any typed message.
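The two components above suggest a simple round trip: speech is transcribed to text, the assistant produces a text reply, and the reply is synthesized back into audio. The Python sketch below illustrates only that flow; it is not OpenAI's implementation. The names `run_voice_turn`, `VoiceTurn`, `transcribe`, `respond`, `synthesize`, and the placeholder voice name `"voice-1"` are all hypothetical, and the three stages are passed in as plain functions so that any speech recognizer, chat model, or text-to-speech engine could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round trip of a voice conversation (hypothetical structure)."""
    transcript: str     # text produced by the speech recognizer
    reply_text: str     # text reply produced by the assistant
    reply_audio: bytes  # synthesized speech for the reply


def run_voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
    voice: str = "voice-1",  # placeholder; the source does not name the voices
) -> VoiceTurn:
    """Run speech -> text -> reply -> speech for a single user utterance."""
    # Stage 1: speech recognition (for example, a wrapper around Whisper).
    transcript = transcribe(audio_in).strip()
    if not transcript:
        raise ValueError("no speech recognized in the audio input")

    # Stage 2: the assistant answers the transcribed text.
    reply_text = respond(transcript)

    # Stage 3: text-to-speech in the voice the user selected.
    reply_audio = synthesize(reply_text, voice)
    return VoiceTurn(transcript, reply_text, reply_audio)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any external service.
    turn = run_voice_turn(
        b"raw-audio-bytes",
        transcribe=lambda audio: "tell me a bedtime story",
        respond=lambda text: f"You asked: {text}",
        synthesize=lambda text, voice: f"[{voice}] {text}".encode("utf-8"),
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Keeping the stages as injected functions is a design choice for illustration: it makes the order of operations explicit and lets each stage be replaced or tested independently.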
Applications and Future Potential:
- Creative and Accessibility Applications: The new voice technology opens doors to many creative and accessibility-focused applications. It can craft realistic synthetic voices from just a few seconds of real speech, thus broadening the scope of how users can interact with ChatGPT and other platforms.
- Collaborations: One notable collaboration is with Spotify, which is leveraging this technology for a pilot feature called Voice Translation. This feature helps podcasters expand their reach by translating podcasts into additional languages using the podcasters' own voices.
Release Strategy:
OpenAI is deploying the new image and voice capabilities gradually so that it can make improvements and refine risk mitigations over time. The gradual rollout is also meant to prepare users for more powerful systems in the future, in line with OpenAI’s stated goal of building safe and beneficial AGI (Artificial General Intelligence).
The integration of voice and image features in ChatGPT not only enriches the user experience but also paves the way for more interactive and multimedia-oriented AI applications in the near future.