ChatGPT Enhancements: Voice Conversations and Image Recognition

ChatGPT will soon be able to hold voice conversations and recognize objects in images. For instance, it can read bedtime stories, suggest recipes from a photo of the ingredients on hand, and help solve a math homework problem from a picture of the textbook question. With voice input and output, conversations flow without any typing.

These new features will become available to premium subscribers within the next two weeks. Voice input will be exclusively accessible on iOS and Android, while image recognition will be available across all platforms.

Voice Conversations with ChatGPT

Users can now engage in voice conversations with ChatGPT. A speech recognition model transcribes the user's speech into text, and a text-to-speech model generates ChatGPT's spoken replies, using voices created with professional voice actors to ensure high-quality speech output.

To enable the voice feature in the mobile apps, go to “Settings” -> “New Features” and turn on voice conversations. Then tap the headphones icon in the upper right corner and choose one of the five voices.

Image Discussions

Users can now share one or more images with ChatGPT. Tasks such as diagnosing technical issues, creating recipes, or analyzing complex graphs and tables can be completed within seconds. In one demonstration video, the chatbot helps a user lower a bicycle seat.

To add an image to the conversation, tap the “Photo” button to take a photo or select one from your gallery. You can also highlight a specific area of the image so that ChatGPT focuses on that part and grasps the context more quickly.

These features are made possible by the multimodal capabilities built into GPT-4 and GPT-3.5.

Limitations

OpenAI is rolling out these features gradually rather than to all users at once, so that it can gather feedback from alpha testers and strengthen safety measures before broader deployment. Voice technology carries risks such as voice forgery for fraudulent purposes, and hallucinations can lead to incorrect responses. To prevent voice forgery, OpenAI has decided not to offer voice cloning functionality.

OpenAI advises verifying information obtained from ChatGPT and avoiding its use in high-risk situations, such as in the field of medicine. Additionally, the model performs less well when dealing with text in languages other than English.
