Nearly seven months after its first demonstration, OpenAI has launched a ChatGPT feature that can understand real-time video.
Zhito Finance learned that nearly seven months after its first public demonstration, OpenAI has officially launched real-time video conversations for paying users, a new capability of ChatGPT's human-like Advanced Voice Mode. In a livestream on Thursday, the company announced that Advanced Voice Mode is gaining vision. Advanced Voice is powered by GPT-4o, OpenAI's multimodal model.
OpenAI announced that video and screen-sharing features for Advanced Voice Mode are now available in the ChatGPT mobile app, allowing ChatGPT Plus, Team, and Pro subscribers to point their phones at objects and have ChatGPT respond in near real time.
OpenAI researchers demonstrated the new feature in the livestream: tap the voice icon next to the ChatGPT chat bar, then tap the video icon in the lower-left corner to start a video conversation. To share their screen, mobile users tap the three-dot menu and select "Share Screen". Through screen sharing, Advanced Voice can understand what is on the device's screen; for example, it can explain various settings menus or offer suggestions on a math problem.
OpenAI said most ChatGPT Plus and Pro subscribers, as well as all Team users, will gain access to the new features launched on Thursday through the ChatGPT app within the next few days, and that Plus and Pro users in the EU, Switzerland, Iceland, Norway, and Liechtenstein are expected to get them soon. ChatGPT Enterprise and Edu, the enterprise and education versions of ChatGPT, will receive the new features in January next year.
Advanced Voice has been delayed several times, reportedly in part because OpenAI announced the feature before it was ready to ship. In April of this year, OpenAI promised that Advanced Voice would roll out to users "within a few weeks"; months later, the company said it needed more time.
At the end of June, OpenAI introduced the voice mode to a small group of Plus users, then announced a one-month delay to ensure the feature could safely and effectively handle requests from millions of users. At the time, OpenAI said it planned to give all Plus users access in the fall, with the exact timing depending on whether its internal high bar for safety and reliability was met. At the end of July, OpenAI rolled out Advanced Voice Mode in ChatGPT to a limited number of paying Plus users, saying the voice mode cannot imitate how other people speak and that it had added new filters to ensure the software can detect and refuse requests to generate music or other forms of copyrighted audio.
In addition, competitors such as Google (GOOGL.US) and Meta (META.US) are developing similar features for their respective chatbot products. This week, Google rolled out Project Astra, its real-time video-analysis conversational AI, to a group of "trusted testers".