OpenAI just debuted GPT-4o, a new kind of AI model that you can communicate with in real time via live voice conversation, video streamed from your phone, and text. The model is rolling out over the next few weeks and will be free for all users through both the ChatGPT app and the web interface, according to the company. Users who subscribe to OpenAI’s paid tiers, which start at $20 per month, will be able to make more requests.
OpenAI CTO Mira Murati led the live demonstration of the new release one day before Google is expected to unveil its own AI advancements at its flagship I/O conference on Tuesday.
GPT-4 offered similar capabilities, giving users multiple ways to interact with OpenAI’s AI offerings. But it siloed them across separate models, leading to longer response times and presumably higher computing costs. GPT-4o has now merged those capabilities into a single model, which Murati called an “omnimodel.” Doing so means faster responses and smoother transitions between tasks, she said.
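That consolidation is visible to developers as well as to demo audiences: a single request can carry text and an image together and get one response back, rather than routing each modality to a different model. The snippet below is a minimal sketch of what that looks like, assuming the official OpenAI Python SDK and the “gpt-4o” model name; the live demo itself used voice and video directly, not an API call like this.

```python
# Minimal sketch of a single "omnimodel" request mixing text and an image.
# Assumes the OpenAI Python SDK and a "gpt-4o" model name; this is illustrative,
# not the code used in the demo.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What equation is written on this page?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/notebook-page.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```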
The result is a conversational assistant much in the vein of Siri or Alexa, but one capable of fielding far more complex prompts, including video interactions, based on the company’s demonstration.
“We’re looking at the future of interaction between ourselves and the machines,” Murati said of the demo. “We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural.”
Barret Zoph and Mark Chen, both researchers at OpenAI, walked through a number of applications for the new model. Most impressive was its ability to hold a live conversation: you could interrupt the model mid-response, and it would stop, listen, and adjust course.
OpenAI showed off the ability to change the model’s tone, too. Chen asked the model to read a bedtime story “about robots and love,” then quickly demanded a more dramatic voice. The model grew progressively more theatrical until Murati demanded it pivot to a convincing robot voice instead (which it excelled at). There were, predictably, some short pauses as the model reasoned through what to say next, but the exchange stood out as a remarkably naturally paced AI conversation.
The model can reason through vision in real time as well. Using his phone, Zoph filmed himself writing an algebra equation (3x + 1 = 4) on a sheet of paper while GPT-4o followed along. He instructed it not to provide answers but to guide him, much as a teacher would.
“The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”
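For readers following along, the full solution the model was guiding Zoph toward takes just two steps (this worked example is ours, not part of the demo):

```latex
\begin{aligned}
3x + 1 &= 4 \\
3x &= 3 && \text{subtract 1 from both sides} \\
x &= 1 && \text{divide both sides by 3}
\end{aligned}
```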
GPT-4o will store records of users’ interactions with it, meaning the model “now has a sense of continuity across all your conversations,” according to Murati. Other highlights include live translation, the ability to search through your conversations with the model, and the power to look up information in real time.
As is the nature of a live demo, there were hiccups and glitches: GPT-4o’s voice sometimes jumped in awkwardly during the conversation, and it appeared to comment on one of the presenters’ outfits even though it wasn’t asked to. But it recovered well when the demonstrators told it that it had erred, and it seems able to respond quickly and helpfully across several mediums that other models have not yet merged as effectively.
Previously, many of OpenAI’s most powerful features, like reasoning over images and video, were behind a paywall. GPT-4o marks the first time they’ll be opened up to the wider public, though it’s not yet clear how many interactions you’ll be able to have with the model before being charged. OpenAI says paying subscribers will “continue to have up to five times the capacity limits of our free users.”
Additional reporting by Will Douglas Heaven.