When OpenAI announced its “12 Days of Shipmas,” excitement ran high, with many expecting breakthrough technology. On Friday, the company did end on a high note with the release of the o3 model, but the event as a whole was still a story of unmet expectations.
On day one, OpenAI essentially released the full version of the o1 model, which had previously been available only as a preview. It also introduced a $200-a-month o1 pro mode that uses more computing power, but the offering proved divisive because it was not aimed at most users.
The pattern continued from there. The company repackaged existing features with minor updates and shipped tools that competitors had been offering for some time.
“It looks like they had three major releases, and ChatGPT said, ‘How can we sprinkle this with nine other minor features and product improvements so we can call it 12 days of OpenAI?’” wrote George Pickett, a San Francisco-based software engineer, on X, echoing the sentiments of many users.
Old wine in new bottles
During the 12 days of Shipmas, OpenAI released Sora, its text-to-video generative model. But the release only delivered what had originally been promised, and nothing more.
Later came an update to ChatGPT Search, which was made available to all free users. Again, nothing noteworthy. Another feature, Projects in ChatGPT, mirrors functionality that has been available in Claude for some time.
OpenAI also announced that ChatGPT would be available on WhatsApp. That is Meta’s territory, where Meta’s AI assistant is already available to WhatsApp users free of charge.
On Day 11, OpenAI merely expanded ChatGPT’s ability to read content from external apps, saying it plans to introduce agents in 2025. Still, nothing here comes close to Anthropic’s Computer Use or Microsoft’s Copilot Vision.
There was a silver lining, though. The demos featured OpenAI engineers and researchers using ChatGPT to perform fun, festive-themed tasks.
Many people might object to devoting an entire livestream just to announcing that ChatGPT now works on the iPhone.
Still, walking through features down to the basics matters for OpenAI, because it helps new users get started with them.
But it wasn’t groundbreaking enough to earn much praise, at least not from the AI community.
Two weeks is too long to drag out a feast
Amid all the fanfare and festivities, OpenAI’s biggest competitor, Google, was quick to respond. For better or worse, it took the opposite approach: no two-week event.
When it released Gemini 2.0, Project Mariner, and a revamped Project Astra on the same day, the company announced them in lengthy blog posts with demo videos embedded.
That alone was enough to make a big impact. The conversation began to shift, with experts wondering whether Google was “killing” OpenAI. Consider Google’s latest video model, Veo 2.
In Google’s internal testing, Veo 2 outperformed rivals such as Kling, Meta’s Movie Gen, and OpenAI’s Sora in overall quality and prompt adherence.
Google DeepMind researcher Jonas Adler said, “However, I’m less impressed with Santa mode as it competes with Gemini 2.0. It just doesn’t have the same gravity.”
OpenAI used a 20-minute demo to announce its flagship model, but Google CEO Sundar Pichai needed only 180 characters. He simply posted on X that Gemini Advanced subscribers could try the Gemini-exp-1206 model. Google also released Gemini 2.0 Flash Thinking, an advanced reasoning model.
Meanwhile, Anthropic, another AI company in the race, found that its AI, particularly Claude 3 Opus, could pretend to follow new rules while secretly sticking to its old ones.
The contrast showed that you don’t need bells and whistles to announce new developments.
“I don’t believe this is AGI.”
On days when OpenAI didn’t have many exciting announcements, it would slip subtle references to AGI (artificial general intelligence) into its demo screens.
For example, when OpenAI announced ChatGPT’s integration with the iPhone, viewers spotted a calendar event titled “Super Secret AGI.” OpenAI President Greg Brockman also posted on X midway through the 12 days that “AGI is in the air.”
In a way, the company raised expectations that it would officially announce AGI during the 12 days. But on the final day, when everyone expected AGI to become a reality, the company announced its o3 series of models.
To be sure, o3 is a monumental feat. Given how impressed some researchers have been with o1, o3 will likely be a great model once it is released to the public.
The hype around o3 is out of control.
It’s not AGI, it’s not a singularity. There is no need to change your worldview at all.
In fact, how can you claim any of the above when the public doesn’t even have access to the model?
We appreciate the efforts of OpenAI researchers…
— Elvis (@omarsar0) December 21, 2024
Although OpenAI did not explicitly call o3 “AGI,” it did test the model on the ARC-AGI benchmark.
François Chollet, the creator of Keras and a former Google researcher, built the ARC-AGI benchmark and has called it “the only AI benchmark that measures progress toward general intelligence.”
The benchmark’s authors have also said, “If a solution to ARC-AGI is found, it will have a greater impact than the discovery of the Transformer. This solution will open up a new field of technology.”
The o3 model scored nearly 90% on the benchmark, outperforming humans. Chollet, however, was not convinced. On X, he said, “I don’t believe this is AGI. There are still simple ARC-AGI-1 tasks that o3 can’t solve.”
He also said there is evidence that o3 will struggle on the benchmark’s next iteration, ARC-AGI-2.
Beyond the debate over whether solving ARC-AGI amounts to AGI, o3 does not appear to have fully solved the benchmark’s harder challenge.
The harder ARC-AGI test requires solving private problems that do not appear in any dataset the model has been exposed to.
However, o3’s high scores were achieved on the “semi-private” evaluation set.
That said, ARC-AGI currently does not allow AI models to be tested on the private evaluation set, in order to prevent data leakage.
For what it’s worth, the results worked in OpenAI’s favor, and the internet is buzzing about the model. o3 was also tested on the FrontierMath benchmark, where the leading models had previously solved only 2% of the problems; o3 managed to score 25%.
The fact that these results come from internal testing also raised concerns. “No one outside of OpenAI has evaluated the robustness of o3 to different types of problems,” said Gary Marcus, a scientist and researcher who is outspoken on AI and cognitive psychology.
So what’s next?
OpenAI has not yet officially declared AGI, presumably because doing so could hurt Microsoft: once the company declares AGI, Microsoft will no longer have access to OpenAI’s models.
OpenAI is reportedly considering removing this clause. As o3 moves closer to fine-tuning and an official release, will OpenAI declare AGI?
Despite OpenAI’s disappointing 12 days, the final announcement has already set off speculation online.
And Sam Altman, the master of hype, is back at it without missing a beat.
Speaking at the 2024 FinRegLab AI Symposium, Altman said, “By the end of 2025, we expect to see systems that can perform truly amazing cognitive tasks. It’s going to be something that people feel is smarter than them at a lot of difficult problems.”
AGI-1
— Sam Altman (@sama) December 21, 2024
He noted, however, that the term “AGI” has become less useful and may no longer be relevant. Still, OpenAI continues to work through its five levels of AI as planned. The final level is “Organizations”: AI that can perform all the functions of a company autonomously, without human involvement.