This week OpenAI released ChatGPT Agent. Although at first glance,
I thought they copied Manus on 80% of the product form, I did learn
something interesting and novel.
One shocking insight I learned from their demo is that combining the
Operator with the Reasoning model is not only a better product for the
user, but it’s also collectively a more capable system, overall better on
all major hard benchmarks. The reason is that the previous system, where O3
and Operator are separate, had some fundamental bottlenecks: O3 cannot
use a GUI, and Operator cannot reason and call tools. So they were simply
incomplete. And right now, intuitively it feels like you assemble the
skeletons of two powerful limbs, and then you use RL to electrify them to
be one entity to accomplish what previously neither could’ve accomplished.
I believe that this trend of using RL to train a more holistic system that
includes more components to do exponentially more complex tasks will
continue in the future and push the system to new boundaries.
From a product perspective, what I increasingly feel is that more often I
am finding myself using powerful AI systems to interface with the world:
the operating system, browser, applications, and maybe eventually some
parts of the physical world. When interacting with more complex systems,
AI systems do feel more and more like exoskeletons for humans, helping us
break free from our biological limitations—most obviously by reducing
complexity and speeding up the interface with the digital world.
For example, when I was using the ChatGPT Agent to interface with a public
Google Doc, I could feel that it is easier for this system to navigate the
doc on my behalf, especially when the doc is complex, and it's a totally
different level of experience. Increasingly, I feel like we will see the
world more comprehensively through the lens of these powerful AI systems.
And these systems scale really well: ChatGPT Agent marks the beginning of
a new paradigm where AI systems—including more components and more tools—
will be able to achieve more. And some of the more complex tools, with
high learning curves that humans probably cannot even use very well now,
can be commanded via the intermediary of an AI system.