Ye Cao

Interface via the Lens of AI Systems

This week OpenAI released ChatGPT Agent. Although at first glance I
thought they had copied about 80% of Manus's product form, I did learn
something interesting and novel.

One striking insight from their demo is that combining Operator with
the reasoning model is not only a better product for the user; it is
also a more capable system collectively, performing better overall on
all the major hard benchmarks. The reason is that the previous setup,
where o3 and Operator were separate, had fundamental bottlenecks: o3
could not use a GUI, and Operator could not reason or call tools. Each
was simply incomplete. Intuitively, it now feels like assembling the
skeletons of two powerful limbs and then using RL to electrify them
into one entity that accomplishes what neither could before.

I believe that this trend of using RL to train a more holistic system that
includes more components to do exponentially more complex tasks will
continue in the future and push the system to new boundaries.

From a product perspective, I increasingly find myself using powerful
AI systems to interface with the world: the operating system, the
browser, applications, and maybe eventually some parts of the physical
world. When interacting with more complex systems, AI systems feel more
and more like exoskeletons for humans, helping us break free from our
biological limitations, most obviously by reducing complexity and
speeding up our interface with the digital world.

For example, when I used ChatGPT Agent to interface with a public
Google Doc, I could feel how much easier it was to let the system
navigate the doc on my behalf, especially when the doc was complex. It
was a totally different level of experience. Increasingly, I feel we
will see the world more comprehensively through the lens of these
powerful AI systems. And these systems scale really well: ChatGPT Agent
marks the beginning of a new paradigm in which AI systems that include
more components and more tools will be able to achieve more. Some of
the more complex tools, with learning curves so steep that humans
probably cannot use them well today, can be commanded through the
intermediary of an AI system.

Nature of intelligence: Building Blocks

The nature of intelligence seems to be about building blocks: smaller
building blocks (less intelligent entities) unlock bigger building
blocks (more intelligent entities). There does not seem to be a law of
conservation of intelligence stating that you cannot get more
intelligent things out of less intelligent things.

There are several pieces of empirical evidence for this. First of all,
we can use a relatively dumb model to label a dataset, and then train a
smarter model using this dataset.
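
The labeling claim can be illustrated with a toy sketch. It is only an
illustration under strong assumptions, not how any real model is
trained: the "dumb model" here is a hypothetical labeler that flips the
true label at random 30% of the time, the task is a made-up 1-D
threshold problem, and the "smarter model" is a single cut-off fitted
to the noisy labels. All names (weak_labeler, fit_threshold, and so on)
are invented for this example.

```python
import random

def true_label(x: float) -> int:
    """Ground truth for the toy task: positive when x is above 0.5."""
    return 1 if x > 0.5 else 0

def weak_labeler(x: float, rng: random.Random, error_rate: float = 0.3) -> int:
    """Hypothetical 'dumb' teacher: flips the true label at random."""
    y = true_label(x)
    return 1 - y if rng.random() < error_rate else y

def fit_threshold(xs, ys, candidates):
    """Student: pick the cut-off that agrees most with the noisy labels."""
    def agreement(t):
        return sum((1 if x > t else 0) == y for x, y in zip(xs, ys))
    return max(candidates, key=agreement)

def accuracy(predict, xs):
    """Fraction of points where the prediction matches the ground truth."""
    return sum(predict(x) == true_label(x) for x in xs) / len(xs)

rng = random.Random(0)

# The teacher labels the training set; the student never sees true labels.
train_xs = [rng.random() for _ in range(5000)]
noisy_ys = [weak_labeler(x, rng) for x in train_xs]

candidates = [i / 100 for i in range(1, 100)]
threshold = fit_threshold(train_xs, noisy_ys, candidates)

# Compare teacher and student against the ground truth on fresh data.
test_xs = [rng.random() for _ in range(5000)]
teacher_acc = accuracy(lambda x: weak_labeler(x, rng), test_xs)
student_acc = accuracy(lambda x: 1 if x > threshold else 0, test_xs)

print(f"teacher accuracy: {teacher_acc:.2f}")
print(f"student accuracy: {student_acc:.2f} (threshold={threshold:.2f})")
```

In this toy setup the student tends to end up more accurate than its
teacher, because the teacher's mistakes are random and average out when
a simple rule is fitted to many labels. Real labeling errors are often
systematic, so the effect should not be assumed to carry over as
cleanly.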

We see this in how DeepSeek trained R1-Zero out of V3, then used
R1-Zero to generate cold-start reasoning data, and then used that
cold-start data to further train DeepSeek-R1. One could argue that
this process only changed how the model behaves (i.e. the model
reasons more often) rather than how smart the model is. But still, the
outcome is good: the model unlocked reasoning capability and can
achieve a lot more.

Another piece of empirical evidence comes from multi-modality. Once we
train a dumb text-image model, it can be used for a variety of tasks
such as labeling and filtering datasets. This works because, although
the dumb model cannot generate good enough pictures, it does really
well on these easier tasks. Doing well on these easier tasks lets the
model help build the next, better model. By the same token, a decent
text-image generation model can be used to create an even better one.

Now why does this work at all? What seems to happen is that these
capabilities depend on some kind of intelligence threshold, and
reaching the threshold is all that matters. Essentially, there are
different capabilities, and unlocking each one requires a different
level of intelligence. Maybe parameter count is a good indicator of
the potential a model can reach. Once a model reaches the threshold
for a capability, it doesn't really matter how dumb the model is at
other things: it can still be as useful as a super smart model on the
particular thing it's good at.

A dumb model reaches the intelligence threshold that unlocks a
certain capability, such as labeling a dataset as positive versus
negative. Because it has reached that threshold, we can use it as a
building block for a more intelligent model by using its labeled
dataset to make the next model better. So intuitively, intelligence
feels like Lego: bigger pieces are built out of smaller ones.

A Future of Open Source

I am a big believer in the future of open source. I believe a lot
more useful software will be open sourced in the near term, and a lot
more non-technical people, especially designers, will start to
contribute to open source. The biggest driver of this is that coding
agents are now very good at bringing ideas to life and will continue
to improve.

This has two important implications. First, closed-source companies
should reconsider whether staying closed-source is beneficial over the
long term, because there will very likely be an open-source competitor
that people find more appealing. Second, the long-term price of many
software products will approach infrastructure costs such as cloud
hosting and data storage.

Let's discuss the first implication. Open-source software has huge
appeal for many people, especially people with their own design taste.
Being able to control, view, and customize the underlying code is a
significant attraction. Giving customers ownership helps companies gain
early adoption and traction.

For example, take a SaaS product for meeting booking, which might
charge $10 per month. One can argue that the primary value provided by
the company is its engineers' ability to assemble lines of code into a
functional, beautiful product. Previously, most of the value came from
this 'assembly process.' Now, however, the drop in the cost of
intelligence makes code assembly fundamentally much easier.

Soon, the price for customers to use such a product will approach the
raw material and infrastructure cost, essentially database and website
hosting expenses. There might still be economies of scale that attract
customers to certain SaaS products, but most of today's SaaS prices are
significantly higher than the cost of the component raw materials.
Rationally, instead of subscribing to the product, I can build my own
in about 30 minutes using a coding agent. My monthly maintenance cost
would likely be close to $1, or even $0 in some cases, compared to $10.

Of course, there may still be value for SaaS companies, especially
where the builders have exceptional taste. This taste and vision
represent one of the main contributions and moats in a world full
of coding agents. Companies with great taste can continue to
innovate and provide additional benefits to users.

However, in the very long term, most products and services will
likely become very cheap. Thus, I believe SaaS—or any product—will
shift toward something resembling political or religious campaigns,
where the ultimate goal is not monetary gain but spreading an
ideology or vision. The best companies themselves become expressions
or manifestos. It's easy to imagine this vision taken to extremes,
where the joy lies simply in allowing more people to use and enjoy
what one has created.

Paradigm shift of human management

A recent and fascinating observation is that AI has changed the
way many things are managed, especially through shifts in operating
systems, making tasks that used to be complex remarkably simple. The
most intuitive example is from the Star Trek movies, where a single
person can pilot a high-tech spaceship alone when needed. This made me
realize that, in the future, it will become normal for one person with
an AI to manage or operate extremely complex organizations or systems.

The traditional management model is distributed. For example, in war,
the supreme commander indirectly manages subordinate commanders, who
then manage their own subordinates. This distributed approach exists
because of human cognitive limitations—no single person can directly
manage a large organization. But AI will fundamentally change this. I
can easily imagine AI becoming an exoskeleton for individuals at the
management level, enabling one person to manage or operate a vast
organization or system at once. So, the distributed model of human
organizations may be replaced by this new paradigm.

Another example is the idea of the one-person billion-dollar company.
AI drives this new organizational model, making smaller teams more
efficient, while large organizations may struggle to leverage AI as
effectively. This technology essentially empowers individuals to an
extraordinary degree, but at the same time, it makes power easier to
centralize. These are changes happening at the management level, but in
technology we see similar shifts. An individual, aided by AI, can now
interact with different operating systems in powerful new ways. AI
serves as a cognitive exoskeleton for humans, allowing complex systems
to be manipulated by a single person.

Examples include MCP (Model Context Protocol), codebases, and even
multimodal systems. All of these ultimately allow people to more easily
manipulate and understand systems across different modalities.

AI complete infrastructure

The idea is to let AI take control of more resources and infrastructure,
because this allows for better scaling. The assumption is that AI's
capability will continue to rise as compute increases. According to this
trend, in two to three years, most of the fundamental infrastructure,
hardware, and company layout will be run by AI.

This trend has already started, for example with Meta's ads automation
and Axiom AI's quant solver. Jason has already mentioned this trend.
Even smaller platforms like Cursor, which controls the user's terminal,
fit this pattern. The reason is simple: once AI controls these core
settings, it can create what we call "AI complete," meaning a fully
closed AI loop.

This logic is similar to Anthropic's Constitutional AI, in which
AI-generated feedback is used to improve the model. The core is that
once AI controls a stage of production, that stage can be easily
reproduced and will generate feedback for AI to improve itself. This
allows AI's abilities and the scale of its influence to grow together.

This idea also matches the "bitter lesson"—focusing on letting AI merge
into core infrastructure quickly will create a new paradigm for greater
AI impact. This new paradigm will most likely emerge in an agentic form.

LLM and Policy Makers

Today's logic of AI development is actually very similar to that of policy
makers. Many of the ideas are borrowed from systems engineering. For example,
policy makers need to consider how to make a whole system better, which means
they cannot simply optimize local parts in isolation. At the same time, policy
makers try to improve the system with as few changes as possible, and with
changes that are as simple as possible. They also think about how to make the
broader environment suitable for an ecosystem to thrive.

This is very similar to building a solid foundational environment for LLMs,
allowing LLMs to succeed on their own. Why is this the case? Perhaps it's
because of the "bitter lesson": to make model capabilities scale well, you
can't do too much hand-tuning. This idea is very similar to what Jason posted
today.

In the future, the most important thing for domain experts is to build the
infrastructure that allows LLMs to fully demonstrate their potential. Once this
is achieved, each iteration of the model can effectively scale with the LLM's
underlying compute and search abilities.

Artificial General Adaptation

In my view, the most impressive aspect of AI lies not only in its
intelligence, but in its adaptation. I believe that today's LLMs have
opened up a new compute-driven programming paradigm: a system can
evolve specific behaviors purely through computational power. This is
a very general paradigm, and LLMs might be just the beginning of it.

As long as we can provide direction to the system (for example,
through rewards), the system can evolve into the behaviors we desire.
Earlier RL self-play game bots follow the same pattern: through
self-play, the system evolves very strong game-playing abilities.
This paradigm closely resembles biological adaptation, but it can be
greatly accelerated through computation.
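
A minimal sketch of "direction through rewards" is shown below. It is
simple random-mutation hill climbing (a (1+1)-style evolutionary
search), swapped in here because it fits in a few lines; it is not the
RL procedure used to train LLMs or self-play bots. The reward function,
its hidden target, and the names reward and evolve are all hypothetical
and exist only for illustration.

```python
import random

def reward(params):
    """Hypothetical reward: higher when params are closer to a hidden target."""
    target = [3.0, -1.0, 0.5]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(steps=2000, step_size=0.1, seed=0):
    """Mutate the current behavior at random and keep a mutation
    only when the reward does not get worse."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]   # start with no useful behavior
    best = reward(params)
    history = [best]
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, step_size) for p in params]
        r = reward(candidate)
        if r >= best:          # the reward is the only feedback used
            params, best = candidate, r
        history.append(best)
    return params, history

params, history = evolve()
print(f"initial reward: {history[0]:.3f}")
print(f"final reward:   {history[-1]:.3f}")
```

The loop never sees the target directly; it only sees whether a change
was rewarded. More compute (more steps) means more mutations tried,
which is the sense in which this kind of adaptation can be accelerated
through computation.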

Flexibility seems to be a crucial part of adaptation. The essence of
flexibility is formlessness—there is no persistent, fixed form. There
may be a broad framework, but it is only used as a direction for the
system to evolve and fill in the details on its own. Today I saw a post
in which an RL robot uses vision to see the screen, plays a game,
receives feedback, and gradually 'learns' how to operate a game
controller. The most striking part is the similarity between biological
brains and digital brains: both look at the screen, slowly understand,
and learn.

These are the elements needed for evolution: flexibility (digital
neurons), and feedback (reward signals). A flexible system can
gradually evolve. The most incredible thing about humans as biological
beings is our plasticity—once biological evolution has plasticity, it
can iteratively improve itself, and the neurons in the brain can
slowly change. Machines are now at a similar inflection point:
digital neurons are evolving in a certain way, and thus can
spontaneously develop complex patterns. Many of these complex patterns,
like consciousness and self-awareness, are indescribable even by the
system itself in terms of how they are produced.

Machines will continue along this paradigm, becoming even more
flexible, and this will enable even greater evolutionary potential.
The rate of change for machines can be much greater than that of
humans, because machines are natural geniuses at information
processing.

Paradigm shift of human capability

I believe that in the future, the main paradigm for intellectual
achievement will be individuals using scaled compute, rather than
individuals relying solely on their own intelligence, at least for
many tasks. This new paradigm scales well: model improvements,
increased compute, and falling model costs will together make it the
dominant force.

As a result, quickly adapting to this new paradigm today is very
valuable, enabling individuals to achieve more as their capabilities
scale with compute. If a person only relies on their own intelligence,
they cannot fully leverage this paradigm. Therefore, one important
criterion for evaluating engineers today should be whether their
capabilities and achievements scale with this new paradigm. And I
believe that, all things being equal, certain abilities—especially
imagination—can be scaled much better through this paradigm.

For example, I have some imaginative and interesting ideas, and I can
feel that as this paradigm improves, these ideas become much easier to
realize. However, some other abilities, such as detailed operational
skills for specific tasks, do not seem to scale as well with this
paradigm.

We can already see that different engineers use models in very different ways
during the process of model scaling, and the amount of help they get varies
greatly. However, imagination, critical thinking, and communication skills—all of
these abilities seem to be able to scale alongside the paradigm very well. So,
having these skills may become more important, because they can keep up with
the scaling of the paradigm. Other abilities, which cannot scale with the
paradigm, may gradually become less important—especially the mechanical
memorization of how to perform intellectual tasks.

Another interesting aspect is that certain capabilities, such as imagination,
actually become more valuable when they are abundant, whereas others, such as
logical thinking, become less valuable when they are more common. The reason,
I think, is that creative works can stimulate and inspire others to be even more
creative, whereas logical reasoning generally only requires one person getting it
right one time. So to me, creativity seems to be more of a non-zero-sum game
than logical reasoning. We see this in cultural movements as well: they generally
happen in an explosion, where many artists inspire one another.
Day 0 for Agentic AI

The agent opportunity is much bigger than we thought.

Today is day 0, not even day 1.

If you look at Cursor and Manus, these companies are building the infrastructure
for agents. The difference is that Cursor starts from the local environment,
while Manus starts from the cloud environment.

If you look at agent companies, most of them today don't have APIs.

The reason for this is that agents have to live inside the OS to be useful. But
eventually there are ways around this: open-sourcing some infrastructure
(according to Manus), or moving to the cloud (Cursor announced today that their
next step will be to go to cloud, and I think they will release an API after that).

These agentic companies might even beat foundation model
companies like OpenAI. The reason is that product form changes too quickly:
ChatGPT today cannot do agentic tasks as well as these native agentic
products which live in the OS. So by the time ChatGPT distribution gets better,
the product form has already changed to something else that a ChatGPT UI
cannot sufficiently support.

Conviction is the fuel for progress

Conviction is the most critical ingredient for turning ideas into reality. The
greatest breakthroughs occur not by simply assembling intelligent individuals
but by cultivating strong, unwavering conviction.

I believe that a person with conviction can achieve almost anything physically
possible. However, a problem arises when we hold too firmly to beliefs about
what is possible versus impossible. We know relatively little about the true laws
of physics and the nature of reality, so rigid beliefs narrow the scope of our
potential achievements.

Over time, those with a broader sense of what might be possible are the ones
who ultimately accomplish extraordinary deeds. This explains why smart
people often fail to achieve significant breakthroughs, especially when they lack
openness toward possibilities.

Believing that "something is impossible" can be detrimental. This stems from a
fundamental truth about how the world functions: genuine value and significant
achievements arise from steadfastly believing in something that others dismiss
as impossible.

Rejecting new ideas outright might make you correct most of the time, given
that genuinely valuable outcomes are rare. However, consistently denying
possibilities yields no meaningful progress, even when you're right.