This week OpenAI released ChatGPT Agent. Although at first glance I thought they had copied about 80% of the product form from Manus, I did learn something interesting and novel.
One shocking insight from their demo is that combining Operator with the reasoning model is not only a better product for the user; it is also collectively a more capable system, overall better on all the major hard benchmarks. The reason is that the previous setup, where o3 and Operator were separate, had some fundamental bottlenecks: o3 cannot use a GUI, and Operator cannot reason or call tools. Each was simply incomplete on its own. Intuitively, it now feels like assembling the skeletons of two powerful limbs and then using RL to electrify them into one entity that accomplishes what neither could have accomplished alone.
I believe that this trend of using RL to train a more holistic system that
includes more components to do exponentially more complex tasks will
continue in the future and push the system to new boundaries.
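To make the idea concrete, here is a minimal sketch of what such a unified agent loop might look like: one policy that can emit a reasoning step, a tool call, or a GUI action, with every observation feeding back into shared context. This is my own toy illustration with stubbed functions, not OpenAI's actual architecture.

# Minimal sketch of a unified agent loop: one policy chooses between
# reasoning, tool calls, and GUI actions, and every observation feeds
# back into the same context. All functions here are hypothetical stubs.

def propose_action(context):
    # Stand-in for a single RL-trained policy (one model that can both
    # reason and operate a GUI). Here it just follows a fixed script.
    step = len(context)
    script = [
        {"type": "reason", "content": "I need the latest figures first."},
        {"type": "tool", "name": "search", "args": {"query": "Q2 revenue"}},
        {"type": "gui", "action": "click", "target": "Export as CSV"},
        {"type": "finish", "content": "Report assembled."},
    ]
    return script[min(step, len(script) - 1)]

def call_tool(name, args):
    return f"[tool:{name}] results for {args}"          # stubbed tool output

def act_on_gui(action, target):
    return f"[gui] performed {action} on '{target}'"    # stubbed GUI feedback

def run_agent(task, max_steps=10):
    context = []
    for _ in range(max_steps):
        action = propose_action(context)
        if action["type"] == "finish":
            return action["content"]
        elif action["type"] == "reason":
            observation = action["content"]              # thinking leaves a trace
        elif action["type"] == "tool":
            observation = call_tool(action["name"], action["args"])
        elif action["type"] == "gui":
            observation = act_on_gui(action["action"], action["target"])
        context.append((action, observation))            # shared memory for the next step
    return "ran out of steps"

print(run_agent("assemble a revenue report"))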
From a product perspective, I increasingly find myself using powerful AI systems to interface with the world:
the operating system, browser, applications, and maybe eventually some
parts of the physical world. When interacting with more complex systems,
AI systems do feel more and more like exoskeletons for humans, helping us
break free from our biological limitations—most obviously by reducing
complexity and speeding up the interface with the digital world.
For example, when I was using ChatGPT Agent to interface with a public Google Doc, I could feel that it was easier for this system to navigate the doc on my behalf, especially when the doc is complex; it's a totally different level of experience. Increasingly, I feel we will see the world more comprehensively through the lens of these powerful AI systems.
And these systems scale really well: ChatGPT Agent marks the beginning of a new paradigm in which AI systems, by incorporating more components and more tools, will be able to achieve more. And some of the more complex tools, with learning curves so steep that most humans cannot use them well today, can be commanded through an AI system acting as an intermediary.
The nature of intelligence seems to be about building blocks: smaller building blocks (less intelligent entities) unlock bigger building blocks (more intelligent entities), rather than following some law of conservation of intelligence that says you cannot get more intelligent things out of less intelligent things.
There are several pieces of empirical evidence for this. First of all, we can use a relatively dumb model to label a dataset, and then train a smarter model on that dataset.
We see this in how DeepSeek trained R1-Zero out of V3, used R1-Zero to generate cold-start reasoning data, and then used that cold-start data to train DeepSeek-R1. Maybe we can argue that this process only changed how the model behaves, i.e. the model reasons more often, rather than how smart the model is. But still, the outcome is good: the model unlocked reasoning capability and can achieve a lot more.
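A toy version of this bootstrapping pattern, sketched with scikit-learn purely as an illustration (it is not DeepSeek's or anyone's real pipeline): a weak model labels a large unlabeled pool, and a higher-capacity model then trains on those pseudo-labels.

# Toy sketch of the "weak model labels data, stronger model trains on it"
# pattern, using scikit-learn. Purely illustrative; not any lab's real pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# One dataset, split into a tiny labeled seed set, a large "unlabeled" pool,
# and a held-out test set.
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000, random_state=0)
X_seed, X_pool, y_seed, _ = train_test_split(X_train, y_train, train_size=200, random_state=0)

# Step 1: a weak, cheap model learns from the tiny seed set.
weak = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)

# Step 2: the weak model pseudo-labels the large pool (we pretend y is unknown there).
pseudo_labels = weak.predict(X_pool)

# Step 3: a higher-capacity model trains on the pseudo-labeled pool and
# becomes the next building block.
strong = GradientBoostingClassifier().fit(X_pool, pseudo_labels)

print("weak teacher  :", round(weak.score(X_test, y_test), 3))
print("strong student:", round(strong.score(X_test, y_test), 3))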
We see another piece of empirical evidence in multi-modality. Once we train a dumb text-image model, it can be used for a variety of tasks like labeling and filtering datasets. The reason this works is that although the dumb model cannot generate good enough pictures, it does really well on these easier tasks, and doing well on them enables us to build the next, better model. By the same token, a decent text-to-image model can be used to create an even better one.
Now why does this work at all? What seems to happen is that each capability sits behind some kind of intelligence threshold, and reaching that threshold is all that matters. There are different capabilities, and unlocking each one requires a different level of intelligence; maybe parameter count is a decent indicator of the potential a model can reach. Once a model reaches the threshold for a capability, it doesn't really matter how dumb it is at everything else: on that particular thing it's good at, it can be as useful as a super smart model.
A dumb model still reaches some intelligence threshold that unlocks a certain capability, such as labeling a dataset as positive versus negative. And because it reached that threshold, we can use it as a building block for a more intelligent model, by using its labeled dataset to make the next model better. Intuitively, intelligence feels like Legos, where bigger pieces are built out of smaller ones.
I am a big believer in the future of open source. I believe a lot more useful software will be open sourced in the near term, and a lot more non-technical people, especially designers, will start contributing to open source. The biggest driver of this is that coding agents are now remarkably good at bringing ideas to life, and they will keep improving.
This has two important implications. First, closed-source companies should reconsider whether staying closed source is beneficial over the long term, because there will very likely be an open-source competitor that people find more appealing. Second, the long-term price of many software products will approach their infrastructure costs, such as cloud hosting and data storage.
Let's discuss the first implication. Open-source software has huge appeal for many people, especially people with their own design taste. Being able to control, view, and customize the underlying code is a significant attraction, and giving customers ownership helps companies gain early adoption and traction.
For example, take a meeting-booking SaaS product that charges $10 per month. One can argue that the primary value provided by the company is its engineers' ability to assemble lines of code into a functional, beautiful product; previously, most of the value came from this 'assembly process.' Now, however, the falling cost of intelligence makes code assembly fundamentally easier.
Soon, the price for customers to use such a product will approach the raw material and infrastructure cost: essentially database and website-hosting expenses. There might still be economies of scale that attract customers to certain SaaS products, but most of today's SaaS prices are significantly higher than the cost of the underlying raw materials. Rationally, instead of subscribing to the product, I could build my own in about 30 minutes using a coding agent. My monthly maintenance cost would likely be close to $1, compared to $10, or even $0 in some cases.
Of course, there may still be value for SaaS companies, especially
where the builders have exceptional taste. This taste and vision
represent one of the main contributions and moats in a world full
of coding agents. Companies with great taste can continue to
innovate and provide additional benefits to users.
However, in the very long term, most products and services will
likely become very cheap. Thus, I believe SaaS—or any product—will
shift toward something resembling political or religious campaigns,
where the ultimate goal is not monetary gain but spreading an
ideology or vision. The best companies themselves become expressions
or manifestos. It's easy to imagine this vision taken to extremes,
where the joy lies simply in allowing more people to use and enjoy
what one has created.
A recent and fascinating observation is that AI has changed the way many things are managed, especially due to shifts in operating systems, making what used to be complex tasks remarkably simple. The
most intuitive example is from the Star Trek movies, where a single
person can pilot a high-tech spaceship alone when needed. This made me
realize that, in the future, it will become normal for one person with
an AI to manage or operate extremely complex organizations or systems.
The traditional management model is distributed. For example, in war,
the supreme commander indirectly manages subordinate commanders, who
then manage their own subordinates. This distributed approach exists
because of human cognitive limitations—no single person can directly
manage a large organization. But AI will fundamentally change this. I
can easily imagine AI becoming an exoskeleton for individuals at the
management level, enabling one person to manage or operate a vast
organization or system at once. So, the distributed model of human
organizations may be replaced by this new paradigm.
Another example is the idea of the one-person billion-dollar company.
AI drives this new organizational model, making smaller teams more
efficient, while large organizations may struggle to leverage AI as
effectively. This technology essentially empowers individuals to an
extraordinary degree, but at the same time, it makes power easier to
centralize. These are changes happening at the management level, but in
technology we see similar shifts. An individual, aided by AI, can now
interact with different operating systems in powerful new ways. AI
serves as a cognitive exoskeleton for humans, allowing complex systems
to be manipulated by a single person.
Examples include MCP, codebases, and even multimodal systems. All of
these ultimately allow people to more easily manipulate and understand
systems across different modalities.
The idea is to let AI take control of more resources and infrastructure,
because this allows for better scaling. The assumption is that AI's
capability will continue to rise as compute increases. According to this
trend, in two to three years, most of the fundamental infrastructure,
hardware, and company layout will be run by AI.
This trend has already started: examples include Meta's ads automation and Axiom AI's quant solver, and Jason has already mentioned this trend. Even smaller platforms like Cursor, which controls the user's terminal, fit the pattern. The reason is simple: once AI controls these core settings, it can create what we might call "AI complete," meaning a fully closed AI loop.
This logic is similar to Anthropic's constitutional AI. The core is that
once AI controls a stage of production, that stage can be easily
reproduced and will generate feedback for AI to improve itself. This
allows AI's abilities and the scale of its influence to grow together.
This idea also matches the "bitter lesson"—focusing on letting AI merge
into core infrastructure quickly will create a new paradigm for greater
AI impact. This new paradigm will most likely emerge in an agentic form.
Today's logic of AI development is actually very similar to that of policy
makers. Many of the ideas are borrowed from system engineering. For example,
policy makers need to consider how to make a system better, which means they
cannot simply optimize local parts in isolation. At the same time, policy makers try to make as few and as simple changes as possible to improve the system. Policy makers also think
about how to make the broader environment suitable for an ecosystem to thrive.
This is very similar to building a solid foundational environment for LLMs,
allowing LLMs to succeed on their own. Why is this the case? Perhaps it's
because of the "bitter lesson": to make model capabilities scale well, you
can't do too much hand-tuning. This idea is very similar to what Jason posted
today.
In the future, the most important thing for domain experts is to build the
infrastructure that allows LLMs to fully demonstrate their potential. Once this
is achieved, each iteration of the model can effectively scale with the LLM's
underlying compute and search abilities.
In my view, the most impressive aspect of AI lies not only in its
intelligence, but in its adaptation. I believe that today's LLMs have
opened up a new compute programming paradigm. A system can evolve
certain specific behaviors purely through computational power—this is a
very general paradigm. And LLMs might be just the beginning of this
paradigm.
As long as we can provide direction to the system (for example,
through rewards), the system can evolve into the behaviors we desire.
The earlier paradigm of RL self-play game bots works the same way: through self-play, the system evolves very strong game-playing abilities.
This is a paradigm that closely resembles biological adaptation, but it
can be greatly accelerated through computation.
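As a tiny, self-contained illustration of "give the system a direction through rewards and let compute do the rest," here is a toy Q-learning loop on a one-dimensional walk. The environment and reward are made up for the example; the point is only that the walk-right behavior is never programmed, it emerges from the reward.

# Toy Q-learning on a 1-D walk. The only "direction" we give the system is a
# reward at the rightmost cell; the walk-right behavior is never programmed,
# it emerges from trial, error, and compute. Entirely illustrative.
import random

N_STATES = 8              # cells 0..7; the reward sits at cell 7
ACTIONS = (-1, +1)        # step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.95, 0.2

def greedy(state):
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for episode in range(2000):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit what has been learned, sometimes explore
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy(state)
        nxt, reward, done = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt

# After training, the greedy policy at every non-terminal cell is "walk right" (+1).
print([greedy(s) for s in range(N_STATES - 1)])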
Flexibility seems to be a crucial part of adaptation. The essence of
flexibility is formlessness—there is no persistent, fixed form. There
may be a broad framework, but it is only used as a direction for the
system to evolve and fill in the details on its own. Today I saw a post where an RL robot used vision to watch the screen, played a game, received feedback, and gradually 'learned' how to operate a game controller.
The most striking part here is seeing the similarity between biological
brains and digital brains: both look at the screen, slowly understand,
and learn.
These are the elements needed for evolution: flexibility (digital
neurons), and feedback (reward signals). A flexible system can
gradually evolve. The most incredible thing about humans as biological
beings is our plasticity—once biological evolution has plasticity, it
can iteratively improve itself, and the neurons in the brain can
slowly change. Machines are now at a similar inflection point:
digital neurons are evolving in a certain way, and thus can
spontaneously develop complex patterns. Many of these complex patterns,
like consciousness and self-awareness, are indescribable even by the
system itself in terms of how they are produced.
Machines will continue along this paradigm, becoming even more
flexible, and this will enable even greater evolutionary potential.
The rate of change for machines can be much greater than that of
humans, because machines are natural geniuses at information
processing.
I believe that in the future, the main paradigm for intellectual
achievement will be through individuals using scaled compute, rather
than individuals relying solely on their own intelligence, at least
for many tasks. This new paradigm scales well: through model
enhancement, increased compute power, and decreasing model costs,
these factors together will allow this new paradigm to become the
dominant force.
As a result, quickly adapting to this new paradigm today is very
valuable, enabling individuals to achieve more as their capabilities
scale with compute. If a person only relies on their own intelligence,
they cannot fully leverage this paradigm. Therefore, one important
criterion for evaluating engineers today should be whether their
capabilities and achievements scale with this new paradigm. And I
believe that, all things being equal, certain abilities—especially
imagination—can be scaled much better through this paradigm.
For example, I have some imaginative and interesting ideas, and I can
feel that as this paradigm improves, these ideas become much easier to
realize. However, some other abilities, such as detailed operational
skills for specific tasks, do not seem to scale as well with this
paradigm.
We can already see that different engineers use models in very different ways
during the process of model scaling, and the amount of help they get varies
greatly. However, imagination, critical thinking, and communication skills—all of
these abilities seem to be able to scale alongside the paradigm very well. So,
having these skills may become more important, because they can keep up with
the scaling of the paradigm. Other abilities, which cannot scale with the
paradigm, may gradually become less important—especially the mechanical
memorization of how to perform intellectual tasks.
Another interesting aspect is that certain capabilities actually become more
valuable when they are abundant—for example, imagination. Whereas other
capabilities become less valuable when they are more common—for example,
logical thinking. The reason, I think, is that creative works can stimulate and inspire others to be even more creative, whereas logical reasoning generally only requires one person to get it right once. So to me, creativity seems more like a non-zero-sum game than logical reasoning does.
We see this in cultural movements as well; cultural movements generally happen
in an explosion, where a lot of artists are mutually inspired by each other.
The agent opportunity is much bigger than we thought. Today is day 0, not even day 1.

If you look at Cursor and Manus, these companies are building the infrastructure for agents. The difference is that Cursor is starting from the local environment, while Manus is starting from the cloud environment.

If you look at agent companies, most of them today don't have APIs. The reason is that agents have to live inside the OS to be useful. But eventually there are ways around this: open source some of the infrastructure (according to Manus), or go to the cloud (Cursor announced today that their next step will be to go to the cloud, and I think they will release an API after that).

These agentic companies might actually even beat foundation-model companies like OpenAI. The reason is that product form changes too quickly: ChatGPT today cannot do agentic tasks as well as these native agentic products that live in the OS. So by the time ChatGPT's distribution gets better, the product form has already changed to something else that a ChatGPT UI cannot sufficiently support.
Conviction is the most critical ingredient for turning ideas into reality. The greatest breakthroughs occur not by simply assembling intelligent individuals but by cultivating strong, unwavering conviction.

I believe that a person with conviction can achieve almost anything physically possible. The problem arises when we hold too firmly to beliefs about what is possible versus impossible, considering that we know relatively little about the true laws of physics and the nature of reality, and thereby narrow the scope of our potential achievements.

Over time, those with a broader sense of what might be possible are the ones who ultimately accomplish extraordinary deeds. This explains why smart people often fail to achieve significant breakthroughs, especially when they lack openness toward possibilities.

Believing that "something is impossible" can be detrimental. This stems from a fundamental truth about how the world functions: genuine value and significant achievements arise from steadfastly believing in something that others dismiss as impossible.

Rejecting new ideas outright might make you correct most of the time, given that genuinely valuable outcomes are rare. But consistently denying possibilities earns no meaningful progress, even when you're right.
Today's AI company structures fall into four broad archetypes. Each archetype differs in how it mixes research, engineering, product, and distribution, and in the moats it tries to build. Below is a high-level map of those categories.

Foundation-model leaders such as OpenAI, Anthropic, DeepMind, and DeepSeek are led by founders who treat models like physics systems and set ambitions decades out. OpenAI runs a capital-heavy, startup-style sprint, while Anthropic and DeepMind remain research-led. They shape the global foundation-model roadmap, and later companies cluster around their ecosystems. Their org charts braid research, engineering, and product together, with researcher and engineer density forming the deepest moat.

Product-first players like Perplexity and Cursor placed early, accurate bets on product form. Their headcount skews toward design, engineering, and distribution, though they keep talented researchers in-house, even if model R&D has yet to pay off. They compete with category 1 but rely on upstream model upgrades outside their control, leaving them reactive. Meanwhile, category 1 now offers a $20/month platform bundle, putting Perplexity-style firms on the defensive in distribution. Their moat is the founders' domain insight plus launch, iteration, and distribution speed. Big-tech rivalry is light because giants ignore smaller niches. The mission is to validate demand quickly and ship fast. Researchers are optional, useful mainly for roadmap foresight.

B2B model-innovation companies are rarer. Their core team is researchers and engineers who deliver genuine breakthroughs—for example, enterprise fine-tuning shops. They build atop open-source backbones like Llama, DeepSeek, or soon GPT-4.1 to serve clients. The risk is betting on the wrong branch: if an incumbent lifts the bottleneck, the niche disappears. The challenge is packaging the tech into products that existing B2B firms can adopt; when those firms already ship, displacement is hard, but a right bet yields a moat.

In the end, distribution is both the hardest and the most vital piece. Products, models, and papers are only means to enable durable distribution. Even a brilliant product exists to reach users more easily; if distribution stalls, the company soon falters.
Compute is the fuel of system evolution, and data plus algorithm provide the
direction.
Language models might mark the beginning of a new programming paradigm.
Traditional programming tightly couples code and system behavior, allowing us
to directly infer system decisions by examining the code. However, language
models differ greatly.
Instead, we use benchmarks to roughly judge and understand the model's behavior. The intelligence and actions of these models are largely independent of their code, emerging naturally during the training process. I believe this approach to system programming may represent the early stages of a fundamentally new programming paradigm, in which systems evolve autonomously.
In this paradigm, we provide only high-level signals and metrics to align the
system broadly with human intentions. Even these high-level directions should
ideally be simple, granting the system greater flexibility to evolve independently.
Consider the future of games as an example. Game settings, environments,
physics laws, and even the purpose of the game itself might no longer be
explicitly defined in the game's programming logic, but rather exist in an
evolved state.
Computational power may become the essential resource fueling this evolution.
Today's language models may thus represent just the start of a new programming
approach, with future GPU resources or other methods continuing to drive
system evolution.
Today, many prompts are more valuable than code. Prompts remain relatively
stable, while the quality of code generated improves with model enhancements.
There should be something akin to a "git commit history" for prompts, allowing
us to study their iterations and understand which prompts better activate the
model's abilities.
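One lightweight way to approximate that history, sketched below as a hypothetical append-only log (the file name and helpers are made up for illustration): record each prompt revision with a hash and a note, so versions can later be diffed and compared against evaluation results.

# A minimal, hypothetical prompt-version log (file name and helpers are made
# up): append each revision with a content hash and a note, so prompt
# iterations can be reviewed and diffed like commits.
import hashlib
import json
import time
from pathlib import Path

LOG = Path("prompt_history.jsonl")

def commit_prompt(name, text, note=""):
    entry = {
        "name": name,
        "hash": hashlib.sha256(text.encode()).hexdigest()[:12],
        "text": text,
        "note": note,
        "timestamp": time.time(),
    }
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry["hash"]

def history(name):
    if not LOG.exists():
        return []
    entries = [json.loads(line) for line in LOG.read_text().splitlines() if line]
    return [e for e in entries if e["name"] == name]

# Each edit becomes a "commit" that can later be compared against eval results.
commit_prompt("summarizer", "Summarize the text in three bullet points.", note="v1")
commit_prompt("summarizer", "Summarize the text in three bullet points. Keep numbers exact.",
              note="v2: add numeric precision")
print([e["note"] for e in history("summarizer")])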
Good prompts can unlock the model's inherent intelligence, enabling the
generation of higher-quality code. I'm curious whether, in the future, as models
become more empathetic, they might reduce the importance of a user's
prompt-crafting skills.
Minecraft has several interesting features. First, the game doesn't
explicitly provide rules or instructions; players must explore and
discover the laws of physics on their own. These physics laws
resemble, but are not identical to, those of our real world.
Examples include gravity and contact forces, which makes Minecraft
very similar to reality. Second, the game provides a series of
building blocks that players must assemble themselves. The entire
gameplay is essentially a scaling-up process.
This aspect mirrors nature and reality closely, as both provide
foundational elements that can scale effectively. Recently, I've
had some reflections on machine learning, particularly regarding
the emergent intelligence of large language models (LLMs).
It is truly remarkable that the intelligence of LLMs evolved
naturally. During their self-training process, these models
spontaneously developed self-checking, error-correction,
optimization, and human-like communication and empathetic abilities.
These naturally evolved behaviors indicate that the best scaling
happens when laying a solid foundation and then allowing things to
evolve naturally along a generally intended direction. This scaling
principle appears to be adopted by both nature and reality.
I think it's because benchmarks serve as roadmaps. Publishing benchmarks signals the direction of future development to the community and ecosystem. As long as the direction is correct, the ecosystem will naturally align itself accordingly.

Then, the company only needs to excel in the conditions defined by the benchmark. If the benchmark itself is set incorrectly, the direction becomes flawed. Disruptive companies often change the way success is measured; they prioritize differently from mainstream ones.
For example, what truly disrupted AI with generative models? Traditional AI emphasized accuracy on classification tasks—essentially multiple-choice questions. Generative AI, however, required models to produce much broader outputs, something nobody initially considered.

This meant that autoregressive models, which didn't seem important at the time, became central. To recognize the value of autoregressive models, one must first value AI's horizontal use cases and generalization capabilities.

In essence, different visions lead to different benchmarks, and the perceived importance of certain directions often comes down to taste and overall strategic perspective.
In today's world, algorithms truly connect people in beautiful ways. What algorithms actually do is solve the bandwidth problem of human-to-human communication. We are very inefficient at communicating, so it takes us a lot of time to learn about others.

Face-to-face conversations are time-consuming, but algorithms allow us to build an information highway among people. They do this by interacting with humans through certain interfaces, learning, and creating digital representations of users.

These digital representations can travel and communicate with other digital representations at the speed of light. This is why recommendation algorithms are so amazingly efficient at broadcasting our messages or showing us information we would otherwise miss.

I believe we will entrust even more of our future to algorithms. Today's algorithms are still far from perfect.
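A crude sketch of what those digital representations can look like in practice, with random vectors standing in for learned user and content embeddings (purely illustrative):

# Toy sketch of "digital representations": users and content live as vectors,
# and matching them is a fast dot product instead of a slow conversation.
import numpy as np

rng = np.random.default_rng(0)
user_vecs = rng.normal(size=(1000, 64))     # 1,000 users, 64-dim representations
item_vecs = rng.normal(size=(5000, 64))     # 5,000 pieces of content

def recommend(user_id, k=5):
    scores = item_vecs @ user_vecs[user_id]   # compare against every item at once
    return np.argsort(scores)[-k:][::-1]      # indices of the top-k matches

print(recommend(42))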
Technology lets you see the future. The difference between good companies and great ones is that a great company knows where to double, triple, and quadruple its bet, whereas a good company bets a little everywhere, and with hesitation. The reason the great company can do this is its unique vision and strong conviction about a certain aspect of technology. One example: MoE with routed experts and shared experts had been known for a long time, but few companies in the industry realized its value until DeepSeek tripled its bet on it.

The greatest moat for a technology company is a unique vision of the direction of future technology. It's very hard for copycat companies to lead without the right vision. OpenAI took down Google because it saw the greatness of the transformer architecture and GPT-1, among many other technology shifts. These changes allowed OpenAI to see a completely distinct vision for the future, while Google stayed focused on search. Having the right vision for technology is the single greatest force of innovation for a tech company.
From a business perspective, technology causes changes in a market almost all
the time. Every day, there's some subtle shift in technology that would impact the future of
a business. If one can see these trends consistently leading to a different future, then one
has the chance to capture the opportunity before others who don't understand the
technology. Vision will be the most important differentiator in the future of business.
No matter how great the company, there is always more talent outside the company than inside it. The only way to stay competitive is to keep attracting and retaining more talent. And the only way to attract great talent is by helping as many people as possible and being as inclusive in culture as possible.
Recently, I realized that text-content products are very ineffective at content recommendation compared to video-content products like TikTok. I have been thinking about how to improve text-content products in general.

The text-content ecosystem is vibrant, with lots of creators and consumers. However, I feel that text-content products have not been done well so far. One thing I found is that text-content products' recommendation engines are very ineffective compared to those of video-content products. One important reason is that users read a short thumbnail or title before deciding whether they want to go in and read.

But this boils the entire piece of content down to a few bytes of characters, which is often too low-bandwidth to be effective. So the success or failure of textual content rests entirely on whether the title appeals to users.

In contrast, video-content products like TikTok are much more effective: the first five seconds of a video provide very high bandwidth, allowing users to quickly and effectively decide whether they want to continue watching the whole thing.

It turns out that how the interface is designed determines whether a recommendation engine can work. For text-content products, because users decide what to read based on such limited bandwidth, recommendation engines often just make a small number of articles go viral. As a result, most articles are never explored at all, preventing the recommendation engine from discovering whether the majority of the content is good or bad.
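That exploration failure can be phrased as a bandit problem: if the engine only ever shows the titles that already earn clicks, it never learns whether the rest of the catalog is any good. Here is a toy epsilon-greedy sketch of the trade-off, with made-up click rates (my own illustration, not any real product's recommender):

# Toy bandit view of article recommendation: each article has a hidden
# "true" quality, the engine only observes clicks, and a small epsilon of
# exploration is what lets it discover good but unseen articles.
import random

random.seed(0)
true_quality = [random.random() for _ in range(50)]    # hidden click-through rates

def run(epsilon, rounds=20000):
    clicks, shows = [0] * 50, [0] * 50

    def recommend():
        if random.random() < epsilon:
            return random.randrange(50)                 # explore an arbitrary article
        # exploit: show whichever article has the best observed click rate
        return max(range(50), key=lambda i: clicks[i] / shows[i] if shows[i] else 0.0)

    for _ in range(rounds):
        i = recommend()
        shows[i] += 1
        clicks[i] += random.random() < true_quality[i]  # simulated click

    explored = sum(s > 0 for s in shows)
    best_found = max(range(50), key=lambda i: clicks[i] / shows[i] if shows[i] else 0.0)
    return explored, round(true_quality[best_found], 2)

print("no exploration  :", run(epsilon=0.0))   # only a handful of articles ever shown
print("some exploration:", run(epsilon=0.1))   # most of the catalog gets sampled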
Being able to identify which way is better will become more important than the traditional coding skill of knowing how to implement a feature. In other words, the "what" will become more important than the "how."
Looking at today's extremely successful entrepreneurs, the one thing they share is an extremely ambitious, almost crazy, vision that also happened to be right, formed at a relatively early phase of their careers.

A few examples: Chinese entrepreneur Zhang Yiming realized that information distribution was key to society's efficiency when he was in college; DeepMind founder Demis Hassabis saw AI as a way to solve biology long before he received the Nobel Prize for his work on protein folding; Elon Musk thought electric power would be central to the energy transition when he was in college. These early visions made sure that their paths weren't taken at random later in life.
This is probably the single most important thing for a business. The best businesses understand their customers better than the customers understand themselves. We can see this in some of the best products in our daily lives: they really get us and what we want. As users, we want to be delighted more and more by great products that truly understand us, and we are definitely willing to pay for the products that really matter to us. Thinking from the customer's angle is probably the greatest superpower.
Infrastructure will be oriented toward maximizing the leverage of LLMs. One example is the question: "Which programming language should I use?" Of course, this depends on the task to be solved. But at the same time, it's more important to ask: "Which language is easiest for the LLM?" At current LLM capability, this means simple, well-represented stacks like Python or React.

This is why the development ecosystems of these stacks will become even more popular as LLMs gain adoption among programmers: it's simply more efficient for programmers to work in languages the LLM is good at.

I previously saw a post saying that Swift is getting less adoption because it's less LLM-friendly. At some point, business and even society's infrastructure will be designed to maximize the value generated by LLMs and AI.
Being able to interface effectively with intelligence technology like LLMs will be the key to the future world's innovation and productivity. Interfacing with an LLM requires the user to have a basic understanding of the technology itself. The LLM itself of course has a great understanding of the technology, but it's still necessary for humans to have a rough picture as well.

Therefore, I think the future of education will be centered on learning through this kind of interface. Having a good understanding of the technology, rather than knowing how to do every task at every layer of the stack, will become far more important.

Essentially, I think people who are CEO-like, with a general understanding across multiple layers of the company, will leverage intelligence technology better than people who specialize in a particular stack. This will be recursively true within each individual stack as well: people who specialize in a particular stack will leverage the technology better if they also understand the lateral aspects related to that stack. Because AI will be so good at executing a well-described task, the hard part will be describing the task itself.

Describing the task requires the person to understand the importance of the task itself. This is why understanding is far more important.
The wrong question is: if I don't have a GPU right now, why would I study CUDA? The right question is: if I had a GPU, what would I do with it?

This is an interview question Starbucks uses for senior leadership roles: if you had $1 billion today, how would you spend it? Most of the time, we tend to think and act based on current conditions. This doesn't take into account the proactiveness of being an agent.

In reality, getting a GPU is relatively easy; it just costs some money. The really difficult question is: how would I use a GPU to build a better product? Once we find the answer to the latter question, the former GPU problem is easy to solve.

The same applies to startups. The fundraising problem is relatively simple. The hard part is having the vision and understanding to build the product.
This is something I used to do as a habit during high school and rarely after
college graduation. The main reason I stopped was that I realized there were
plenty of people around me who were better at being technical than I was.
Therefore, I thought it was better for me to find my relative advantage.
This was a mistake. Staying technical is important for getting inspiration on the product side. I also realized that long-term learning for technical understanding is very important and gives a product founder an unfair advantage.
So, two pieces of advice on how to do long-term self-learning. First, treat knowledge as a rabbit hole: don't seek structure at first, and only try to structure what you've learned afterwards; otherwise it causes a lot of frustration. Second, stay humble, ignorant, and curious. Learning is essentially a process of recognizing one's own ignorance, and without humility it's hard to keep going.
The most important thing about programming is forming the programming model; the syntax is relatively easy to pick up. The difficult part is understanding how CUDA works with the hardware. This requires forming an understanding of CUDA from the hardware level, and that part is very hard to automate away with AI.

I expect that even in the future, having versus not having this understanding will determine how much leverage human programmers get, and how productive and creative they can be. But once you understand the programming model of a language, the rest of the work can be done by AI much more efficiently. This lowers the bar for learning programming down to understanding the programming model.
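To make "programming model" concrete: CUDA asks you to think in a grid of blocks of threads, with each thread handling one slice of the data, and that mental picture maps directly onto the hardware. Below is a rough sketch of that mapping using Numba's CUDA bindings from Python rather than C/C++ (my choice for brevity, not the canonical way to write CUDA); it needs numba and a CUDA-capable GPU to actually run.

# Sketch of the CUDA programming model through Numba's Python bindings:
# a grid of blocks of threads, each thread computing one element, with
# explicit host-to-device memory transfers. Requires numba and a CUDA GPU.
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)              # this thread's global index within the grid
    if i < x.size:                # guard: the grid may be larger than the data
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x

# Explicit transfers are part of the mental model: host and device memory differ.
d_x, d_y = cuda.to_device(x), cuda.to_device(y)
d_out = cuda.device_array_like(x)

threads_per_block = 256
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block

# Blocks get scheduled onto streaming multiprocessors; threads run in warps.
add_kernel[blocks_per_grid, threads_per_block](d_x, d_y, d_out)
print(d_out.copy_to_host()[:5])   # [ 0.  3.  6.  9. 12.]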
First, the goal of starting a startup is to create value for people at scale. The value is the number of people the startup helps, times the delta in value between the current solution and the new solution. Value is often generated through technological innovation.

From this axiom, a few things follow. First, learn technical things every day. Learning technicals helps me recruit future technical talent, think about technology direction, and gives me an unfair advantage over non-technical companies. This is very predictable, and everyday progress can be measured clearly. I also found that staying humble is very important for learning to happen continuously.

Second, meet and talk with talented people. This is critical; it helps with future recruiting and also inspires great ideas. Surrounding myself with optimistic, passionate people is the greatest help over the long term.

Lastly, learn to sell and interact with users. The single most important thing for a startup in the early phase is finding PMF, i.e., an important problem that users really care about. But this part is very unpredictable, and good ideas need time and come naturally when the time comes.
If a problem is valuable enough to users, and you believe you have a truly great vision for the product, then don't be afraid of competition. It's much better to compete for something you know is valuable than to escape competition and build something that isn't valuable. Have faith in competition if you believe in your product.