AI Companies by Organizational Structure

Today's AI company structures fall into four broad archetypes. Each archetype
differs in how it mixes research, engineering, product, and distribution, and in
the moats it tries to build. Below is a high‑level map of those categories.

  • Foundation‑model leaders such as OpenAI, Anthropic, DeepMind, and
    DeepSeek are led by founders who treat models like physics systems and
    set ambitions decades out. OpenAI runs capital‑intensive, startup‑style
    sprints, while Anthropic and DeepMind remain research‑led.

They shape the global foundation‑model road‑map; later companies cluster
around their ecosystems. Their org charts braid research, engineering, and
product, with researcher and engineer density forming the deepest moat.

  • Product‑first players like Perplexity and Cursor placed early, accurate bets
    on product form. Their headcount skews toward design, engineering, and
    distribution, though they keep talented researchers in‑house, even if model
    R&D has yet to pay off.

They compete with category 1 but rely on upstream model upgrades outside
their control, leaving them reactive. Meanwhile category 1 now offers a
$20/month platform bundle, putting Perplexity‑style firms on the defensive in
distribution.

  • Vertical (or some horizontal) 2B companies—Spur, Extend, and others—
    anchor on domain knowledge in healthcare, finance, law, etc. Headcount
    tilts toward product and sales; the product may be an LLM wrapper, but
    lasting distribution, not wrapping, is the key.

Their moat is founders’ domain insight plus launch, iteration, and distribution
speed. Big‑tech rivalry is light because giants ignore smaller niches. The
mission is to validate demand quickly and ship fast. Researchers are optional,
useful mainly for road‑map foresight.

  • 2B model‑innovation companies are rarer. Their core team is researchers
    and engineers who deliver genuine breakthroughs—for example, enterprise
    fine‑tuning shops. They build atop open‑source backbones like Llama,
    DeepSeek, or soon GPT‑4.1 to serve clients.

The risk is betting on the wrong branch: if an incumbent lifts the bottleneck,
the niche disappears. The challenge is packaging the technology into products
that existing 2B firms can adopt; once those firms already ship, displacement is
hard, but the right bet yields a moat.

In the end distribution is both the hardest and most vital piece. Products,
models, and papers are only means to enable durable distribution. Even a
brilliant product exists to reach users more easily; if distribution stalls, the
company soon falters.

A New Programming Paradigm

Compute is the fuel of system evolution, and data plus algorithm provide the
direction.

Language models might mark the beginning of a new programming paradigm.
Traditional programming tightly couples code and system behavior: we can infer
a system's decisions directly by reading its code. Language models break that
coupling.

Instead, we use benchmarks to roughly judge and understand the model's
behavior. The intelligence and actions of these models are largely independent
from their code, emerging naturally during the training process. I believe this
approach to system programming may represent the early stages of a
fundamentally new programming paradigm, where systems evolve
autonomously.
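The contrast can be sketched concretely: with traditional code we read the logic to know the behavior, whereas with a trained model we can only probe it and score it. A minimal Python illustration, where the "model" is treated as a black box and the benchmark tasks are invented purely for illustration:

```python
# Traditional programming: behavior is legible from the code itself.
def add(a, b):
    return a + b  # we know exactly what this does by reading it

# Black-box "model": we cannot read its decision logic, only query it.
def opaque_model(prompt):
    # stands in for a trained model whose behavior emerged during training
    answers = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}
    return answers.get(prompt, "unknown")

# Benchmark: the only way to characterize the black box is to score it.
benchmark = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]

score = sum(opaque_model(q) == a for q, a in benchmark) / len(benchmark)
print(f"benchmark accuracy: {score:.2f}")  # we judge behavior, not code
```

The benchmark reveals that the system is imperfect without ever telling us *why*; that opacity is exactly what distinguishes this paradigm from reading source code.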

In this paradigm, we provide only high-level signals and metrics to align the
system broadly with human intentions. Even these high-level directions should
ideally be simple, granting the system greater flexibility to evolve independently.
Consider the future of games as an example. Game settings, environments,
physics laws, and even the purpose of the game itself might no longer be
explicitly defined in the game's programming logic, but rather exist in an
evolved state.

Computational power may become the essential resource fueling this evolution. 
Today's language models may thus represent just the start of a new programming
approach, with future GPU resources or other methods continuing to drive
system evolution.

Prompt

Today, many prompts are more valuable than code. Prompts remain relatively
stable, while the quality of the code they generate improves as models improve.
There should be something akin to a "git commit history" for prompts, letting
us study their iterations and understand which prompts best activate the
model's abilities.
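A "commit history" for prompts could be approximated even without git. A minimal sketch in Python (the storage format and function names here are my own invention, not an existing tool) keeps each revision under a short content hash and lets us diff successive versions:

```python
import difflib
import hashlib

history = []  # list of (hash, prompt) revisions, oldest first

def commit(prompt: str) -> str:
    """Record a prompt revision, keyed by a short content hash."""
    h = hashlib.sha256(prompt.encode()).hexdigest()[:8]
    history.append((h, prompt))
    return h

def diff_last_two() -> str:
    """Unified diff between the two most recent prompt revisions."""
    (_, old), (_, new) = history[-2], history[-1]
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="previous", tofile="current", lineterm=""))

commit("Summarize the article.")
commit("Summarize the article in three bullet points, citing sources.")
print(diff_last_two())
```

Pairing each revision with the benchmark score of the code it produced would then let us study which edits actually unlocked the model's abilities.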

Good prompts can unlock the model's inherent intelligence, enabling the
generation of higher-quality code. I'm curious whether, in the future, as models
become more empathetic, they might reduce the importance of a user's
prompt-crafting skills.

Minecraft

Minecraft has several interesting features. First, the game doesn't
explicitly provide rules or instructions; players must explore and
discover the laws of physics on their own. These physics laws
resemble, but are not identical to, those of our real world.

Examples include gravity and contact forces, which make Minecraft
feel very similar to reality. Second, the game provides a series of
building blocks that players must assemble themselves. The entire
gameplay is essentially a scaling-up process.

This aspect mirrors nature and reality closely, as both provide
foundational elements that can scale effectively. Recently, I've
had some reflections on machine learning, particularly regarding
the emergent intelligence of large language models (LLMs).

It is truly remarkable that the intelligence of LLMs evolved
naturally. During training, these models spontaneously developed
self-checking, error-correction, optimization, and human-like
communication and empathy.

These naturally evolved behaviors suggest that the best scaling
happens when one lays a solid foundation and then allows things to
evolve naturally along a generally intended direction. Both nature
and reality appear to follow this scaling principle.

Why AI Companies Love Releasing Benchmarks First

I think it's because benchmarks serve as roadmaps. Publishing benchmarks
signals the direction of future development to the community and ecosystem.
As long as the direction is correct, the ecosystem will naturally align itself
accordingly.

Then, the company only needs to excel in the conditions defined by the
benchmark. If the benchmark itself is set incorrectly, the direction becomes
flawed. Disruptive companies often change the way success is measured; they
prioritize differently from mainstream ones.

For example, what allowed generative models to disrupt traditional AI?
Traditional AI emphasized accuracy on classification tasks—essentially
multiple-choice questions. Generative AI, however, required models to produce
much broader outputs, something nobody initially considered.

This meant that autoregressive models, which didn't seem important at the
time, became central. To recognize the value of autoregressive models, one
must first value AI's horizontal use cases or generalization capabilities.

In essence, different visions lead to different benchmarks, and the perceived
importance of certain directions often comes down to taste and overall strategic
perspective.

Trust fate, but trust algorithms even more (相信缘分更要相信算法)

In today's world, algorithms truly connect people in beautiful ways. What
algorithms actually do is solve the bandwidth problem for human-to-human
communication. We are very inefficient at communication, so it takes us lots of
time to learn about others.

Face-to-face conversations are time-consuming, but algorithms allow us to
build an information highway among people. They do this by interacting with
humans through certain interfaces, learning, and creating digital
representations of users.

These digital representations can travel and communicate with other digital
representations at the speed of light. This is why recommendation algorithms
are amazingly efficient at either broadcasting our messages or showing us
information we'd miss otherwise.
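One common concrete form of such a "digital representation" is an embedding vector; matching two people then reduces to comparing vectors, which is vastly cheaper than a conversation. A minimal cosine-similarity sketch (the users and vectors are made up for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two interest vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical user interest vectors (e.g. learned from interactions).
alice = [0.9, 0.1, 0.4]
bob   = [0.8, 0.2, 0.5]
carol = [0.1, 0.9, 0.0]

print(cosine(alice, bob))    # high: similar interests
print(cosine(alice, carol))  # low: dissimilar interests
```

A single dot product stands in for hours of getting to know someone, which is the bandwidth gain the paragraph above describes.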

I believe we will trust our future with algorithms even more. Today's algorithms
are still far from being perfect.

Interface of intelligent application

One thing I found about intelligent LLM applications is that intelligence
primarily resides not in the application interface itself, but within the model.
The role of the application interface is to change how humans and machine
intelligence interact.

Currently, it's more important for interface design to be native than smart.
By native, I mean the intelligence should sit close to the operating system,
ideally natively inside the operating system.

Interface design around powerful AI should be the primary concern at the
application level. What makes Cursor successful is an interface that uses an
intelligent agent to lower the barrier to understanding and writing code.
Essentially, this was achieved through interface innovation. For example,
because Cursor is an app rather than a browser-based tool, it can directly
manipulate the underlying OS.

A native interface allows the model to operate the underlying operating system
more effectively. Interface smartness matters less, because smartness
artificially built into the interface can hinder the model's inherent
capabilities. This also aligns with the 'Bitter Lesson': in the long term, only
the most native UI will scale well with increasingly capable intelligence.

Vision and Technology

Technology allows one to see the future. The difference between good companies
and great ones is that a great company knows where to double, triple, and
quadruple its bets, whereas a good company merely bets a little everywhere,
and with hesitation.

The reason a great company can do this is its unique vision and strong
conviction about a certain aspect of technology. One example: MoE, with routed
experts and shared experts, was discovered long ago, but few companies in the
industry realized its value until DeepSeek tripled its bet on it.

The greatest moat for a technology company is a unique vision of where
technology is heading. It is very hard for copycat companies to lead without
the right vision. OpenAI took down Google because it saw the greatness of the
Transformer architecture and GPT-1, among many other technology shifts.

These changes allowed OpenAI to see a completely distinct vision for the future, while
Google focused on search. Having the right vision for technology is the single greatest
force of innovation for a tech company.

From a business perspective, technology changes markets almost constantly.
Every day there is some subtle technological shift that will impact the future
of a business. If one can see these trends consistently leading to a different
future, one has the chance to capture the opportunity before others who don't
understand the technology. Vision will be the most important differentiator in
the future of business.

More Talent Outside the Wall

No matter how great the company, there is always more talent outside it than
inside it. The only way to stay competitive is to keep attracting and
retaining more talent.

The only way to attract great talent is by helping as many people as possible
and being as inclusive in culture as possible.

Importance of Interface (交互)

Recently, I realized that text-content products are very ineffective in content
recommendation compared to video-content products like TikTok. I have been
thinking about how to improve text-content products in general.

The text-content ecosystem is vibrant, with lots of creators and consumers.
However, somehow I feel text-content products are not well done so far.

One thing I found is that text-content products' recommendation engines are
very ineffective compared to video-content products'. One important reason is
because users read a short thumbnail of titles before deciding they want to go
inside and read.

But this boils the entire piece down to a few bytes of characters, which is
often too low-bandwidth to be effective. So the success or failure of textual
content rests on whether the title appeals to users.

In contrast, video-content products like TikTok are much more effective; the
first five seconds of a video provide very high bandwidth, allowing users to
quickly and effectively decide if they want to continue watching the whole thing.
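The bandwidth gap can be made roughly concrete with back-of-the-envelope numbers (all figures here are my own illustrative assumptions, not measurements): a 60-character title carries about 60 bytes, while five seconds of compressed video at around 1 Mbps carries hundreds of kilobytes.

```python
# Back-of-the-envelope comparison; all numbers are illustrative assumptions.
title_bytes = 60                   # ~60-character headline, 1 byte/char
video_bitrate_bps = 1_000_000      # assume ~1 Mbps compressed video
preview_seconds = 5
video_bytes = video_bitrate_bps * preview_seconds // 8  # bits -> bytes

ratio = video_bytes / title_bytes
print(f"title preview: {title_bytes} bytes")
print(f"video preview: {video_bytes} bytes")
print(f"video carries ~{ratio:,.0f}x more raw data")
```

Raw bytes overstate the semantic information in video, but even discounted heavily, the user is deciding from orders of magnitude more signal in the video case.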

It turns out that interface design determines whether a recommendation engine
works. For text-content products, because users decide what to read based on
such limited bandwidth, recommendation engines often just make a small number
of articles go viral.

As a result, most articles are never explored at all, preventing the
recommendation engine from effectively discovering whether the majority of the
content is good or bad.