I think it's because benchmarks serve as roadmaps. Publishing benchmarks signals the direction of future development to the community and ecosystem. As long as the direction is correct, the ecosystem will naturally align itself accordingly.
Then, the company only needs to excel in the conditions defined by the benchmark. If the benchmark itself is set incorrectly, the direction becomes flawed. Disruptive companies often change the way success is measured; they prioritize differently from mainstream ones.
For example, what truly disrupted AI with generative models? Traditional AI emphasized accuracy on classification tasks, which are essentially multiple-choice questions. Generative AI, however, required models to produce much broader outputs, something nobody initially considered. This meant that autoregressive models, which hadn't seemed important at the time, became central. To recognize the value of autoregressive models, one must first value AI's horizontal use cases, that is, its generalization capabilities.
In essence, different visions lead to different benchmarks, and the perceived importance of certain directions often comes down to taste and overall strategic perspective.