The focus today should be on leveraging better hardware, driven by the advances Moore's Law delivers, rather than trying to compete with this trend through software tweaks. We should plan for the massive computational power we will have in the future (potentially 1,000 times more than today) instead of attempting to squeeze small gains from current hardware.
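The "1,000 times more" figure follows directly from exponential growth. A minimal sketch, assuming (as a hypothetical, not a figure from the text) that available compute doubles roughly every two years:

```python
# Sketch: how a "1000x more compute" projection follows from Moore's Law.
# Assumption (illustrative, not from the text): compute doubles every ~2 years.
def compute_multiplier(years: float, doubling_period: float = 2.0) -> float:
    """Relative compute available after `years`, given a doubling period."""
    return 2.0 ** (years / doubling_period)

if __name__ == "__main__":
    for y in (10, 20):
        print(f"after {y} years: ~{compute_multiplier(y):.0f}x today's compute")
```

Under this assumption, twenty years of doubling yields 2^10 = 1024, roughly the thousandfold increase mentioned above.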
This approach requires us to rethink how we train models, even if the resulting methods do not look efficient now; over time, the shift will pay off significantly. Short-term gains come from optimizing domain-specific details, but long-term success relies on broader, more adaptable capabilities (Chap. 28: system engineering). The "bitter lesson" is that real progress does not come from perfecting small tricks, such as making an AI better only at writing Python code, but from developing general skills: understanding, reasoning, learning, planning, and predicting.
Moore's Law suggests not only that high-end computational power keeps improving, but that its cost per unit of performance falls more dramatically than it does for lower-end hardware. This is why the iPhone, despite its high price, becomes more accessible over time while continuing to outperform cheaper alternatives on the market. High-end hardware benefits from exponential growth in performance while becoming more affordable, whereas low-end hardware sees only incremental gains and limited cost reduction. This explains why there are few high-quality consumer electronics priced below $100: performance improves slowly at that level but accelerates significantly beyond it.