r/artificial • u/PianistWinter8293 • 9m ago
Discussion: The stochastic parrot was just a phase; we will now see the 'Lee Sedol moment' for LLMs
The biggest criticism of LLMs is that they are stochastic parrots, incapable of understanding what they say. Anthropic's research has made it increasingly evident that this is not the case and that LLMs do have real-world understanding. Yet for all their breadth of knowledge, we have not seen the 'Lee Sedol moment' for LLMs: a feat so creative and smart that it stuns, and even outperforms, the smartest humans. There is a good reason this hasn't happened yet, and a good reason it is about to change.
Models have previously focused on pre-training with unsupervised learning: the model is rewarded for predicting the next word, i.e., for reproducing the training text as closely as possible. This produces smart models with understanding, but not creative ones. The training signal is dense over the output (every single token must be correct), so the model has no flexibility in how it constructs its answer.
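To make "dense signal" concrete, here is a minimal toy sketch (not a real language model) of per-token cross-entropy: every position in the reference text contributes its own loss term, so the model is penalized at every single token it gets wrong.

```python
import math

def per_token_loss(predicted_probs):
    """predicted_probs[i] = probability the model assigned to the i-th
    reference token. One loss term per position -> a dense signal with
    no freedom to deviate from the reference text anywhere."""
    return [-math.log(p) for p in predicted_probs]

# Hypothetical probabilities for a 4-token reference sentence.
probs = [0.9, 0.7, 0.2, 0.95]
losses = per_token_loss(probs)
avg_loss = sum(losses) / len(losses)  # averaged over ALL positions
```

Note how the third token (probability 0.2) dominates the loss: the model is pushed to match the reference at that exact position, rather than being free to phrase the answer differently.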
Now we have entered the era of post-training with RL: we finally figured out how to apply RL to LLMs in a way that improves their performance. This is HUGE. RL is what made the original Lee Sedol moment happen. The delayed reward gives the model room to experiment, as we now see with reasoning models trying out different chains of thought (CoT). Once the model finds one that works, that behavior is reinforced.
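The contrast with the dense pre-training signal can be sketched with a toy REINFORCE-style loop (all names and the reward rule here are hypothetical illustrations, not any lab's actual training setup): the policy samples an entire chain of steps, the reward arrives only at the end based on the final answer, and the intermediate steps are left unconstrained.

```python
import random

def sample_chain(policy, steps=3):
    """Sample a chain-of-thought as a sequence of toy actions 'a'/'b'."""
    return ['a' if random.random() < policy['a'] else 'b'
            for _ in range(steps)]

def reward(chain):
    """Sparse, DELAYED reward: 1 only if the final answer is correct.
    Hypothetically, majority-'a' chains yield the right answer."""
    return 1.0 if chain.count('a') > len(chain) / 2 else 0.0

def reinforce_step(policy, chain, r, lr=0.1):
    """If the whole chain earned reward, nudge the policy toward the
    actions it actually took; unrewarded chains leave it unchanged."""
    if r > 0:
        frac_a = chain.count('a') / len(chain)
        policy['a'] += lr * (frac_a - policy['a'])
    return policy

random.seed(0)
policy = {'a': 0.5}
for _ in range(200):
    chain = sample_chain(policy)
    policy = reinforce_step(policy, chain, reward(chain))
# The policy drifts toward whatever chains end in a correct answer,
# regardless of which intermediate steps they used to get there.
```

The key design point: because only the final outcome is scored, the model is free to discover its own intermediate reasoning steps, which is exactly the room for experimentation that the dense next-token objective lacks.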
Notice that we don't train the model on human chain-of-thought data; we let it create its own. Although deeply shaped by the human CoT it absorbed during pre-training, the result is still unique and creative. More importantly, its reasoning can exceed human capabilities. Unlike pre-training, this process is not bounded by human intelligence, and the capacity for models to surpass human capabilities is, in principle, unbounded. Soon we will have the 'Lee Sedol moment' for LLMs, and after that it will be a given that AI is a better reasoner than any human on Earth.
The implication is that any domain heavily bottlenecked by reasoning, such as mathematics and the exact sciences, will see explosive progress. Another important implication is that models' real-world understanding will skyrocket, since RL on reasoning tasks forces them to form a solid conceptual model of the world. Just as a student who works through all the exercises and thinks deeply about the subject gains a much deeper understanding than one who doesn't, future LLMs will have an unprecedented understanding of the world.