Hype or not?
Do the capabilities of today’s LLMs justify their hype?
It’s certainly easy to see that some of the excitement around large models may prove to be a classic market bubble. How many of the startups receiving decabillion-dollar valuations will still be around in ten years? How many of the projects being spun up inside large companies will actually bear fruit? There is a lot of hype, to be sure, but we’ll make the case that the rapidly improving capabilities of today’s GenAI models are not just substantive, but potentially transformative in their impact on most cognitive and creative work.
Let’s start by pinning down some hype and some substance. Since the release of ChatGPT in November of 2022, commentators have been warning that the singularity, the emergence of a superintelligence that will come to dominate humanity, is just months away. We’re still waiting.
But here are some tangible facts: modern LLMs vastly outperform the average human on a variety of tests, including the MMLU, a multiple-choice exam whose questions range in difficulty from elementary school through high school, all the way up to college and professional level.
Care to guess the average score for a non-expert human on this test? 34.5%.
We could go on about bad predictions and benchmark scores, but we think it’s more useful to look at today’s GenAI as the continuation of a trend and to track how much it has improved since the release of ChatGPT.
Substantial growth
Everyone is familiar with the robotic assistants you meet on telephone calls or chat with on websites. In the past, these mostly stuck to a script, although there was a brief period around 2015 when large tech companies felt that chat technology was ready for full-blown conversation. The hype, in that case, proved premature. The bots were easily led astray or tricked into emitting toxic content. Reality did not meet expectations when these systems left the demo environment and were deployed in the real world.
While the hype around chatbots was fading, OpenAI began its work on LLMs. GPT-1 was released in 2018, GPT-2 in 2019, and GPT-3 in 2020. Each of the first three releases showed progress and caught the attention of experts in the field, but none caught the attention of the general public or became a widely adopted tool for everyday work.
When ChatGPT (roughly GPT-3.5) was released, however, it quickly became clear that we had reached a tipping point. The answers the chatbot provided, its ability to understand and reason with language, and its capacity to improve by iterating on its own responses made it an order of magnitude more useful than previous iterations. GPT-4, which followed shortly after, showed enormous improvements on a number of benchmarks, from standardized tests to subjective human evaluations.
Let’s look at a concrete example. AlphaCode, a system from Google’s DeepMind, was trained for competitive programming. When it was first released in February of 2022, it scored better than 50% of human competitors at solving problems it had never encountered before. In December of 2023, Google announced AlphaCode 2, which was built on top of its latest foundation model, Gemini. This system now scores better than 85% of human competitors in programming contests.
So...is there actually hype?
Of course there is! It seems like every day there is an article about how AI will destroy life as we know it or put every developer out of a job or achieve sentience within six months. With the emergence of any promising new technology, lots of players will enter the market, and most of them will exit. Let’s remind ourselves of the dot-com days, when internet access was a new and expanding technology. We don’t know yet if GenAI will be as transformative as the World Wide Web, but it seems poised to have a significant impact. We want to help you land on the side of Amazon, not Pets.com. Given how central our dataset was to training these systems and our own internal work to build out OverflowAI, we hope to bring you some useful perspective and information—knowledge you can use to position yourself on the right side of this transition.