GenAI models are overflowing with answers to our questions, but if you’ve ever asked an LLM about a subject you’re very familiar with, you’ll soon notice an important distinction between answers (as in, AI-generated output in response to a question) and knowledge.
By knowledge, we mean a deep, nuanced understanding of a topic based on extensive experience, community-validated solutions and best practices, and contextual awareness. LLMs can produce answers; they can’t—at least on their own—create knowledge. They certainly can’t tap into the institutional knowledge that keeps your organization humming, the best practices your experts have built and refined, or the hard-won experience you want to preserve for current and future employees.
The good news is that you can build a knowledge base that integrates with your GenAI projects to generate answers that are comprehensive, accurate, reliable, and founded on your teams’ knowledge.
AI answers alone aren’t sufficient to meet developers’ need for knowledge
Impressive as they can be, AI-generated answers don’t always meet developers’ need for trustworthy, validated knowledge. On the whole, developer trust in AI tools is not robust. Stack Overflow’s annual survey of more than 65,000 developers found that 42% trust the accuracy of AI output in their workflows, while 31% distrust it (around 27% neither trust nor distrust it). In particular, developers lack trust in AI tools to tackle complex tasks, and 63% of respondents said AI tools lack the context to understand their organization’s codebase, internal architecture, and store of institutional knowledge.
In spite of the growing number of AI coding assistants and AI-powered productivity tools on the market, developers still spend a lot of time looking for answers to their questions. Our survey found that more than 60% of professional developers spend half an hour or more a day searching for solutions, with one in four spending at least an hour looking for answers. Developers spend a lot of time answering other people’s questions—often the same question more than once. Three out of four developers find themselves answering questions they’ve answered before, while close to half (47%) spend at least half an hour every workday answering questions. For organizations, this means that many of your most knowledgeable, experienced employees have substantially less time and energy for new projects.
The takeaway for organizations is that AI alone is clearly not shifting the burden of sharing and preserving knowledge off developers’ shoulders. AI tools can deliver major productivity gains, among other benefits, but they are no substitute for a knowledge management strategy that effectively utilizes AI while keeping humans in the loop to ensure high-quality output. Organizations building AI solutions also need to think about how to deliver developer-friendly product experiences by giving users options when their LLMs don’t have sufficient answers. Here at Stack Overflow, this is exactly the challenge we’re building solutions for: both for the communities on our public platform, working with partners, and in Stack Overflow for Teams, where OverflowAI gets new knowledge documented and into the hands of developers wherever they are in their workflow.
Relying on AI answers alone can create a dangerous cycle
As we suggested above, over-relying on AI tools instead of investing in a system to capture, preserve, and share knowledge leads to a vicious cycle. It goes like this:
Let’s say you roll out an AI chatbot for your employees. It generates answers almost instantly and saves time, but those answers aren’t always correct or complete, and there’s no built-in way for users to vet them against the established wisdom of subject matter experts or organizational best practices. But the answers are so easy to get, and often useful enough! So employees start making decisions that shape business logic based on flawed, outdated, or incomplete answers generated by the AI chatbot.
It’s not hard to see where the downstream consequences, from unmanageable tech debt to security vulnerabilities to compliance violations, can enter into the equation. The more your employees rely on AI-generated answers in a vacuum, without a trusted knowledge community to evaluate and contextualize the AI’s output, the more you’ll reinforce this risky cycle.
Choose your use cases wisely
There are a multitude of use cases for LLMs. From a developer perspective, AI coding tools can add value by automating or expediting repetitive work; they can also help devs successfully navigate the learning curve of new tools or unfamiliar programming languages. But, as we’ve said, many developers still lack substantial trust in the output of AI models.
Once they have an answer from the model, developers tend to keep searching for information to vet the AI’s response, often by looping in a human with domain expertise. If you ask an LLM about a topic you know nothing about, how can you assess the accuracy or quality of its response without confirming it against the answer of an established human expert?
As we mentioned, close to half of professional developers (45%) feel AI tools are bad or very bad at handling complex tasks. To really learn a new platform, technology, or programming language, developers may need more than the immediate responses of an LLM. But an LLM trained on a well-structured knowledge base can offer much more than superficial replies.
Embrace quality over quantity
As more use cases emerge for LLMs, it becomes increasingly clear that you can’t build a high-quality AI product that delivers value for developers without schooling your model on high-quality data.
Research from MIT has shown that integrating a knowledge base into an LLM improves output and reduces hallucinations (that is, incorrect or misleading results). The “garbage in, garbage out” rule familiar to many developers and technologists certainly applies to LLMs: your model is only as good as the quality and completeness of the data you provide it.
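To make that concrete, here’s a minimal sketch of retrieval-augmented generation (RAG), one common way to integrate a knowledge base with an LLM. Everything in it is illustrative: the toy articles, the keyword-overlap retrieval (real systems typically use embeddings and a vector store), and the prompt format.

```python
# A minimal RAG sketch: ground the model's answer in retrieved
# knowledge-base content instead of relying on the model alone.
# All names and data here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Article:
    title: str
    body: str

# Stand-in knowledge base: in practice, your internal Q&A,
# wiki pages, or runbooks, indexed in a search/vector store.
KNOWLEDGE_BASE = [
    Article("Deploy checklist", "Run migrations before restarting the API pods."),
    Article("Auth service", "Tokens are validated by the gateway, not each service."),
]

def retrieve(query: str, k: int = 2) -> list[Article]:
    """Naive keyword-overlap retrieval; real systems use embeddings."""
    words = set(query.lower().split())
    scored = [(len(words & set(a.body.lower().split())), a) for a in KNOWLEDGE_BASE]
    return [a for score, a in sorted(scored, key=lambda s: -s[0]) if score > 0][:k]

def build_prompt(query: str) -> str:
    """Paste retrieved articles into the prompt so the model answers from them."""
    context = "\n\n".join(f"{a.title}:\n{a.body}" for a in retrieve(query))
    return (
        "Answer using ONLY the context below. If the context is "
        f"insufficient, say so.\n\nContext:\n{context}\n\nQuestion: {query}"
    )

# The grounded prompt is what gets sent to whichever LLM you use;
# printing it shows exactly what the model would see.
print(build_prompt("What do I run before restarting the API pods?"))
```

The key design choice is that the model is asked to answer from retrieved, validated content rather than from its training data alone, which is what curbs hallucinations.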
As today’s AI models balloon to terabytes upon terabytes of training data, you might assume that a high-quality dataset must also be a big one, and vice versa. But a massive dataset isn’t necessary to yield high-quality, relevant results. Microsoft researchers have found that smaller models can compete with huge foundation models on several important benchmarks. Data quality, and how that data is structured, can matter as much as the size of the dataset.
Invest in building knowledge
What does this mean for your organization? Even if you haven’t yet incorporated any AI tools into your workflows, you can lay a foundation for future projects involving GenAI by creating a framework to capture and preserve institutional knowledge. That way you’ll have a robust, well-organized knowledge base ready to train future models, regardless of their size.
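As a hypothetical illustration of what “well-organized” can mean in practice, here’s one way a single knowledge record might be structured so it serves both humans and future models. The field names are assumptions, not a prescribed schema.

```python
# A hypothetical structured knowledge record. Plain records like this
# can later be indexed for retrieval or exported as training data.
# Field names and values are illustrative, not a prescribed schema.

import json
from datetime import date

knowledge_entry = {
    "question": "How do we rotate the staging TLS certificates?",
    "answer": "Use the rotate-certs job in CI; it updates the secret store.",
    "author": "jdoe",
    "validated_by": ["asmith"],       # human review keeps quality high
    "tags": ["tls", "staging", "ops"],
    "last_reviewed": date.today().isoformat(),  # so stale answers get re-checked
    "source": "https://internal.example.com/runbooks/tls",
}

print(json.dumps(knowledge_entry, indent=2))
```

The specifics matter less than the habit: capturing who answered, who validated, and when, so both your people and your models can tell fresh, trusted knowledge from stale guesses.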
Stack Overflow for Teams is built to help create a knowledge-sharing culture at your organization, capturing and preserving institutional knowledge for employees (and AI assistants!) of the future. Our platform now includes AI features like an Auto-Answer app for Slack and Microsoft Teams, whose responses are grounded in reliable, community-validated knowledge your developers can trust.
Interested in learning more? Stack Overflow for Teams can help your developers get more value out of AI tools.