Better together: Getting the most value from AI code generation tools

There’s no arguing that we’re in the midst of an AI boom. New tools, capabilities, and skill sets are emerging, and a massive upskilling wave is underway for both individuals and organizations. Here at Stack Overflow, we’re hard at work building tools that leverage generative AI (GenAI) to make developers more productive and support their continuous learning. And less than two years after ChatGPT took the industry by storm, AI coding assistants are already changing the way code is written.

It’s easy to get swept up in the strong feelings around AI right now, both positive and negative. AI coding tools like GitHub Copilot can be huge productivity boosters for seasoned programmers who understand what they’re getting and how to evaluate it. They’re also an excellent source of support for novice programmers or those picking up a new language. But they need expert guidance to handle complex programming tasks, and they have been shown to introduce coding errors and security flaws when not managed carefully.

In this article, we’ll delve into the advantages and drawbacks of AI code generation tools, explain why a knowledge community is essential for successfully incorporating AI into your technical workflows, and unpack how AI coding tools combined with a knowledge community can unlock new levels of developer productivity.

But first, check out our latest video that explains why you need Stack Internal when using AI-powered code generation tools:

AI can help developers work faster and better by eliminating toil and freeing up headspace and calendar space for higher-order work. For brand-new developers, AI considerably lowers the barrier to entry. For more experienced devs, AI makes it easier to add new languages and skill sets to their repertoires without interrupting their flow state.

One key caveat to keep in mind: AI can generate code, but it can’t use its judgment to determine whether that code will fit the need and work as intended. AI doesn’t come out of the box understanding the historical context behind your architecture decisions or the particular requirements of your codebase. (Though it can make it easier for your employees to answer those questions.) Nor can AI understand the range of possible input parameters and select the optimal algorithm for what you need.

AI can generate a first draft. As any writer can tell you, that’s a lot better than a blank page. But a first draft is not a final draft. Humans need to assess the AI’s output, and knowledge management and sharing practices are central to making those assessments.

“I'm very bullish on very good developers augmenting with AI,” said William Falcon, an AI researcher and creator of PyTorch Lightning, a lightweight PyTorch wrapper built for AI research. “I'm not super bullish on newish developers augmenting with AI because they tend to just get lied to by the model.” As an experienced developer, Falcon said, “I know when [the code the model generates] is good or bad because I know how it’s supposed to be done. But if you’re a new developer, you’re just going to copy it.”

That’s not to say that AI-powered code gen tools aren’t for newcomers. They help demystify coding and bring new hires up to speed more quickly as they enter real-world coding situations, shortening their time-to-value as new developers. “On newer developers,” Falcon said, “I think if you can teach them to use [AI] as a way to mentor them, then [they can] get caught up in a new system faster. If you had a developer who joined [an organization] and didn’t use it versus one who did, how much quicker were they able to know the system and be productive?”

Matt Van Itallie, founder and CEO of Sema, a company that assesses code to improve outcomes for developers, companies, and users, echoes Falcon’s view. “I’m both incredibly bullish about the power of GenAI in the SDLC, but also as bullish, even more bullish, about developers’ critical role to make sure that code is right,” he says.

Among organizations that have made AI code generation available to their developers, many have seen adoption rates stall and even decline, as it becomes clear that these tools aren’t a magic solution to developer pain points.

From a business perspective, the goal of an AI code generation tool is to raise the average skill level of a development organization. That won’t happen if only developers who already perform well use the tool, generating value that may be marginal anyway. But a community-centered knowledge-sharing platform like Stack Internal can help developers make better use of the AI tools available to them, so that more high-quality code reaches production in less time.

For programming tasks that require creativity, awareness of organizational best practices, and the ability to exercise judgment shaped by experience, AI coding tools are still no substitute for human developers backed by community-vetted knowledge. That’s why we’re building tools that harness the power of GenAI to serve validated content to developers in a seamless, intuitive way that doesn’t require them to switch between tabs or systems.

A Microsoft study of more than two dozen professional software engineers found that their processes and tools were not keeping pace with “the challenges and scale involved with building AI-powered applications.” The interviews in the study showed that AI is a powerful tool in a developer’s arsenal, but also emphasized “the unpredictable nature of the models.” As one participant said, “Because these large language models are often very, very fragile in terms of responses, there’s a lot of behavior, control, and steering that you do through prompting.”

The AI’s output is nondeterministic (meaning AI can provide different outputs in response to the same input on different runs) and difficult to test. In simple terms, you don’t know where the model is getting its information or how it arrives at its conclusions. That makes it “very hard to scale,” said PyTorch Lightning creator Falcon. Developers working with AI, he said, need to learn how to work with nondeterministic systems that may require a significant investment in data science and machine learning in order to become stable, repeatable, and scalable.

“I've deployed systems before I was in AI as a software engineer, and deploying a regular web app is not terribly complicated,” Falcon explained. “You have microservices, you do horizontal scaling, you beef up instances when you need to, but in AI, it doesn’t work that way because AI has different patterns that you don’t have in regular software. Your code could work, but the model could still crash because there’s a gradient or something weird. There’s math involved, maybe your math is wrong, the data is wrong. So it’s less deterministic than software.”
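To make “nondeterministic” concrete, here is a minimal sketch of temperature-scaled sampling, one common way language models choose the next token. The vocabulary and scores below are invented for illustration and do not come from any real model; the point is only that the same input scores can yield different tokens on different runs unless decoding is greedy (temperature at or near zero) or the random seed is fixed.

```python
import math
import random

def softmax(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature, rng):
    """Sample one token; with temperature <= 0, fall back to greedy (argmax) decoding."""
    if temperature <= 0:
        return vocab[max(range(len(logits)), key=lambda i: logits[i])]
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to candidate next tokens for one fixed prompt.
vocab = ["sorted", "sort", "heapq", "bisect"]
logits = [2.0, 1.5, 0.5, 0.1]

rng = random.Random()  # unseeded: the sampled tokens vary from run to run
print([sample_next_token(vocab, logits, 1.0, rng) for _ in range(5)])
print([sample_next_token(vocab, logits, 0, rng) for _ in range(5)])  # greedy: always "sorted"
```

Fixing the seed (for example, `random.Random(42)`) makes a run repeatable, but it does not make the model’s choices any more correct, which is why human review still matters.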

AI can’t exercise judgment, but a community of humans can. Building on the foundation of your community's knowledge and the organizational context you've captured for coding, you can build an AI-supported workplace.

Imagine that you could bring contextual knowledge from your community into your IDE to illuminate what you’re doing and help you make critical decisions about your code. You could identify and vet sources, lift the lid on the thought process, and make sure you’re acting on the most up-to-date information. Stack Internal provides institutional context to developers, and the upcoming Stack Overflow IDE extension will put that knowledge directly into your IDE.

As we’ve written, AI models trained on an up-to-date and well-organized knowledge base tend to deliver more accurate, complete, and relevant answers. Research out of the MIT Media Lab has found that integrating a knowledge base into an LLM improves output and reduces hallucinations: incorrect results, from simple inaccuracies to wholly invented answers. Garbage in, garbage out.
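One common way to connect a knowledge base to an LLM is retrieval augmentation: before calling the model, look up the most relevant vetted entries and place them in the prompt so the model answers from them. The sketch below is a deliberately tiny, hypothetical version that scores entries by word overlap (real systems typically use embedding search); the entries, function names, and prompt wording are illustrative assumptions, not any product’s API.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real retriever's text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, knowledge_base, top_k=1):
    """Return the top_k entries whose text shares the most words with the question."""
    q = tokenize(question)
    scored = sorted(knowledge_base, key=lambda e: len(q & tokenize(e["text"])), reverse=True)
    return scored[:top_k]

def build_prompt(question, knowledge_base):
    """Prepend retrieved, vetted context so the model is asked to answer from it."""
    context = "\n".join(f"- {e['text']}" for e in retrieve(question, knowledge_base))
    return (
        "Answer using only the internal context below. "
        "If the context does not cover the question, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical internal knowledge base entries.
kb = [
    {"id": 1, "text": "Our standard Dockerfile uses a slim Python base image."},
    {"id": 2, "text": "We emit traces, metrics, and events, not just logs."},
]
print(build_prompt("Which base image does our Dockerfile use?", kb))
```

The quality of the answer can be no better than the entries retrieved, which is the “garbage in, garbage out” point in practice.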

For your organization to get the most value from your AI coding tools, your knowledge base must be:

Accurate and trustworthy, with information verified by expert users.
Capable of capturing the context in which questions are asked and answered.
Easy to refresh as new data and use cases emerge.
Continuously improving and self-sustaining.
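Several items on this checklist can be turned into simple, automatable checks. This sketch assumes a hypothetical entry record with an expert-verification flag, a context field, and a last-reviewed date, and flags entries that fail any check; the 180-day review window is an arbitrary example, and this is not how any particular platform stores content.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class KnowledgeEntry:
    """Hypothetical record for one piece of internal knowledge."""
    title: str
    verified_by_expert: bool  # "accurate and trustworthy"
    context: str              # why and where the question came up
    last_reviewed: date       # supports "easy to refresh"

def health_issues(entry, today, max_age_days=180):
    """Return the reasons this entry may need attention (an empty list means healthy)."""
    issues = []
    if not entry.verified_by_expert:
        issues.append("not verified by an expert")
    if not entry.context.strip():
        issues.append("missing context")
    if today - entry.last_reviewed > timedelta(days=max_age_days):
        issues.append("stale: review overdue")
    return issues

entry = KnowledgeEntry("Standard Dockerfile", True, "", date(2023, 1, 10))
print(health_issues(entry, today=date(2024, 6, 1)))
# → ['missing context', 'stale: review overdue']
```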

Working with colleagues to vet the AI’s responses, provide feedback, and refine prompt structure is an essential part of incorporating AI coding tools into your workflow, which is why a culture and practice of knowledge-sharing is so important. Reinforcement learning from human feedback (RLHF), in which people apply their judgment to the AI’s output to coax it toward better results, mirrors the trial-and-error process by which humans learn. With a knowledge-sharing community built around a robust, well-structured knowledge base, your developers will be able to assess the AI’s responses and coach it to improve.
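For readers curious about the mechanics of reinforcement learning from human feedback: in typical RLHF pipelines, human judgments are collected as comparisons (“answer A is better than answer B”), and a reward model is trained so the preferred answer scores higher, commonly with a pairwise logistic loss. The minimal sketch below computes that loss for made-up reward scores; it illustrates the idea only and is not how any specific coding assistant is trained.

```python
import math

def pairwise_preference_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)): small when the preferred answer scores higher."""
    margin = reward_preferred - reward_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical reward-model scores for two AI answers that a reviewer compared.
print(pairwise_preference_loss(2.0, 0.5))  # model agrees with the human: small loss
print(pairwise_preference_loss(0.5, 2.0))  # model disagrees with the human: larger loss
```

Minimizing this loss over many human comparisons nudges the reward model toward human judgment, which is then used to steer the assistant itself.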

Once you’ve built your knowledge-sharing community, you can start coding with organizational context. AI code generation tools essentially function as a pair programmer who can suggest a developer’s next line of code before the developer has even thought about it. This lets your developers spend less time writing repetitive boilerplate. Richer organizational context leads to better output from the AI, helps developers validate that output, and helps them figure out where to go next.

In an ideal world, AI-powered coding tools would help developers write more secure code. In reality, though, relying on AI assistants tends to produce buggier, less secure code. A recent Stanford study found that programmers with access to an AI assistant wrote “significantly less secure code” than programmers who didn’t use AI, but that programmers with access to an AI assistant were more likely to believe they were writing secure code—a revealing blind spot.

The study further found that “participants who trusted the AI less and engaged more with the language and format of their prompts (e.g. re-phrasing, adjusting temperature) provided code with fewer security vulnerabilities.” In other words, developers with the ability to critically evaluate the quality of the AI’s output and shift their prompts in response produced more secure code.
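As a hypothetical illustration of the kind of flaw that critical review catches (not code from the study), compare two versions of a database lookup. Building SQL by string formatting, a pattern that can appear in generated code, lets crafted input rewrite the query; a parameterized query, which Python’s built-in `sqlite3` module supports with `?` placeholders, keeps user input as data.

```python
import sqlite3

def setup():
    """Create an in-memory table with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "dev")])
    return conn

def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted into the SQL text itself.
    return conn.execute(f"SELECT name, role FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # Parameterized: sqlite3 binds `name` as a value, never as SQL.
    return conn.execute("SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()

conn = setup()
payload = "nobody' OR '1'='1"
print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
print(find_user_safe(conn, payload))    # no user has that literal name: []
```

Both functions look plausible at a glance; telling them apart is exactly the kind of judgment the study associates with more secure results.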

Combining AI code generation tools with a knowledge community not only accelerates the code-writing process but also helps developers get to higher-quality solutions faster. A robust knowledge community lowers barriers to innovation by dismantling silos and encouraging the adoption of standardized policies, best practices, and new technologies.

Stack Internal fosters internal, technically focused knowledge-sharing communities that connect employees with the answers they need. The platform makes institutional knowledge easily available, searchable, and reusable. It captures code-adjacent metadata like policies, procedures, and best practices to help developers answer questions like:

Why did we make this architectural decision across our products?
What technologies do we use to power our website?
Do we have a standard Dockerfile we use to create containers?
Do we have full observability on our site (traces, metrics, events) or are we just using logs?

Because Stack Internal captures institutional knowledge along with your organization’s preferred way of working, individual developers can get expert insight and help work through problems in a way that’s reusable for other employees throughout the organization. Subject matter experts (SMEs) can document their expertise for future users, building a repository of trustworthy, community-vetted knowledge. The platform also supports voting and reputation, so users can rank the most helpful, relevant, and up-to-date answers, making them more easily discoverable for everyone.
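Ranking by community votes can be as simple as sorting on score, with freshness as a tie-breaker so that up-to-date answers surface first. The sketch below is a hypothetical illustration of that idea, not Stack Internal’s actual ranking logic; the fields and ordering rule are assumptions.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Answer:
    """Hypothetical answer record: net votes plus the date it was last edited."""
    author: str
    score: int
    last_edited: date

def rank_answers(answers):
    """Highest score first; among equal scores, the more recently edited answer wins."""
    return sorted(answers, key=lambda a: (a.score, a.last_edited), reverse=True)

answers = [
    Answer("sme_1", 12, date(2022, 3, 1)),
    Answer("sme_2", 12, date(2024, 2, 15)),
    Answer("newcomer", 3, date(2024, 5, 20)),
]
print([a.author for a in rank_answers(answers)])
# → ['sme_2', 'sme_1', 'newcomer']
```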

Using Stack Internal alongside your AI code generation tool allows your developers to answer the “what,” “how,” and “why” questions they encounter as they work. When developers don’t have to pause their work and switch platforms to find answers to their questions, they’re more productive in their coding environments. As a result, your code makes it to production faster, in compliance with company standards and practices.

For visual folks, here’s a video to illustrate how Stack Internal can help your developers get the most out of AI code gen tools:

An informed community with ready access to knowledge is crucial for quality engagement with any AI technology you bring into your organization. Building on your foundation of community knowledge and the organizational context you’ve captured helps you get the most value from your AI tools while helping to minimize the inherent risks in adopting a new technology. Together, AI code generation tools and Stack Internal can help you improve code quality, security, and developer productivity at a holistic level.
