AI Technical Design Decisions: Are You Summarizing or Question Answering?

We look at the difference between summarizing and question answering when using LLMs.

Last updated July 27, 2023

Co-authors: Philippe Beaudette, VP, Community, and Michael Foree, Director, Data Science and Data Platform.

We know we are using terms to describe our projects that may not be 100% technically accurate - we’re leaning more towards simplicity in our explanations at this moment and expect our descriptions to evolve as we, the industry and users, learn more.
We are sharing an example of how different technology designs can change the output of AI - in this example, either Summarization or Question Answering.
As we research and build our AI solutions, we’ll solicit input from the community and users to understand how we might better explain the methods used under the hood.

Stack Overflow, like the rest of the world, has been spending a lot of time learning about AI and GenAI over the past few months as we build the solutions we announced today at WeAreDevelopers.

In our announcements, we speak about people getting summarized outputs from AI. In making announcements like these, our Marketing colleagues have to find a balance between explaining our technical solutions simply and explaining them precisely. At this moment, we are leaning more towards simplicity in our explanations and expect our descriptions to evolve as we, the industry and users, learn more.

With AI, the behind-the-scenes technical design decisions - e.g., how we decide to send information to an LLM - can produce a very different user experience and output.

Let’s look at the difference between summarizing and question answering.

Summarization is the task of creating a shorter version of one or more documents that captures all the important information.
Question answering (more specifically, generative question answering) generates a response that correctly answers the user's question.

Both may sound similar but produce very different user experiences, simply because summarization is unaware of the user's query while question answering is.

Below are examples highlighting the differences between the two tasks, using Stack Overflow posts or an Article.

Note: All examples use a technique known as Retrieval Augmented Generation (RAG), which allows us to leverage our data instead of relying on the knowledge the LLM has of the world.
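To make the RAG step concrete, here is a minimal sketch of the flow: retrieve our own data first, then place it in front of the instruction so the LLM works from that data rather than its world knowledge. The `search` function is a hypothetical placeholder (not a real Stack Overflow API); only the prompt-assembly logic is the point.

```python
def search(query: str) -> dict:
    # Hypothetical retrieval step: in a real system this would query a
    # search index and return the best-matching post for the query.
    return {
        "title": "How do I get my program to sleep for 50 milliseconds?",
        "accepted_answer": "Use time.sleep()\n\nfrom time import sleep\nsleep(0.05)",
    }


def build_rag_prompt(query: str) -> str:
    post = search(query)  # retrieval: fetch our own data
    # augmentation: the retrieved answer, not the LLM's built-in knowledge,
    # becomes the information the instruction refers to
    return (
        f"{post['accepted_answer']}\n"
        "----------------\n"
        "Using only the above information, answer the following question.\n"
        "Include a concise explanation and code snippets if necessary.\n"
        "----------------\n"
        f"Question: {query}\n"
    )


prompt = build_rag_prompt("How to wait for a minute in Python?")
print(prompt)
```

The assembled prompt would then be sent to the LLM; the generation step itself is omitted here.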

Differences when using Stack Overflow posts

The following example uses a public Stack Overflow post, but the same logic applies to Stack Overflow for Teams posts as they both are in a Q&A structure.

Let's say a user has a query asking, "How to wait for a minute in Python?". Our search determines that "How do I get my program to sleep for 50 milliseconds?" is the correct and highest-quality post related to the user's query.

Question answering

For question answering, we provide the LLM with two things: the information, which in our case is the accepted answer for the Stack Overflow post, and the question, which is the user's search query.

Prompt

Use time.sleep()

from time import sleep
sleep(0.05)
----------------
Using only the above information, answer the following question.
Include a concise explanation and code snippets if necessary. 
----------------
Question: How to wait for a minute in Python?

Generation

Answer: You can use the time.sleep() function to wait for a minute in Python.
This function takes a single argument: the number of seconds to wait. 
To wait for a minute, you would use the following code:

from time import sleep
sleep(60)

Notice how we have used the existing Stack Overflow answer about sleeping for 50 milliseconds as a reference, but the LLM knows the user is asking about a minute instead, so it adjusts the answer to match the original search query.

Summarization

For summarization, we still provide the LLM with two things. The question is now the title of the Stack Overflow post returned by search, not the user's search query. The answer is still the accepted answer for the Stack Overflow post.

Prompt

Summarize the following question and answer. Be concise and use code snippets if necessary.
----------------
Question: How do I get my program to sleep for 50 milliseconds?

Answer: Use time.sleep()
from time import sleep
sleep(0.05)

Generation

To make a program sleep for 50 milliseconds, use the time.sleep() function from the time module and pass in 0.05 as an argument.

Notice how we are just rewriting the existing answer. The LLM is only aware of the original poster's question about "50 milliseconds" and has no knowledge of the search query, which asks about a minute.
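The two prompt shapes above can be sketched side by side. The function names are illustrative, not our production code; the point is the one structural difference: the question-answering prompt carries the user's search query, while the summarization prompt only ever sees the post itself.

```python
def qa_prompt(accepted_answer: str, search_query: str) -> str:
    # The LLM sees the user's query, so it can adapt the answer to it.
    return (
        f"{accepted_answer}\n"
        "----------------\n"
        "Using only the above information, answer the following question.\n"
        "Include a concise explanation and code snippets if necessary.\n"
        "----------------\n"
        f"Question: {search_query}\n"
    )


def summarization_prompt(post_title: str, accepted_answer: str) -> str:
    # The LLM sees only the post; the user's search query never appears.
    return (
        "Summarize the following question and answer. "
        "Be concise and use code snippets if necessary.\n"
        "----------------\n"
        f"Question: {post_title}\n\n"
        f"Answer: {accepted_answer}\n"
    )


answer = "Use time.sleep()\nfrom time import sleep\nsleep(0.05)"
qa = qa_prompt(answer, "How to wait for a minute in Python?")
summ = summarization_prompt(
    "How do I get my program to sleep for 50 milliseconds?", answer
)
# The word "minute" appears only in the QA prompt: the summarization
# prompt never sees the user's intent.
```

This is why the two designs diverge: the same retrieved post produces an adapted answer in one case and a restatement in the other.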

Differences when using Articles or long-form documents

The next example shows how the experience is even more different when dealing with an Article or any other long-form document.

Let's say we have a query asking, "How much is github copilot for students?" and an article chunk was returned by search.

Question answering

Similar to above, we just pass some information, which is the chunk of text from the article, and the question, which is still the search query.

Prompt

{CHUNK}

-----
Using only the above information, answer the following question.
Include a concise explanation and code snippets if necessary.
----

Question: How much is github copilot for students?

Generation

GitHub Copilot is free for students and maintainers of popular open-source repositories.
There is no cost associated with using GitHub Copilot for these users.

Summarization

But for summarization, we only pass the information from the article. There is no question for the LLM to answer because the article is a long-form document, not structured as Q&A.

Prompt

Summarize the following information. Be concise and use code snippets if necessary.

----

{CHUNK}

Generation

GitHub Copilot is a paid feature for individual developers, priced at $10/developer/month or $100/year.
It is free for students and maintainers of popular open-source repositories.
For GitHub Enterprise Cloud customers who are not paying via credit card or PayPal, GitHub Copilot for Business is priced at $19/user/month.
Companies who are on GitHub Free, GitHub Team, or pay for GitHub Enterprise via credit card or PayPal can join a waitlist for GitHub Copilot for Business.

With summarization, the LLM does not know we care only about "students," so it generates much more text. We can still see that "It is free for students and maintainers," but it is on the user to find this information in the generation, and there is no guarantee it would be retained at all.
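The {CHUNK} placeholder above implies a splitting step: long-form documents must be broken into retrievable pieces before any of this works. Here is a simplified, word-based splitter with overlap as an illustration; real systems typically split on tokens and tune the sizes, so treat the numbers and approach as assumptions.

```python
def chunk_document(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most max_words words.

    Overlap keeps sentences that straddle a boundary retrievable from
    either side.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the final chunk already covers the rest of the text
    return chunks
```

Search then indexes these chunks, and the best-matching one becomes the {CHUNK} passed to the prompt.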

Conclusion

While both summarization and question-answering can be extremely helpful, it's important to acknowledge their differences. Summarization has no knowledge of the search query, while question-answering does. Neither is better than the other, just different implementations and experiences.

Summarization is less likely to hallucinate or produce inappropriate content because the user does not influence the generation. Still, there is a chance the user won't get the answer they are looking for.

Question answering is more likely to answer the user's query and get them unstuck. It is more likely, however, to hallucinate and produce inappropriate content because the user has some influence on the generation. This risk can always be reduced but is still present.

As we research and build our AI solutions, we’ll solicit input from the community and users to understand how we might better explain the methods used under the hood.

About Labs

Since 2008 Stack Overflow has pioneered open source conversations in the technology community, helping us become the most visited, most trusted destination for developers in the world. In 2017, we unleashed the same productivity gains inside companies with Stack Overflow for Teams.

Knowledge sharing between peers and experts is fundamental to software development ‒ you can see it happening in Slack, in meetings, or quick hangs. Advances in technology, like GenAI, put everyone in learning mode, and knowledge sharing is at the core of that experience.

Stack Overflow for Teams sits at the very intersection of curiosity and innovation, a place to ask & answer your peers’ questions, learn from other experts within the company, and keep up with ‒ or be the driver of ‒ all new developments.

Our guiding principles

Find new ways to give technologists more time to create amazing things.
Accuracy is fundamental. That comes from attributed, peer-reviewed sources that provide transparency.
The coding field should be accessible to all, from beginners to advanced users.
Humans should always be included in the application of any new technology.

With these in mind, starting over the next few months, we will be sharing our ideas, opinions, designs, research and product ideas which combine emerging technologies with our platforms and services.