3. Building your GenAI tech stack

Considerations: Cost, scale, security and more

Weighing the options available to your GenAI program can be pretty daunting, so let’s talk through the major factors you’ll need to consider when making your decisions. For a technology that only really broke through into mainstream consciousness in November 2022, the ecosystem has grown surprisingly rich and accessible. You’ll find that there are plenty of good options available, both open-source and proprietary, locally installable and SaaS.

Model size

You may have seen large language models described in terms of number of parameters—that’s the size. The largest models have up to trillions of parameters, and as they grow, they have proven to improve their capabilities and gain emergent abilities. Massive models can tackle a wide range of tasks very well.

But bigger isn’t always better. As Microsoft has shown with its series of Phi models, a smaller model trained on precise data can often perform on par with much larger models on some tasks, sometimes even besting them. Rather than train on a huge corpus of code and text from the internet, these models were trained on a hand-picked subset.

The tradeoff you should consider is between a widely capable but expensive model and a targeted one that fits your budget but may not return great results for every task.

There’s another aspect of size to consider: the size of each parameter. More accurate numbers take up more memory. Models may be trained with 32-bit floating point parameter values, but for storage-conscious applications, you can quantize them down to 8-bit integers. Think of it like reducing the amount of available colors in an image—your beautiful 64-bit PNG may be what you print for posters, but the 8-bit version works fine in a thumbnail image. You may get less quality from your results, but the size reduction may be what you need for mobile or IoT applications.

Cost

For companies with a lot of resources, there may be good reasons to train your own foundation model or build your own tools. But for most organizations, this requires talent, time, and money that is better invested elsewhere. Training a new model can be as much as 1000x as expensive as fine-tuning an existing one.

Plenty of organizations have released open-source models under Apache or MIT licenses. You can clone these repos and start using and modifying them to your needs. That’s the cheapest option, but it may require more work on your end. Most cloud providers have LLM products that you can easily add to your accounts. For easy access with a cloud subscription, some companies allow API access to LLMs.

You’ll need a significant tech stack investment for GenAI, as there’s a whole data platform and infrastructure component. If you’ve already got a data team and they’ve created a pipeline, you can build on that foundation. From there, see how valuable the results are, and consider scaling or investing more depending on how much farther you want to go.

Scalability

As your application grows, so does your GenAI usage. That means you’ll need to add capacity to your infrastructure, and you’ll spend more on GenAI services. Some GenAI pricing models shift as your usage grows, so this may be something you’ll need to consider up front before committing to an AI vendor.

The flip side of adding capacity to handle scaling is limiting user requests. While this isn’t always the most user-friendly approach, it may be your best option, especially if you become a victim of sudden success. You can build on a variety of LLMs as a hedge, and either shunt excessive traffic to cheaper options or offer a tiered pricing model.

Security

Like every software application, GenAI has security concerns. But if you’ve seen exploits where users convince an AI to sell them a car for one dollar, you know that these security concerns are different from traditional cybersecurity risks, as they can be unpredictable and baked into the LLM’s usage.

From user @ChrisJBakke on X (formerly Twitter), claiming he “just bought a 2024 Chevy Tahoe for $1.”

The security research organization OWASP has created a list of the top ten security issues for LLMs, which includes things like the prompt injection above and not validating outputs. Other security issues can target the training data, supply chain, and uptime of the LLM. Understand these unique issues and ensure that you have controls in place (or use a provider that does that for you).

Updates

Once a model has been trained, its knowledge and understanding of language is fixed in time. You may have seen some folks mine comedy out of asking an AI about current events outside of its training data. To avoid having your LLM grow stale over time, you’ll need to augment its training data by fine-tuning the parameters on new data, using retrieval-augmented generation on a knowledge base, or both. The precise techniques to use will depend on your use case and resources.

Also in play is a concept called model drift. Over time, an LLM can become less accurate, whether because the training data no longer accurately represents the concepts used in practice or the current dataset has changed. You can try fine-tuning a model continuously, but some recommend starting over and retraining your model on the newest data. For open-source models, you’ll have to do this yourself, while managed models may do this for you (for a fee, most likely).

The GenAI landscape is vast and complicated with many different options and risks. You’ll need to think through your use cases and decide which qualities of your software are most important to your customers.