When GenAI-based chatbots and demos first appeared, they convinced many people that large language models could solve real-world challenges out of the box. However, despite generative AI’s potential, early attempts to put these models into production quickly showed that this was not the case: the models required further development to meet the quality, cost, and latency requirements of real-world use cases.

Gartner reflected this in its 2023 Hype Cycle report for AI, where it placed GenAI at the top of the 'Peak of Inflated Expectations' (which immediately precedes the 'Trough of Disillusionment'). Gartner’s 2024 AI Primer stated, “While GenAI models have made unstructured data analysis easier, it still requires effort to choose and train models, feed them with the right enterprise data and create a feedback loop from customer and employee user interfaces for ongoing learning.”

We therefore call these large, pre-trained generative AI models foundation models (FMs). They serve as a starting point, a foundation to build on. To reach enterprise production quality, data science teams need to customize the LLMs and their surrounding components for the company’s bespoke objectives. This process rests almost entirely on identifying, isolating, cultivating, and preparing the right data to feed these models.

Data makes the difference 
The main lever for customizing and building on these foundation models is the data that is unique to each task and organization. This process of data development involves several steps, such as labeling and curating the type of data that will lead to optimal model performance. Enterprise data scientists can also fine-tune an LLM by supplying prompts relevant to the organization’s needs, paired with responses that are correct for those prompts. And they can target the components surrounding the LLM, such as the embedding model behind the application’s vector database or supplementary models that check the final output for accuracy, toxicity, or other important attributes.
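As one concrete illustration, supervised fine-tuning data is often assembled as prompt/response pairs in a JSONL file. This is a minimal sketch with hypothetical examples; the exact schema depends on the model provider and training framework:

```python
import json

# Hypothetical expert-reviewed examples; in practice these come from
# subject-matter experts and curated internal data.
examples = [
    {
        "prompt": "Summarize the customer's complaint in one sentence.",
        "response": "The customer reports being double-billed in March.",
    },
    {
        "prompt": "Classify this ticket: 'My card was declined at checkout.'",
        "response": "Category: payments",
    },
]

# Write one JSON object per line -- a common format for fine-tuning datasets.
with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```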

Some of the largest developers of large language models employ hundreds of people just to create and label data. Most enterprises don’t have that luxury. To optimize their applications for their needs, enterprises must find ways to amplify expert guidance and develop their data at scale.

GenAI in enterprises
As we work with our enterprise customers to develop production-ready AI applications, here are some best practices we’ve observed and shared with data leaders: 

  1. Understand the downstream business use case you’re trying to solve. It’s tempting to treat GenAI as a hammer and every use case as a nail. Better to ask yourself: What are we optimizing for here? Where in the overall ML system would generative AI have the most impact? How will you evaluate and measure how accurate or effective generative AI is for your downstream task?
     
  2. Examine the impact of using models not trained on organization-specific data. Is it acceptable that the data used to train these large language models may affect your downstream use case? Is there domain- and organization-specific knowledge that your model must be aware of to be performant? We’ve seen examples of bias and flat-out incorrect information from LLMs used for domain-specific tasks across various verticals, so it’s important to be cognizant of the associated risks.
     
  3. Recognize that LLMs are typically just one part of an application. Some generative AI applications require data retrieval to ensure that the response aligns with business objectives and values. One project we worked on achieved a 54% gain in accuracy by optimizing data retrieval and prompt templates before fine-tuning the LLM at all (see the sketch below).
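To make point 3 concrete, here is a toy sketch of the retrieve-then-prompt pattern. Everything in it is illustrative: `embed` is a crude stand-in for a real embedding model, and the in-memory list stands in for a production vector database.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a normalized character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so a dot product gives cosine similarity.
    return sum(x * y for x, y in zip(a, b))

documents = [
    "Refunds are processed within 5 business days.",
    "Loan applications require two forms of ID.",
]
index = [(doc, embed(doc)) for doc in documents]

def build_prompt(question: str) -> str:
    # Retrieve the most relevant document, then ground the prompt template in it.
    q = embed(question)
    context, _ = max(index, key=lambda item: cosine(q, item[1]))
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
    )

# In a real system, this grounded prompt is what gets sent to the LLM.
print(build_prompt("How long do refunds take?"))
```

Tuning what gets retrieved and how the template frames it is often the cheapest first lever, before any fine-tuning.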


The future of generative AI is foundation + specialization 
Even though today’s GenAI models don’t solve every enterprise task with one click, foundation models open up many new possibilities. They make possible, in a fraction of the time, things that simply weren’t feasible before. For improved performance, these models need additional nuanced, domain-specific, organization-specific, and control-specific knowledge, but use cases can now be built out in a matter of days rather than years.

As companies demand more value from GenAI, we predict that data scientists will build smaller, specialized models on top of FMs. For example, a bank may have one model for Know Your Customer, one for loan underwriting, one for fraud detection, and so on. If the full-sized FM achieves production-grade accuracy on these tasks, data scientists can use distillation methods to transfer that predictive power into a smaller footprint. This allows a company to retain the knowledge of the large, organizational model while also realizing the faster inference, customization, and specialization possible for a specific use case.
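Distillation here refers to the standard teacher-student recipe: the small model is trained to match the full-sized model’s softened output distribution. A minimal sketch of the classic distillation loss (Hinton et al., 2015), assuming PyTorch and logits from both models; production pipelines add task labels, data selection, and evaluation on top:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both output distributions with a temperature, then push the
    # student's distribution toward the teacher's via KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2
```

In practice this term is usually combined with an ordinary cross-entropy loss on ground-truth labels, and the temperature is tuned empirically.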

Building a performant, organization-specific foundation model, as well as smaller, specialized models, hinges on an enterprise’s unique data and domain expertise. That’s why you want your foundation model to embody all the knowledge present in that data, so that you can take advantage of it across your downstream tasks.

Data development is key
Domain expertise is ultimately what delivers value in an information economy, and GenAI is no exception. In the case of specialized GenAI models, strong data leadership will be the key for organizations to take advantage of FMs while customizing them for their objectives and data. When you place your organization’s subject-matter expertise at the heart of your GenAI model, you can put your data to work in a new, vastly more efficient way. This approach gives you the best chance to make good on the vast potential of generative AI. The good news, despite a few gloomy forecasts, is that the tools, the data expertise, and of course the data itself are only getting better every day.