Suppose we have an application used by internal users, and we want to automate some parts of it with LLMs. The following questions arise:

  1. Where should we keep the prompts? In the project repository or in the database?
  2. How can we version our prompts? If we keep the prompts hardcoded in the app code, can we easily switch versions?
  3. How can we receive and store feedback to evaluate how the model performs in the real world?
  4. What about A/B testing? We would like to see how newer models perform before routing all traffic through them.

These are the first questions that came to mind when I was thinking about integrating OpenAI models into the application. For the proof of concept, my team and I hardcoded the prompt.
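To make questions 1, 2, and 4 concrete, here is a minimal sketch of one possible answer: keep prompts in a database table keyed by name and version, with a traffic weight per version for A/B splits. The schema, the `summarize` prompt, and the `pick_prompt` helper are all illustrative assumptions, not a finished design.

```python
import random
import sqlite3

# Hypothetical schema: prompts live in the database rather than in the code,
# so switching versions or shifting A/B traffic is a data change, not a deploy.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE prompts (
        name TEXT,
        version INTEGER,
        template TEXT,
        traffic_weight REAL,   -- share of requests routed to this version
        PRIMARY KEY (name, version)
    )"""
)
conn.executemany(
    "INSERT INTO prompts VALUES (?, ?, ?, ?)",
    [
        ("summarize", 1, "Summarize this ticket: {text}", 0.9),
        ("summarize", 2, "Summarize this ticket in one sentence: {text}", 0.1),
    ],
)

def pick_prompt(name: str) -> tuple[int, str]:
    """Weighted random choice between versions (a simple A/B split)."""
    rows = conn.execute(
        "SELECT version, template, traffic_weight FROM prompts WHERE name = ?",
        (name,),
    ).fetchall()
    versions = [(v, t) for v, t, _ in rows]
    weights = [w for _, _, w in rows]
    return random.choices(versions, weights=weights, k=1)[0]

version, template = pick_prompt("summarize")
print(version, template.format(text="Printer is on fire"))
```

With this layout, promoting a new model or prompt is just an `UPDATE` on `traffic_weight`, and the repository no longer needs a redeploy to change a prompt.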

Some other features that would be a nice addition to the system include:

  1. Supporting prompts that are executed synchronously or asynchronously (with results delivered via webhook or queue).
  2. When logging a prompt execution in the database, storing only the prompt name, version, and input data. With the input data, we can later replay other prompt versions against the same inputs to compare their performance.
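The asynchronous path in point 1 could be sketched with a queue and a worker that delivers results through a callback (standing in for a webhook). The names and the stubbed model call are assumptions for illustration, not a real API.

```python
import queue
import threading

def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call.
    return f"[model output for: {prompt}]"

jobs: queue.Queue = queue.Queue()

def worker() -> None:
    # Pull jobs off the queue and hand the result to the caller's callback,
    # the way a webhook would deliver it out of band.
    while True:
        prompt, callback = jobs.get()
        if prompt is None:  # sentinel: stop the worker
            break
        callback(call_model(prompt))

results = []
t = threading.Thread(target=worker, daemon=True)
t.start()
jobs.put(("Summarize: disk full on build server", results.append))
jobs.put((None, None))
t.join()
print(results[0])
```

In a real system the queue would be external (e.g. a message broker) and the callback an HTTP POST to the consumer's webhook URL, but the shape of the flow is the same.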
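The log-and-replay idea in point 2 can be sketched as a single executions table. Everything here is a hypothetical shape for the idea: the templates, the stubbed model call, and the `feedback` column (which also gives question 3 above a place to store user ratings).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE prompt_runs (
        id INTEGER PRIMARY KEY,
        prompt_name TEXT,
        prompt_version INTEGER,
        input_data TEXT,       -- JSON; enough to re-render any prompt version
        output TEXT,
        feedback INTEGER       -- e.g. 1 = thumbs up, -1 = thumbs down
    )"""
)

TEMPLATES = {  # illustrative in-code registry for the sketch
    1: "Summarize: {text}",
    2: "Summarize briefly: {text}",
}

def call_model(prompt: str) -> str:
    return f"[model output for: {prompt}]"  # stub for the real API call

def run_prompt(version: int, data: dict) -> int:
    """Execute a prompt and log name, version, and raw input data."""
    output = call_model(TEMPLATES[version].format(**data))
    cur = conn.execute(
        "INSERT INTO prompt_runs (prompt_name, prompt_version, input_data, output)"
        " VALUES (?, ?, ?, ?)",
        ("summarize", version, json.dumps(data), output),
    )
    return cur.lastrowid

def record_feedback(run_id: int, score: int) -> None:
    conn.execute("UPDATE prompt_runs SET feedback = ? WHERE id = ?", (score, run_id))

def replay(run_id: int, other_version: int) -> str:
    """Re-run a logged input against a different prompt version."""
    (raw,) = conn.execute(
        "SELECT input_data FROM prompt_runs WHERE id = ?", (run_id,)
    ).fetchone()
    return call_model(TEMPLATES[other_version].format(**json.loads(raw)))

run_id = run_prompt(1, {"text": "VPN drops every hour"})
record_feedback(run_id, 1)
print(replay(run_id, 2))
```

Because only the input data is stored (not the rendered prompt text), any past request can be replayed through a newer prompt version and the outputs compared side by side.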

In the next article, I will continue with a more applied approach to the subject.