When embarking on the journey of creating a chat solution powered by generative AI, efficiency should be our guiding star. From leveraging SaaS solutions to constructing a model from the ground up, the options are diverse. In this article, I walk through the fine-tuning of a model based on the Azure OpenAI service, focusing on balancing customization with efficiency.
Note: The input data shown in this article has general medical-related questions and answers with the intent of developing a model incorporating some information from the medical field. As I am not a medical professional, I'm using this data only in the service of my data science work that is the subject of this article, and not to suggest or indicate the definition, prevention, diagnosis, or treatment of any medical condition or disease.
Prioritizing efficiency means it's not always necessary to opt for the largest, most powerful model available. Customization often allows us to achieve superior performance with a lighter version of a model. For this reason, I've chosen GPT-4o mini, the smaller sibling of the renowned GPT-4o model. (In the diagram below, we can use it at two stages: prompt engineering and AOAI Model FT.)

Why is fine-tuning a smart choice?
You may already be familiar with the concept of few-shot learning, in which machine learning models are adapted using a very small number of labeled examples, increasing efficiency. Fine-tuning is another, separate approach that prioritizes efficiency, and it offers several advantages over few-shot learning:
- Quality results: Fine-tuning enables training on a larger dataset than can fit in a single prompt, leading to higher quality outputs.
- Token efficiency: Once deployed, a fine-tuned model reduces token consumption because not all data needs to be passed as input context.
- Lower latency: Fine-tuned models can handle requests faster, providing a more responsive user experience.
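The token-efficiency point can be made concrete with a small illustrative sketch (the questions and abbreviated answers below are invented for illustration): a few-shot request must carry its examples in every call, while a request to a fine-tuned model carries only the question.

```python
import json

system = "You are an AI assistant expert in medicine that answers medical questions."
examples = [
    {"role": "user", "content": "What causes high blood pressure?"},
    {"role": "assistant", "content": "Changes in kidney fluid and salt balances, ..."},
    {"role": "user", "content": "How to prevent diabetes?"},
    {"role": "assistant", "content": "Maintain a reasonable body weight, ..."},
]
question = {"role": "user", "content": "Who is at risk for osteoporosis?"}

# Few-shot: the examples ride along with every single request.
few_shot_messages = [{"role": "system", "content": system}] + examples + [question]

# Fine-tuned: the examples were baked in at training time, so only the
# system message and the question are sent.
fine_tuned_messages = [{"role": "system", "content": system}, question]

# Rough size comparison (characters as a crude proxy for tokens).
print(len(json.dumps(few_shot_messages)), "vs", len(json.dumps(fine_tuned_messages)))
```

With realistic answer lengths the gap grows with every example added to the prompt, which is exactly the consumption a fine-tuned model avoids on each request.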
Even though the focus of this article is on a fine-tuning approach, it should not be the default or first approach to solving these scenarios. Consider prompt engineering and Retrieval-Augmented Generation (RAG) solutions first, as the training phase for fine-tuning can be computationally intensive and costly.
Model fine-tuning
While the fine-tuning process may seem challenging at first, breaking it down into manageable steps can simplify the journey. Here's a streamlined procedure to guide you through:
- Deploy base model
- Prepare input data
- Upload files
- Fine-tune model
- Deploy fine-tuned model
- Test model
I walk through each of these steps in turn in the rest of this article.
1. Deploy base model
We start by deploying the GPT-4o mini model from the Azure OpenAI service: within Azure OpenAI Studio, go to Deployments > Deploy model > Deploy base model.
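If you prefer scripting over the Studio UI, the same base-model deployment can be created with the Azure CLI. This is a sketch only: the resource, resource group, and deployment names are placeholders, and the model version available in your region may differ.

```shell
az cognitiveservices account deployment create \
  --name <your-aoai-resource> \
  --resource-group <your-resource-group> \
  --deployment-name gpt-4o-mini \
  --model-name gpt-4o-mini \
  --model-version "2024-07-18" \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 1
```

The command requires an authenticated `az` session with access to the Azure OpenAI resource, so it is shown here as a configuration fragment rather than something to run as-is.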



2. Prepare input data
The second step is to prepare the data that is going to be used for the training. It is important to provide at least a few thousand examples to get higher quality results. The uploaded files must adhere to a specific format, namely JSON Lines (.jsonl), encoded as UTF-8 with a byte-order mark (BOM). Given that most training data sets aren't readily available in this format, I'll include an extra step to help you convert your data from a .csv file.
As I mentioned earlier, my input data has general medical-related questions and answers with the intent of developing a model incorporating some information from the medical field. Here's a sample of the input data:

The training data must be in the form of question/answer pairs, since the model has to learn how to answer.
Start by executing the following Python code locally to convert from .csv format to .jsonl:

import json
import pandas

file_name = "qna-medicaldata"

# Read the CSV (the first row is a header, so skip it); with header=None the
# two columns are numbered 0 (question) and 1 (answer).
df = pandas.read_csv(file_name + ".csv", header=None, skiprows=1)

system_message = "You are an AI assistant expert in medicine that answers medical questions."

# Build one JSON object per row and write it as a single line; json.dumps
# handles the escaping of quotes and special characters inside the text.
with open(file_name + ".jsonl", "w", encoding="utf-8-sig") as file_handle:
    for index in df.index:
        question = df.loc[index, 0]
        answer = df.loc[index, 1]
        record = {
            "messages": [
                {"role": "system", "content": system_message},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        file_handle.write(json.dumps(record) + "\n")

Every row must have the structure: {"messages": [{"role": "system", "content": <system_message>}, {"role": "user", "content": <question>}, {"role": "assistant", "content": <answer>}]}
It is also important to write the final file with the utf-8-sig encoding so that it meets the requirement of UTF-8 with a byte-order mark (BOM).
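To double-check that the conversion produced the expected encoding, you can inspect the first bytes of the file for the UTF-8 BOM. This is just a quick local sanity check, not part of the official procedure:

```python
# The UTF-8 byte-order mark is the three-byte sequence EF BB BF.
BOM = b"\xef\xbb\xbf"

def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with a UTF-8 byte-order mark."""
    with open(path, "rb") as f:
        return f.read(3) == BOM

# Example: write a tiny file with the utf-8-sig codec and verify it.
with open("bom-check.jsonl", "w", encoding="utf-8-sig") as f:
    f.write('{"messages": []}\n')
print(has_utf8_bom("bom-check.jsonl"))  # True
```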
After the preprocessing, the input data is ready to be uploaded in the required format, and looks like this:
{"messages": [{"role": "system", "content": "You are an AI assistant expert in medicine that answers medical questions." }, {"role": "user", "content": "What causes High Blood Pressure ?" }, {"role": "assistant", "content": "Changes in Body Functions Researchers continue to study how various changes in normal body functions cause high blood pressure. The key functions affected in high blood pressure include — kidney fluid and salt balances — the renin-angiotensin-aldosterone system — the sympathetic nervous system activity — blood vessel structure and function. (…)" }]}
{"messages": [{"role": "system", "content": "You are an AI assistant expert in medicine that answers medical questions." }, {"role": "user", "content": "How to prevent Diabetes ?" }, {"role": "assistant", "content": "Your weight affects your health in many ways. Being overweight can keep your body from making and using insulin properly. It can also cause high blood pressure. If you are overweight or obese, choose sensible ways to reach and maintain a reasonable body weight. — Make healthy food choices. What you eat has a big impact on your weight and overall health. By developing healthy eating habits, you can help control your body weight, blood pressure, and cholesterol. (…)" }]}
{"messages": [{"role": "system", "content": "You are an AI assistant expert in medicine that answers medical questions." }, {"role": "user", "content": "Who is at risk for Osteoporosis?" }, {"role": "assistant", "content": "The more likely you are to fall, the higher your risk for a fracture. And more severe falls increase your risk for fractures. Factors that increase your risk of falling and of fracturing a bone include — decreased muscle strength — poor balance — impaired eyesight — impaired mental abilities — certain medications, such as tranquilizers and muscle relaxants — hazardous elements in your living environment, such as slippery throw rugs and icy sidewalks. (…)" }]}
Ensuring your data is properly formatted and abundant paves the way for creating a robust, specialized model.
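Before uploading, it can also be worth validating the file locally: every line must parse as JSON and contain a system/user/assistant message triple. A minimal sketch of such a check (the file name below is an example):

```python
import json

def validate_jsonl(path: str) -> int:
    """Return the number of valid training rows; raise on the first bad line."""
    expected_roles = ["system", "user", "assistant"]
    count = 0
    with open(path, encoding="utf-8-sig") as f:
        for line_number, line in enumerate(f, start=1):
            record = json.loads(line)  # raises ValueError on malformed JSON
            roles = [m["role"] for m in record["messages"]]
            if roles != expected_roles:
                raise ValueError(f"line {line_number}: unexpected roles {roles}")
            count += 1
    return count

# Example with a small two-line file:
row = ('{"messages": [{"role": "system", "content": "s"}, '
       '{"role": "user", "content": "q"}, '
       '{"role": "assistant", "content": "a"}]}\n')
with open("sample.jsonl", "w", encoding="utf-8-sig") as f:
    f.write(row * 2)
print(validate_jsonl("sample.jsonl"))  # 2
```

Catching a malformed line locally is much faster than waiting for an upload or a training job to fail.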
3. Upload files
We can now return to the Azure OpenAI Studio and to Data files > Add data > For fine-tuning and upload our newly created file:



4. Fine-tune model
Once the data is uploaded, we proceed with the fine-tuning step at Fine-tuning > Fine-tune model.

We select the model to fine-tune (only models that can be fine-tuned are listed).

We select the model version to fine-tune and a suffix that makes the resulting model easy to identify.

We select the training data previously uploaded.

Optionally, you can upload validation data and customize four hyperparameters: batch size, learning rate multiplier, number of epochs, and seed.


The fine-tuning job now begins; it takes some time, during which the status moves from Queued to Running to Completed.
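The same wait can be scripted instead of watched. Below is a minimal, generic polling sketch; the commented line shows how it would plug into the `openai` Python SDK (the configured client and job id are assumptions, and the SDK reports terminal states such as succeeded/failed/cancelled rather than the Studio's "Completed" label):

```python
import time

def wait_for_job(get_status, poll_seconds=30,
                 terminal=("succeeded", "failed", "cancelled")):
    """Call get_status() until it returns a terminal state, then return it."""
    while True:
        status = get_status()
        if status in terminal:
            return status
        time.sleep(poll_seconds)

# With the openai SDK (assumed; requires a configured AzureOpenAI client):
# final = wait_for_job(lambda: client.fine_tuning.jobs.retrieve(job_id).status)

# Stub demonstration with a simulated status sequence:
statuses = iter(["queued", "running", "running", "succeeded"])
print(wait_for_job(lambda: next(statuses), poll_seconds=0))  # succeeded
```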



When complete, you can open the model and check its status and details such as the training metrics (loss and token accuracy evolution).


5. Deploy fine-tuned model
We're almost there; one last step remains, namely deploying the model. Although the model has been created, it cannot be used until it is deployed. Go to Deployments > Deploy model > Deploy fine-tuned model and repeat the process as for a base model.


6. Test model
Once the model is ready, it is listed like any other deployed model.

Let's visit the Chat playground to see if our model is ready to be consumed:

I am going to ask a few questions to see whether it has knowledge of what glaucoma is and how it may be treated:
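The same questions can also be sent programmatically once the deployment exists. Here is a minimal sketch, assuming the `openai` Python package; the endpoint, key, API version, and deployment name are all placeholders, not values from the article:

```python
def build_chat_messages(system_message: str, question: str) -> list:
    """Assemble the messages payload in the same shape used for training."""
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": question},
    ]

# With the openai SDK (assumed; all names below are placeholders):
# from openai import AzureOpenAI
# client = AzureOpenAI(azure_endpoint="https://<resource>.openai.azure.com/",
#                      api_key="<key>", api_version="2024-06-01")
# response = client.chat.completions.create(
#     model="<fine-tuned-deployment-name>",  # the deployment created in step 5
#     messages=build_chat_messages(
#         "You are an AI assistant expert in medicine that answers medical questions.",
#         "What is glaucoma and how is it treated?"),
# )
# print(response.choices[0].message.content)

print(build_chat_messages("system prompt", "What is glaucoma?"))
```

Note that the system message should match the one used during training, so the model sees requests in the same shape it was fine-tuned on.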



Now, let's evaluate how differently the system behaves if we use a RAG architecture instead of a fine-tuned model. As you can see, a plain RAG setup without fine-tuning answers with less precision and accuracy when explaining results based on the provided data.



Conclusions
Our analysis reveals distinct differences between the responses generated by a fine-tuned model and those from a RAG approach. The fine-tuned model delivers answers in a specific manner, closely mirroring the question-and-answer patterns from its training data. This enables the fine-tuned model to function as something of an expert in its field. Conversely, the RAG approach generates answers based on the example data provided, which may not always be as specialized or consistent — but allows checking of the referenced data.
For scenarios involving extensive historical medical data, RAG is likely to be less practical. The fine-tuned model, with its capacity for specialization, offers a more feasible solution for such complex datasets.
For a detailed, step-by-step guide on customizing a model, refer to the official documentation: Customize a model with Azure OpenAI Service — Azure OpenAI | Microsoft Learn
I would be happy to hear your ideas about how you have fine-tuned your models and the different behaviors that result. You can leave comments in the Comments section associated with this article.
Eva Jurado Cortés is on LinkedIn.
