I Thought I Fine-Tuned an AI. I Didn’t. Here’s What Actually Happened.

I built an app that customizes an AI model.

I gave it a name — CodeBot. I gave it a personality. I gave it rules. Always reply in bullet points. Always include code. Always end with "Happy coding!"

And it worked. Ask CodeBot anything and it responds exactly like that. Every time.

For about 20 minutes I thought I had fine-tuned a model.

Then I dug deeper. And I realized I hadn't changed the model at all.

What I Actually Did vs What Fine-Tuning Really Is

Here is the moment that made everything click for me.

My app does this:

const modelfile = `FROM llama3.2:latest
SYSTEM You are CodeBot. Always reply in bullet points with a code example. End with: Happy coding!
PARAMETER temperature 0.3`

I am writing instructions. Giving the model rules to follow.

Real fine-tuning does something completely different. It takes thousands of examples of how you want the model to behave, runs them through a training process on a GPU, and permanently changes the model's weights — the internal numbers that define what the model knows and how it thinks.

My app tells the model how to behave. Real fine-tuning teaches the model — and the knowledge becomes permanent.

The difference is huge. And I only understood it by building the demo and asking why it felt too easy.

What is Fine-Tuning?

Fine-tuning is the process of taking a pre-trained model and training it further on your own data.

The base model (Llama 3.2, GPT-4, Claude) already knows a lot. It was trained on a huge amount of text, much of it from the internet. Fine-tuning takes that foundation and specializes it.

Think of it like this. A medical school graduate knows general medicine. Fine-tuning is the residency: several more years of specialized training that make them an expert in one specific area.

After fine-tuning, that knowledge is permanent. It is baked into the model's weights. You don't need to remind it of anything. It just knows.
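
To make "baked into the weights" concrete, here is a toy sketch: a "model" with a single weight `w` that predicts `w * x`. It is nothing like a real LLM, and every name in it (`predict`, `trainStep`, `fineTune`, the learning rate) is made up for illustration. The point is only that training overwrites the number itself, while changing the input never does.

```javascript
// Toy "model": one weight, prediction = w * x. Purely illustrative, not an LLM.
function predict(w, x) {
  return w * x;
}

// One gradient-descent step on the squared error (w*x - y)^2.
// Derivative with respect to w: 2 * (w*x - y) * x
function trainStep(w, x, y, learningRate = 0.1) {
  const gradient = 2 * (predict(w, x) - y) * x;
  return w - learningRate * gradient;
}

// "Fine-tune" by repeatedly stepping over examples of the behaviour we want.
function fineTune(w, examples, epochs = 50) {
  let current = w;
  for (let i = 0; i < epochs; i++) {
    for (const { x, y } of examples) {
      current = trainStep(current, x, y);
    }
  }
  return current;
}

const baseWeight = 1;
// Made-up training examples that all follow y = 3 * x.
const examples = [{ x: 1, y: 3 }, { x: 2, y: 6 }];
const tunedWeight = fineTune(baseWeight, examples);

console.log("base weight:", baseWeight, "prediction for 2:", predict(baseWeight, 2));
console.log("tuned weight:", tunedWeight, "prediction for 2:", predict(tunedWeight, 2));
```

After the loop, `tunedWeight` sits very close to 3 and stays there with no instructions attached. That is the part a system prompt cannot do.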

What fine-tuning is good for:

Making a model respond in a very specific style consistently
Teaching it domain-specific knowledge — medical, legal, financial
Making it follow very precise output formats reliably
Discouraging responses or behaviour you don't want it to produce

What fine-tuning is NOT:

A way to give the model access to private documents (that's RAG)
Something you can do in seconds on a laptop (it needs a GPU and training time)
Changing the model's system prompt (that's what my app did)

The App I Built — A Fine-Tuning Demo

My app does three things:

Asks the base model (Llama 3.2) a question
Creates a custom model (CodeBot) using a Modelfile
Asks CodeBot the same question and compares

The question: "Explain promises like I am 5 years old."

Base model answer — free format, paragraphs, friendly explanation, no particular structure.

CodeBot answer — strict bullet points, code example included, ends with "Happy coding!" Every single time.

The difference is immediately visible. That is the whole point of the demo.

The Code — What Actually Happens

The entire app is two functions and 10 lines of logic.

Function 1 — ask()

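// BASE_URL is defined elsewhere in the app; Ollama's default local address is http://localhost:11434.
async function ask(model, systemPrompt, userQuestion) {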
  const messages = [];
  if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
  messages.push({ role: "user", content: userQuestion });

  const res = await fetch(`${BASE_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, stream: false, messages }),
  });

  const data = await res.json();
  return data.message?.content ?? JSON.stringify(data);
}

Sends a system prompt and question to any Ollama model. Returns the answer. That's it.
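
One possible refinement, sketched here rather than taken from the repo: split the two fragile parts of `ask()` into small helpers that can be checked without a running Ollama server. `buildMessages` and `extractAnswer` are hypothetical names, and the sketch assumes failed requests come back as a JSON object with an `error` string.

```javascript
// Build the messages array, leaving out the system message when it is empty.
function buildMessages(systemPrompt, userQuestion) {
  const messages = [];
  if (typeof systemPrompt === "string" && systemPrompt.trim() !== "") {
    messages.push({ role: "system", content: systemPrompt });
  }
  messages.push({ role: "user", content: userQuestion });
  return messages;
}

// Pull the answer out of an already-parsed /api/chat response.
// Assumption: failures arrive as an object with an `error` string.
function extractAnswer(data) {
  if (data && typeof data.error === "string") {
    throw new Error(`Ollama returned an error: ${data.error}`);
  }
  const content = data?.message?.content;
  if (typeof content !== "string") {
    throw new Error(`Unexpected response shape: ${JSON.stringify(data)}`);
  }
  return content;
}

// With an empty system prompt, only the user message is sent.
console.log(buildMessages("", "Explain promises like I am 5 years old."));
```

Throwing on an error payload, instead of returning the stringified response as the answer, makes a missing model show up as a failure rather than as odd-looking output.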

Function 2 — createCustomModel()

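import { writeFileSync } from "node:fs";
import { execSync } from "node:child_process";

function createCustomModel(name, baseModel, systemPrompt) {
  // Assemble the Modelfile: base model, persona, sampling temperature
  const modelfile = `FROM ${baseModel}
SYSTEM ${systemPrompt}
PARAMETER temperature 0.3`;

  // Write it to disk, then register the model through the Ollama CLI
  writeFileSync("Modelfile", modelfile);
  execSync(`ollama create ${name} -f Modelfile`, { stdio: "inherit" });
  console.log(`✓ Custom model "${name}" created`);
}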

This writes a Modelfile to disk and uses the Ollama CLI to register a custom model.

Three lines in the Modelfile:

FROM — which base model to start from
SYSTEM — the persona and rules
PARAMETER temperature 0.3 — lower temperature means more focused, less random responses
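
A rough picture of why a lower temperature gives more focused output: before a token is sampled, the model's raw scores (logits) are divided by the temperature and then turned into probabilities. The sketch below uses made-up logits for three candidate tokens, and `softmaxWithTemperature` is a hypothetical helper, not something Ollama exposes.

```javascript
// Turn raw scores into probabilities after dividing by the temperature.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((logit) => logit / temperature);
  // Subtract the max before exponentiating for numerical stability.
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Made-up scores for three candidate next tokens.
const logits = [2.0, 1.0, 0.5];

const atDefault = softmaxWithTemperature(logits, 1.0);
const atLow = softmaxWithTemperature(logits, 0.3);

console.log(atDefault.map((p) => p.toFixed(2))); // top token gets about 0.63
console.log(atLow.map((p) => p.toFixed(2)));     // top token gets about 0.96
```

At 0.3 the most likely token takes almost all of the probability, so the model picks it nearly every time. That is what "less random" means in practice.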

One thing to note — the Modelfile is NOT committed to the repo. The app generates it fresh every time it runs. That is by design.
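
Because the file is regenerated on every run, it can help to keep "build the text" separate from "write it and call the CLI". This is a sketch of that split, not what the repo does: `buildModelfile` is a hypothetical pure function, and it wraps the prompt in triple quotes so a multi-line system prompt stays inside the SYSTEM instruction.

```javascript
// Build Modelfile text from its three parts. Pure function: no disk or CLI access.
function buildModelfile({ baseModel, systemPrompt, temperature = 0.3 }) {
  if (typeof baseModel !== "string" || baseModel === "") {
    throw new Error("baseModel is required");
  }
  if (typeof systemPrompt !== "string" || systemPrompt.includes('"""')) {
    throw new Error('systemPrompt must be a string without """');
  }
  return [
    `FROM ${baseModel}`,
    // Triple quotes let the system prompt span several lines.
    `SYSTEM """${systemPrompt}"""`,
    `PARAMETER temperature ${temperature}`,
  ].join("\n");
}

const modelfileText = buildModelfile({
  baseModel: "llama3.2:latest",
  systemPrompt:
    "You are CodeBot. Always reply in bullet points with a code example. End with: Happy coding!",
});

console.log(modelfileText);
```

The returned string can then be handed to `writeFileSync("Modelfile", ...)` exactly as `createCustomModel()` does today.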

The main logic — three steps

const question = "Explain promises like I am 5 years old.";

// Step 1 - Ask the base model
const baseAnswer = await ask(MODEL, "You are a helpful assistant.", question);

// Step 2 - Create CodeBot (createCustomModel is synchronous, so no await is needed)
createCustomModel("codebot:latest", MODEL,
  "You are CodeBot. Always reply in bullet points with a code example. End with: Happy coding!");

// Step 3 - Ask CodeBot the same question
const customAnswer = await ask("codebot:latest", "", question);

Same question. Two completely different answers. The difference comes from the system prompt (plus the lower temperature) stored in the Modelfile.

What a Modelfile Actually Is

A Modelfile is Ollama's way to package a customized model. Think of it as a recipe — it says which base model to use, what persona to give it, and how to tune its behaviour.

FROM llama3.2:latest       ← base model
SYSTEM You are CodeBot...  ← persona and rules
PARAMETER temperature 0.3  ← behaviour settings

Important: This is NOT real fine-tuning. The model's weights — the actual learned knowledge — are unchanged. You are just wrapping the base model with instructions.

Remove the system prompt and the model reverts completely to its original behaviour.

Real fine-tuning changes the weights permanently. The knowledge survives even if you delete all prompts.
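
One way to convince yourself that nothing was trained: the Modelfile can be translated back into an ordinary request against the base model. The sketch below reads only the three instructions used in this article and builds the matching `/api/chat` body, with the temperature passed through the request's `options` field. `modelfileToChatRequest` is a hypothetical helper and deliberately not a full Modelfile parser.

```javascript
// Turn single-line FROM / SYSTEM / PARAMETER temperature instructions into
// an equivalent chat request body for the base model.
function modelfileToChatRequest(modelfileText, userQuestion) {
  const request = { model: "", stream: false, messages: [], options: {} };

  for (const line of modelfileText.split("\n")) {
    const [keyword, ...rest] = line.trim().split(/\s+/);
    const value = rest.join(" ");

    if (keyword === "FROM") request.model = value;
    if (keyword === "SYSTEM") {
      request.messages.push({ role: "system", content: value });
    }
    if (keyword === "PARAMETER" && rest[0] === "temperature") {
      request.options.temperature = Number(rest[1]);
    }
  }

  request.messages.push({ role: "user", content: userQuestion });
  return request;
}

const codebotModelfile = [
  "FROM llama3.2:latest",
  "SYSTEM You are CodeBot. Always reply in bullet points with a code example. End with: Happy coding!",
  "PARAMETER temperature 0.3",
].join("\n");

const request = modelfileToChatRequest(
  codebotModelfile,
  "Explain promises like I am 5 years old."
);

console.log(JSON.stringify(request, null, 2));
```

Sending that body to the base model should give CodeBot-style answers without any custom model existing at all, which is exactly why this counts as prompting rather than training.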

Fine-Tuning vs RAG vs System Prompt — The Real Difference

This is the comparison I wish I had seen before building anything:

My app is the first column. I gave the model a rulebook.

RAG — which we built in Day 9 — is the second column. Documents are fetched at query time.

Real fine-tuning is the third column. And it looks like this:

What Real Fine-Tuning Actually Looks Like

If you want to truly fine-tune a model, here are the five steps:

Step 1 — Collect training data

Create Q&A pairs showing exactly how you want the model to respond.

{ "question": "What is our refund policy?",
  "answer": "We offer 30-day returns on all products..." }

Step 2 — Train with a tool

Feed your Q&A pairs into a training tool such as Unsloth or Axolotl. Training runs on a GPU; the free tier of Google Colab is usually enough for small models. The tool adjusts the model's weights based on your examples.
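
Training tools generally expect the examples as a file, not as loose objects. As a sketch, the Q&A pairs from Step 1 could be written out as JSONL, one chat-style example per line. The `messages` layout used here is an assumption on my part, since the exact schema depends on the tool and its configuration, and `toChatJsonl` is a hypothetical helper.

```javascript
// Convert { question, answer } pairs into JSONL, one chat-style example per line.
// Assumption: the training tool accepts a "messages" array per example.
function toChatJsonl(pairs) {
  return pairs
    // Drop incomplete pairs so they never reach the training set.
    .filter((p) => p.question?.trim() && p.answer?.trim())
    .map((p) =>
      JSON.stringify({
        messages: [
          { role: "user", content: p.question.trim() },
          { role: "assistant", content: p.answer.trim() },
        ],
      })
    )
    .join("\n");
}

const pairs = [
  {
    question: "What is our refund policy?",
    answer: "We offer 30-day returns on all products...",
  },
  { question: "   ", answer: "This pair has no question and is skipped." },
];

const jsonl = toChatJsonl(pairs);
console.log(jsonl);
```

Check the documentation of whichever tool you pick before committing to a format; the filtering step is the part worth keeping either way.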

Step 3 — Export as GGUF

Convert the trained model into GGUF, the format that Ollama understands.

Step 4 — Load into Ollama

FROM ./my-finetuned.gguf
SYSTEM "You are a customer support agent..."

Step 5 — Use it

# register the Modelfile from Step 4 under the name my-expert, then chat with it
ollama create my-expert -f Modelfile
ollama run my-expert

The whole process takes minutes to hours depending on the size of your training data and your GPU.

Issues I Hit While Building This

These are real problems I ran into — worth knowing before you try it yourself:

Problem 1 — Cannot read properties of undefined (reading 'content')

Passing an empty string as a system message confused Ollama's API. Fixed by skipping the system message entirely when it's empty, hence the if (systemPrompt) check.

Problem 2 — model 'codebot' not found

The model name needed a tag. Changed "codebot" to "codebot:latest" and it worked.

Problem 3 — ollama create silently failing

Ollama's /api/create REST endpoint was failing without a clear error. Switched to using the CLI directly, execSync('ollama create codebot -f Modelfile'), and it worked immediately.

Small issues but each one cost time to debug. Sharing them here so you don't have to.
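
For the tag problem in particular, a tiny guard could keep it from coming back. This is a sketch under my own assumptions, with `withDefaultTag` as a hypothetical helper: it adds `:latest` only when the last path segment of the name has no tag, so a registry host with a port is left alone.

```javascript
// Append a default tag when the model name does not already carry one.
function withDefaultTag(modelName, defaultTag = "latest") {
  // Only the part after the last "/" can hold a tag; an earlier ":" may be a port.
  const lastSegment = modelName.split("/").pop();
  return lastSegment.includes(":") ? modelName : `${modelName}:${defaultTag}`;
}

console.log(withDefaultTag("codebot"));         // codebot:latest
console.log(withDefaultTag("codebot:latest"));  // codebot:latest
console.log(withDefaultTag("llama3.2:latest")); // llama3.2:latest
```

Running every model name through the same helper, both when creating and when asking, keeps the two calls from drifting apart.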

The Thing That Changed How I Think About This

Before building this app I had a vague idea that fine-tuning meant "training an AI on your data."

After building it I understand there is a spectrum:

System prompt — tell the model how to behave. Instant. Reversible. No GPU.

RAG — give the model access to your documents at query time. The model doesn't learn anything permanently.

Fine-tuning — actually change what the model knows. Permanent. Requires training. The knowledge is baked in.
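
To make the middle rung concrete, here is a deliberately naive sketch of what "at query time" means for RAG: pick the most relevant document for the question and paste it into the prompt. The documents are made-up sample data, `scoreByOverlap` and `buildRagMessages` are hypothetical helpers, and simple word overlap stands in for the embedding search a real RAG system would typically use.

```javascript
// Count how many words of the document also appear in the question.
function scoreByOverlap(question, doc) {
  const questionWords = new Set(question.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const docWords = doc.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return docWords.filter((word) => questionWords.has(word)).length;
}

// Fetch the best-matching documents and put them in front of the question.
function buildRagMessages(question, docs, topK = 1) {
  const context = [...docs]
    .sort((a, b) => scoreByOverlap(question, b) - scoreByOverlap(question, a))
    .slice(0, topK)
    .join("\n");

  return [
    { role: "system", content: `Answer using only this context:\n${context}` },
    { role: "user", content: question },
  ];
}

// Made-up sample documents.
const docs = [
  "Shipping policy: orders leave the warehouse within two days.",
  "Refund policy: we offer 30-day returns on all products.",
];

const ragMessages = buildRagMessages("What is our refund policy?", docs);
console.log(ragMessages);
```

The model is the same and so are its weights. Only the text it gets to read changes from one question to the next, which is why nothing is learned permanently.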

Most things people call "fine-tuning" are actually system prompts. Including what I built.

That is not a criticism. System prompts are powerful and often exactly what you need. But knowing the difference means you choose the right tool instead of reaching for the most complex one by default.

What I Learned Today

Fine-tuning permanently changes a model's weights — the knowledge is baked in
What Ollama does with a Modelfile is NOT fine-tuning — it's system prompt customisation
The Modelfile in this app has three parts: FROM (base model), SYSTEM (persona), PARAMETER (behaviour)
Real fine-tuning requires training data, a GPU, and tools like Unsloth or Axolotl
System prompt = rulebook. RAG = open book exam. Fine-tuning = actually studied.
Most "fine-tuning" people talk about is actually just clever prompting
Skipping empty system messages prevents confusing Ollama's API
Always use model tags — codebot:latest not just codebot

The Code

Full project on GitHub: github.com/PriyankaMali-13/AI/tree/master/finetune-app

Run it. Ask CodeBot something. See the difference yourself.

I'm Priyanka — backend engineer, chatbot builder, and someone learning AI from first principles. Writing everything down as I go. Follow along if that sounds useful.

#365DaysOfAI #FineTuning #Ollama #LLM #AI #NodeJS #LearningInPublic #MachineLearning
