I Tried New Claude Code Ollama Workflow ( It’s Wild & Free)

Claude Code now works with Ollama, which takes the game to the next level for developers who want to work locally or need flexible model options.

I've been testing this integration since the announcement, and the possibilities are impressive. I made all the mistakes that will cost you time and documented them here to help you know what works and save time!

If you are not a premium Medium member, read the full tutorial FREE here and consider joining medium to read more such guides.

Just in case you are new to Ollama, it's a tool that lets you run large language models on your local machine.

It's like a router between AI models and your development environment, without sending every request to the cloud.

This is an ideal workflow for privacy-conscious projects, air-gapped systems, or when you want to avoid API costs, which is a concern for many developers.

Ollama v0.14.0 and later versions are compatible with the Anthropic Messages API. This means Claude Code can now works with any Ollama model.

You can run Claude Code with local open-source models on your machine, or connect to cloud models through ollama.com.

I wanted to test this to see how it performs.

The ideal workflow is using Claude Code's agentic capabilities with the model that fits your needs, whether that's a local model or a cloud model for complex tasks.

In this post,

I will walk you through how I set this up and what I discovered while testing it the first time.

Setting Up Claude Code Ollama Integration

Getting Claude Code to work with Ollama requires three things:

Claude Code installed
Ollama running
Right environment variables configured.

I'll walk you through each step because the setup is simple once you understand what each piece does.

1) Installing Claude Code

If you don't have Claude Code yet, the installation is easy.

For macOS, Linux, or WSL:

curl -fsSL https://claude.ai/install.sh | bash

curl -fsSL https://claude.ai/install.sh | bash

For Windows PowerShell:

irm https://claude.ai/install.ps1 | iex

irm https://claude.ai/install.ps1 | iex

For Windows CMD:

curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

The installer handles everything, and once it completes, restart your terminal to ensure the claude command is available in your path.

When installation is complete, verify using this command claude --version

2) Installing Ollama

Before you can use Claude Code with Ollama, you need to have Ollama installed on your system.

For macOS and Linux:

Download and install from the official site:

curl -fsSL https://ollama.com/install.sh | sh

curl -fsSL https://ollama.com/install.sh | sh

Or visit ollama.com/download to get the installer for your platform.

For Windows:

Download the Windows installer from ollama.com/download and run it.

The installer will set up Ollama and add it to your system path.

After installation, verify Ollama is working:

ollama --version

ollama --version

Ollama runs as a background service automatically after installation. You should see it running on [http://localhost:11434](http://localhost:11434.).

To check if the service is running:

That confirms Ollama is installed and running.

The empty list means you haven't pulled any models yet, which is expected.

Now you can proceed with pulling your first model.

Try this first model pull:

ollama pull qwen3-coder

ollama pull qwen3-coder

This will download the qwen3-coder model. It'll take a few minutes, depending on your internet speed, since it's downloading several gigabytes.

That's a huge one; it requires about 18GB of space. If you want to try out with another smaller model, you can go with these options:

Qwen2.5-coder:7b (around 4.7GB) — Good balance for coding:

ollama pull qwen2.5-coder:7b

ollama pull qwen2.5-coder:7b

Starcoder2:3b (around 1.7GB) — Compact coding model:

ollama pull starcoder2:3b

ollama pull starcoder2:3b

Qwen2.5-coder:1.5b (around 1GB) — Lightweight option:

ollama pull qwen2.5-coder:1.5b

ollama pull qwen2.5-coder:1.5b

Deepseek-coder:1.3b (around 776MB) — Smallest coding model:

ollama pull deepseek-coder:1.3b

ollama pull deepseek-coder:1.3b

For this setup, let's go with the smallest coding model :

Once that completes, run ollama list Again, you should see Qwen3-coder or any other model that you have pulled into the list.

You can also launch Ollama as a GUI app, where you can access the model you just pulled and can download other models as well.

Now that we have the model pulled, let's configure the variables and test the Claude Code Ollama workflow.

3) Configuring Environment Variables

In this step, we configure Claude Code to talk to Ollama instead of the default Anthropic API.

You need two environment variables:

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434

For PowerShell, use this:

$env:ANTHROPIC_AUTH_TOKEN="ollama"
$env:ANTHROPIC_BASE_URL="http://localhost:11434"

$env:ANTHROPIC_AUTH_TOKEN="ollama"
$env:ANTHROPIC_BASE_URL="http://localhost:11434"

To verify they're set:

echo $env:ANTHROPIC_AUTH_TOKEN
echo $env:ANTHROPIC_BASE_URL

echo $env:ANTHROPIC_AUTH_TOKEN
echo $env:ANTHROPIC_BASE_URL

Note: These are temporary and only last for your current PowerShell session. If you want them to be permanent, you can add them to your PowerShell profile or set them as system environment variables through Windows Settings.

The ANTHROPIC_AUTH_TOKEN is set to "ollama" because the API key is required by the SDK, but Ollama doesn't validate it.
The ANTHROPIC_BASE_URL points to your local Ollama instance running on port 11434.

You can set these temporarily in your current terminal session, or add them to your shell profile for permanent configuration.

I prefer keeping them in my .zshrc or .bashrc so they're always available.

Remember :

Before running Claude Code, you need at least one model pulled locally.

4) Running Your First Command

Once the model is pulled, you can launch Claude Code with Ollama:

claude --model deepseek-coder:1.3b

claude --model deepseek-coder:1.3b

Claude Code starts up and connects to your local Ollama instance.

You'll see the familiar Claude Code interface, but now it's running through your local model instead of Anthropic's API.

I tested this by asking it to write a simple Python function.

 API Error: 400
    {"type":"error","error":{"type":"invalid_request_error","message":"registry.ollama.ai/library/deepseek-coder:1.3b
    does not support tools"},"request_id":"req_bf347cdc82bc7ea0dddd4f0d"}

 API Error: 400
    {"type":"error","error":{"type":"invalid_request_error","message":"registry.ollama.ai/library/deepseek-coder:1.3b
    does not support tools"},"request_id":"req_bf347cdc82bc7ea0dddd4f0d"}

I then hit this error and remembered an important detail :deepseek-coder:1.3b model doesn't support tool calling, which Claude Code requires to function.

For this to work, we need to use a larger model that supports tools.

Let's go with qwen2.5-coder:7b instead:

ollama pull qwen2.5-coder:7b

ollama pull qwen2.5-coder:7b

That's about 5 GB in size, and once that's downloaded, try:

ollama list

ollama list

You should see your newly pulled model added to the list

Now it's time to launch Claude Code using this new model to see if it works, run using this command:

claude --model qwen2.5-coder:7b

claude --model qwen2.5-coder:7b

The 7b models generally support tool calling, which is essential for Claude Code to read, modify, and execute code in your working directory.

I ran the previously failed prompt using this new model :

It's at this point that I discovered my hardware limitations; the 7b model was struggling on my hardware — 5+ minutes is way too long.

It was time to try the Ollama Cloud option since I had hit the wall here!

But first, here are some things to note about working with a local setup

Working with Local Models

Keep in mind that smaller models like this 1.3b version won't match the quality of larger specialized coding models, but they're perfect for testing the integration and handling straightforward coding tasks.

Understanding Context Length

Claude Code works best with models that have at least 32k tokens of context.

Smaller models often have limited context windows, which can cause issues with larger codebases.

You can check and adjust context length in Ollama's configuration.

The cloud models from ollama.com always run at their full context length, so this is mainly a consideration for local models.

With smaller models, the context limitation is more noticeable. For complex projects with many files, you might want to consider using a larger model or switching to cloud models for better performance.

Running Claude Code with local models gives you complete control over your development environment. No API costs, rate limits, and your code never leaves your machine.

Here's how different local models work with Claude Code and what to expect.

Understanding Model Options

qwen2.5-coder:7b is trained for coding tasks. You can pull it with:

ollama pull qwen2.5-coder:7b

ollama pull qwen2.5-coder:7b

Then run Claude Code:

claude --model qwen2.5-coder:7b

claude --model qwen2.5-coder:7b

This model should handle typical coding tasks like building REST APIs, writing functions, and understanding code structure.

When you ask it to modify existing code, it should reference what's already there and make integrated changes.

Trying Larger Models

For more complex tasks, you might want gpt-oss:20b, a larger general-purpose model:

ollama pull gpt-oss:20b

ollama pull gpt-oss:20b

Run it with:

claude --model gpt-oss:20b

claude --model gpt-oss:20b

This model handles a wider range of tasks beyond just coding — documentation generation, code review, and architectural planning.

The trade-off is resource usage. The 20b parameter model uses more RAM and processes tokens more slowly than smaller models.

Context Window Considerations

Local models have configurable context windows, and this is important when you are using Claude Code.

The tool needs enough context to understand your project structure, file contents, and conversation history.

If you're working with a larger codebase that has multiple files, the model might lose track of earlier context, leading to suggestions that don't align with the project structure.

You can adjust context length in Ollama's configuration through the Modelfile or when running the model.

For Claude Code, aim for at least 32k tokens, though 64k is better if your hardware can handle it.

Adjusting Context Length

You can modify the context window when running a model. Here's how:

Check current context length:

ollama show qwen2.5-coder:7b

ollama show qwen2.5-coder:7b

This displays the model's configuration, including its default context window size.

Run with custom context length:

You can set the context window directly when starting the model by creating a custom Modelfile.

First, create a file called Modelfile:

FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768

FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768

Then create a new model with this configuration:

ollama create qwen-32k -f Modelfile

ollama create qwen-32k -f Modelfile

Now run Claude Code with the larger context:

claude --model qwen-32k

claude --model qwen-32k

Common context sizes:

2048 tokens — Default for many small models
8192 tokens — Good for simple tasks
32768 tokens (32k) — Recommended minimum for Claude Code
65536 tokens (64k) — Better for complex projects

The larger the context, the more RAM your model will use. Balance your context needs with available system resources.

Performance

Local models are impressive, but they require adequate hardware to run :

16GB RAM handles smaller models (7b parameters) fine
Larger models (20b+) benefit from 32GB or more
Apple Silicon performs exceptionally well
Older hardware might struggle with bigger models

The speed depends on your specific setup. Smaller models like qwen2.5-coder:7b should be fast enough for daily coding work.

Larger models will be slower but still usable for tasks that don't require instant responses.

Local models work best for:

Private codebases that can't send data externally
Rapid iteration without worrying about API costs
Learning and experimentation
Projects with specific compliance requirements

The main limitation is inference speed.

**If you need the fastest possible responses or are working with extremely large contexts, **Ollama cloud models are worth considering.

Now let's move on to Ollama Cloud.

Claude Code Ollama Using Cloud Models

Local models are great, but sometimes you need more power without the hardware overhead. Ollama's cloud models bridge this gap.

You need to begin by signing up for Ollama Cloud. Let's upgrade and test the process.

Head over to Ollama Cloud — https://ollama.com/cloud

We go with the $20 plan so that we can see what's possible:

The setup for cloud models is slightly different, but the workflow remains consistent with what you've already learned.

In your account, create the API key that will be used along with the other config settings to connect Claude Code to Ollama Cloud :

Connecting to Ollama Cloud

To use cloud models, you need an API key from ollama.com. Once you have that, update your environment variables:

export ANTHROPIC_BASE_URL=https://ollama.com
export ANTHROPIC_API_KEY=your-api-key-here

export ANTHROPIC_BASE_URL=https://ollama.com
export ANTHROPIC_API_KEY=your-api-key-here

For Powershell

$env:ANTHROPIC_BASE_URL="https://ollama.com"
$env:ANTHROPIC_API_KEY="your-actual-api-key-here"

$env:ANTHROPIC_BASE_URL="https://ollama.com"
$env:ANTHROPIC_API_KEY="your-actual-api-key-here"

Replace your-actual-api-key-here with the API key you got from ollama.com.

Then verify they're set:

echo $env:ANTHROPIC_BASE_URL
echo $env:ANTHROPIC_API_KEY

echo $env:ANTHROPIC_BASE_URL
echo $env:ANTHROPIC_API_KEY

Notice the base URL changes from localhost to ollama.com. The authentication token is replaced with your actual API key.

Now you can run Claude Code with cloud models:

claude --model glm-4.7:cloud

claude --model glm-4.7:cloud

The :cloud suffix indicates you're using a cloud-hosted model. These models are immediately available without pulling, and they run at their full context length.

When you launch Claude Code, it will detect that you are using a custom key:

You also need to connect your Ollama Account:

You can now test locally by signing in to your account

Use this to launch the GPT OSS model

ollama run gpt-oss:120b-cloud

ollama run gpt-oss:120b-cloud

This command will show you all the available models :

(Invoke-WebRequest -Uri "https://ollama.com/api/tags"
-Headers @{"Authorization"="Bearer
your-api-key-here "}).Content | ConvertFrom-Json | Select-Object -ExpandProperty models | Select-Object name

(Invoke-WebRequest -Uri "https://ollama.com/api/tags"
-Headers @{"Authorization"="Bearer
your-api-key-here "}).Content | ConvertFrom-Json | Select-Object -ExpandProperty models | Select-Object name

Tool Calling Example

One feature that works well with Ollama is tool calling.

This lets Claude Code interact with external systems while solving problems.

Here's a simple example of how tool calling looks in code:

import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[
        {
            'name': 'get_weather',
            'description': 'Get the current weather in a location',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    ],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)

for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')

import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[
        {
            'name': 'get_weather',
            'description': 'Get the current weather in a location',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    ],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)

for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')

Final Thoughts

The Claude Code and Ollama integration gives developers more flexibility.

I'll keep testing this workflow and see how it performs at full scale. Ollama's team ships updates regularly, and the model list keeps growing.

The integration opens up workflows for teams with specific requirements around data privacy or cost management.

This also makes sense for users without hardware limitations.

I'd like to know your experience with this Claude Code Ollama workflow? Share your thoughts in the comments below.

Update: New Way to Launch Claude Code

There is a new way to launch Claude Code on Ollama, I have covered it in details in this new post:

I Tested (New) Ollama Launch For Claude Code, Codex, OpenCode (No More Configs) Forget configuration headaches, Ollama launch is the new easy way to launch Claude Code, Codex, OpenCode, Moltbot, or…

Claude Code Masterclass Course

Every day, I'm working hard to build the ultimate Claude Code course, which demonstrates how to create workflows that coordinate multiple agents for complex development tasks. It's due for release soon.

It will take what you have learned from this article to the next level of complete automation.

New features are added to Claude Code daily, and keeping up is tough.

The course explores Agents, Hooks, advanced workflows, and productivity techniques that many developers may not be aware of.

Once you join, you'll receive all the updates as new features are rolled out.

This course will cover:

Advanced subagent patterns and workflows
Production-ready hook configurations
MCP server integrations for external tools
Team collaboration strategies
Enterprise deployment patterns
Real-world case studies from my consulting work

If you're interested in getting notified when the Claude Code course launches, click here to join the early access list →

( Currently, I have 3000+ already signed-up developers)

I'll share exclusive previews, early access pricing, and bonus materials with people on the list.

Let's Connect!

If you are new to my content, my name is Joe Njenga

Join thousands of other software engineers, AI engineers, and solopreneurs who read my content daily on Medium and on YouTube where I review the latest AI engineering tools and trends. If you are more curious about my projects and want to receive detailed guides and tutorials, join thousands of other AI enthusiasts in my weekly AI Software engineer newsletter

If you would like to connect directly, you can reach out here:

AI Integration Software Engineer (10+ Years Experience ) Software Engineer specializing in AI integration and automation. Expert in building AI agents, MCP servers, RAG…

Follow me on Medium | YouTube Channel | X | LinkedIn

Contents