January 19, 2026
I Tried New Claude Code Ollama Workflow ( It’s Wild & Free)
Claude Code now works with Ollama, which takes the game to the next level for developers who want to work locally or need flexible model…

By Joe Njenga
12 min read
Claude Code now works with Ollama, which takes the game to the next level for developers who want to work locally or need flexible model options.
I've been testing this integration since the announcement, and the possibilities are impressive. I made all the mistakes that will cost you time and documented them here to help you know what works and save time!
If you are not a premium Medium member, read the full tutorial FREE here and consider joining medium to read more such guides.
Just in case you are new to Ollama, it's a tool that lets you run large language models on your local machine.
It's like a router between AI models and your development environment, without sending every request to the cloud.
This is an ideal workflow for privacy-conscious projects, air-gapped systems, or when you want to avoid API costs, which is a concern for many developers.
Ollama v0.14.0 and later versions are compatible with the Anthropic Messages API. This means Claude Code can now works with any Ollama model.
You can run Claude Code with local open-source models on your machine, or connect to cloud models through ollama.com.
I wanted to test this to see how it performs.
The ideal workflow is using Claude Code's agentic capabilities with the model that fits your needs, whether that's a local model or a cloud model for complex tasks.
In this post,
I will walk you through how I set this up and what I discovered while testing it the first time.
Setting Up Claude Code Ollama Integration
Getting Claude Code to work with Ollama requires three things:
- Claude Code installed
- Ollama running
- Right environment variables configured.
I'll walk you through each step because the setup is simple once you understand what each piece does.
1) Installing Claude Code
If you don't have Claude Code yet, the installation is easy.
For macOS, Linux, or WSL:
curl -fsSL https://claude.ai/install.sh | bashcurl -fsSL https://claude.ai/install.sh | bashFor Windows PowerShell:
irm https://claude.ai/install.ps1 | iexirm https://claude.ai/install.ps1 | iex
For Windows CMD:
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmdcurl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmdThe installer handles everything, and once it completes, restart your terminal to ensure the claude command is available in your path.
When installation is complete, verify using this command claude --version
2) Installing Ollama
Before you can use Claude Code with Ollama, you need to have Ollama installed on your system.
For macOS and Linux:
Download and install from the official site:
curl -fsSL https://ollama.com/install.sh | shcurl -fsSL https://ollama.com/install.sh | shOr visit ollama.com/download to get the installer for your platform.
For Windows:
Download the Windows installer from ollama.com/download and run it.
The installer will set up Ollama and add it to your system path.
After installation, verify Ollama is working:
ollama --versionollama --version
Ollama runs as a background service automatically after installation. You should see it running on
[http://localhost:11434](http://localhost:11434.).
To check if the service is running:
That confirms Ollama is installed and running.
The empty list means you haven't pulled any models yet, which is expected.
Now you can proceed with pulling your first model.
Try this first model pull:
ollama pull qwen3-coderollama pull qwen3-coderThis will download the qwen3-coder model. It'll take a few minutes, depending on your internet speed, since it's downloading several gigabytes.
That's a huge one; it requires about 18GB of space. If you want to try out with another smaller model, you can go with these options:
Qwen2.5-coder:7b (around 4.7GB) — Good balance for coding:
ollama pull qwen2.5-coder:7bollama pull qwen2.5-coder:7bStarcoder2:3b (around 1.7GB) — Compact coding model:
ollama pull starcoder2:3bollama pull starcoder2:3bQwen2.5-coder:1.5b (around 1GB) — Lightweight option:
ollama pull qwen2.5-coder:1.5bollama pull qwen2.5-coder:1.5bDeepseek-coder:1.3b (around 776MB) — Smallest coding model:
ollama pull deepseek-coder:1.3bollama pull deepseek-coder:1.3bFor this setup, let's go with the smallest coding model :
Once that completes, run
ollama listAgain, you should see Qwen3-coder or any other model that you have pulled into the list.
You can also launch Ollama as a GUI app, where you can access the model you just pulled and can download other models as well.
Now that we have the model pulled, let's configure the variables and test the Claude Code Ollama workflow.
3) Configuring Environment Variables
In this step, we configure Claude Code to talk to Ollama instead of the default Anthropic API.
You need two environment variables:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434For PowerShell, use this:
$env:ANTHROPIC_AUTH_TOKEN="ollama"
$env:ANTHROPIC_BASE_URL="http://localhost:11434"$env:ANTHROPIC_AUTH_TOKEN="ollama"
$env:ANTHROPIC_BASE_URL="http://localhost:11434"
To verify they're set:
echo $env:ANTHROPIC_AUTH_TOKEN
echo $env:ANTHROPIC_BASE_URLecho $env:ANTHROPIC_AUTH_TOKEN
echo $env:ANTHROPIC_BASE_URL
Note: These are temporary and only last for your current PowerShell session. If you want them to be permanent, you can add them to your PowerShell profile or set them as system environment variables through Windows Settings.
- The
ANTHROPIC_AUTH_TOKENis set to "ollama" because the API key is required by the SDK, but Ollama doesn't validate it. - The
ANTHROPIC_BASE_URLpoints to your local Ollama instance running on port 11434.
You can set these temporarily in your current terminal session, or add them to your shell profile for permanent configuration.
I prefer keeping them in my
.zshrcor.bashrcso they're always available.
Remember :
Before running Claude Code, you need at least one model pulled locally.
4) Running Your First Command
Once the model is pulled, you can launch Claude Code with Ollama:
claude --model deepseek-coder:1.3bclaude --model deepseek-coder:1.3b
Claude Code starts up and connects to your local Ollama instance.
You'll see the familiar Claude Code interface, but now it's running through your local model instead of Anthropic's API.
I tested this by asking it to write a simple Python function.
API Error: 400
{"type":"error","error":{"type":"invalid_request_error","message":"registry.ollama.ai/library/deepseek-coder:1.3b
does not support tools"},"request_id":"req_bf347cdc82bc7ea0dddd4f0d"} API Error: 400
{"type":"error","error":{"type":"invalid_request_error","message":"registry.ollama.ai/library/deepseek-coder:1.3b
does not support tools"},"request_id":"req_bf347cdc82bc7ea0dddd4f0d"}I then hit this error and remembered an important detail :deepseek-coder:1.3b model doesn't support tool calling, which Claude Code requires to function.
For this to work, we need to use a larger model that supports tools.
Let's go with qwen2.5-coder:7b instead:
ollama pull qwen2.5-coder:7bollama pull qwen2.5-coder:7b
That's about 5 GB in size, and once that's downloaded, try:
ollama listollama listYou should see your newly pulled model added to the list
Now it's time to launch Claude Code using this new model to see if it works, run using this command:
claude --model qwen2.5-coder:7bclaude --model qwen2.5-coder:7b
The 7b models generally support tool calling, which is essential for Claude Code to read, modify, and execute code in your working directory.
I ran the previously failed prompt using this new model :
It's at this point that I discovered my hardware limitations; the 7b model was struggling on my hardware — 5+ minutes is way too long.
It was time to try the Ollama Cloud option since I had hit the wall here!
But first, here are some things to note about working with a local setup
Working with Local Models
Keep in mind that smaller models like this 1.3b version won't match the quality of larger specialized coding models, but they're perfect for testing the integration and handling straightforward coding tasks.
Understanding Context Length
Claude Code works best with models that have at least 32k tokens of context.
Smaller models often have limited context windows, which can cause issues with larger codebases.
You can check and adjust context length in Ollama's configuration.
The cloud models from ollama.com always run at their full context length, so this is mainly a consideration for local models.
With smaller models, the context limitation is more noticeable. For complex projects with many files, you might want to consider using a larger model or switching to cloud models for better performance.
Running Claude Code with local models gives you complete control over your development environment. No API costs, rate limits, and your code never leaves your machine.
Here's how different local models work with Claude Code and what to expect.
Understanding Model Options
qwen2.5-coder:7b is trained for coding tasks. You can pull it with:
ollama pull qwen2.5-coder:7bollama pull qwen2.5-coder:7bThen run Claude Code:
claude --model qwen2.5-coder:7bclaude --model qwen2.5-coder:7bThis model should handle typical coding tasks like building REST APIs, writing functions, and understanding code structure.
When you ask it to modify existing code, it should reference what's already there and make integrated changes.
Trying Larger Models
For more complex tasks, you might want gpt-oss:20b, a larger general-purpose model:
ollama pull gpt-oss:20bollama pull gpt-oss:20bRun it with:
claude --model gpt-oss:20bclaude --model gpt-oss:20bThis model handles a wider range of tasks beyond just coding — documentation generation, code review, and architectural planning.
The trade-off is resource usage. The 20b parameter model uses more RAM and processes tokens more slowly than smaller models.
Context Window Considerations
Local models have configurable context windows, and this is important when you are using Claude Code.
The tool needs enough context to understand your project structure, file contents, and conversation history.
If you're working with a larger codebase that has multiple files, the model might lose track of earlier context, leading to suggestions that don't align with the project structure.
You can adjust context length in Ollama's configuration through the Modelfile or when running the model.
For Claude Code, aim for at least 32k tokens, though 64k is better if your hardware can handle it.
Adjusting Context Length
You can modify the context window when running a model. Here's how:
Check current context length:
ollama show qwen2.5-coder:7bollama show qwen2.5-coder:7b
This displays the model's configuration, including its default context window size.
Run with custom context length:
You can set the context window directly when starting the model by creating a custom Modelfile.
First, create a file called
Modelfile:
FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768Then create a new model with this configuration:
ollama create qwen-32k -f Modelfileollama create qwen-32k -f ModelfileNow run Claude Code with the larger context:
claude --model qwen-32kclaude --model qwen-32kCommon context sizes:
- 2048 tokens — Default for many small models
- 8192 tokens — Good for simple tasks
- 32768 tokens (32k) — Recommended minimum for Claude Code
- 65536 tokens (64k) — Better for complex projects
The larger the context, the more RAM your model will use. Balance your context needs with available system resources.
Performance
Local models are impressive, but they require adequate hardware to run :
- 16GB RAM handles smaller models (7b parameters) fine
- Larger models (20b+) benefit from 32GB or more
- Apple Silicon performs exceptionally well
- Older hardware might struggle with bigger models
The speed depends on your specific setup. Smaller models like qwen2.5-coder:7b should be fast enough for daily coding work.
Larger models will be slower but still usable for tasks that don't require instant responses.
Local models work best for:
- Private codebases that can't send data externally
- Rapid iteration without worrying about API costs
- Learning and experimentation
- Projects with specific compliance requirements
The main limitation is inference speed.
**If you need the fastest possible responses or are working with extremely large contexts, **Ollama cloud models are worth considering.
Now let's move on to Ollama Cloud.
Claude Code Ollama Using Cloud Models
Local models are great, but sometimes you need more power without the hardware overhead. Ollama's cloud models bridge this gap.
You need to begin by signing up for Ollama Cloud. Let's upgrade and test the process.
Head over to Ollama Cloud — https://ollama.com/cloud
We go with the $20 plan so that we can see what's possible:
The setup for cloud models is slightly different, but the workflow remains consistent with what you've already learned.
In your account, create the API key that will be used along with the other config settings to connect Claude Code to Ollama Cloud :
Connecting to Ollama Cloud
To use cloud models, you need an API key from ollama.com. Once you have that, update your environment variables:
export ANTHROPIC_BASE_URL=https://ollama.com
export ANTHROPIC_API_KEY=your-api-key-hereexport ANTHROPIC_BASE_URL=https://ollama.com
export ANTHROPIC_API_KEY=your-api-key-hereFor Powershell
$env:ANTHROPIC_BASE_URL="https://ollama.com"
$env:ANTHROPIC_API_KEY="your-actual-api-key-here"$env:ANTHROPIC_BASE_URL="https://ollama.com"
$env:ANTHROPIC_API_KEY="your-actual-api-key-here"
Replace your-actual-api-key-here with the API key you got from ollama.com.
Then verify they're set:
echo $env:ANTHROPIC_BASE_URL
echo $env:ANTHROPIC_API_KEYecho $env:ANTHROPIC_BASE_URL
echo $env:ANTHROPIC_API_KEY
Notice the base URL changes from localhost to ollama.com. The authentication token is replaced with your actual API key.
Now you can run Claude Code with cloud models:
claude --model glm-4.7:cloudclaude --model glm-4.7:cloudThe
:cloudsuffix indicates you're using a cloud-hosted model. These models are immediately available without pulling, and they run at their full context length.
When you launch Claude Code, it will detect that you are using a custom key:
You also need to connect your Ollama Account:
You can now test locally by signing in to your account
Use this to launch the GPT OSS model
ollama run gpt-oss:120b-cloudollama run gpt-oss:120b-cloud
This command will show you all the available models :
(Invoke-WebRequest -Uri "https://ollama.com/api/tags"
-Headers @{"Authorization"="Bearer
your-api-key-here "}).Content | ConvertFrom-Json | Select-Object -ExpandProperty models | Select-Object name(Invoke-WebRequest -Uri "https://ollama.com/api/tags"
-Headers @{"Authorization"="Bearer
your-api-key-here "}).Content | ConvertFrom-Json | Select-Object -ExpandProperty models | Select-Object name
Tool Calling Example
One feature that works well with Ollama is tool calling.
This lets Claude Code interact with external systems while solving problems.
Here's a simple example of how tool calling looks in code:
import anthropic
client = anthropic.Anthropic(
base_url='http://localhost:11434',
api_key='ollama',
)
message = client.messages.create(
model='qwen3-coder',
max_tokens=1024,
tools=[
{
'name': 'get_weather',
'description': 'Get the current weather in a location',
'input_schema': {
'type': 'object',
'properties': {
'location': {
'type': 'string',
'description': 'The city and state, e.g. San Francisco, CA'
}
},
'required': ['location']
}
}
],
messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)
for block in message.content:
if block.type == 'tool_use':
print(f'Tool: {block.name}')
print(f'Input: {block.input}')import anthropic
client = anthropic.Anthropic(
base_url='http://localhost:11434',
api_key='ollama',
)
message = client.messages.create(
model='qwen3-coder',
max_tokens=1024,
tools=[
{
'name': 'get_weather',
'description': 'Get the current weather in a location',
'input_schema': {
'type': 'object',
'properties': {
'location': {
'type': 'string',
'description': 'The city and state, e.g. San Francisco, CA'
}
},
'required': ['location']
}
}
],
messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)
for block in message.content:
if block.type == 'tool_use':
print(f'Tool: {block.name}')
print(f'Input: {block.input}')Final Thoughts
The Claude Code and Ollama integration gives developers more flexibility.
I'll keep testing this workflow and see how it performs at full scale. Ollama's team ships updates regularly, and the model list keeps growing.
The integration opens up workflows for teams with specific requirements around data privacy or cost management.
This also makes sense for users without hardware limitations.
I'd like to know your experience with this Claude Code Ollama workflow? Share your thoughts in the comments below.
Update: New Way to Launch Claude Code
There is a new way to launch Claude Code on Ollama, I have covered it in details in this new post:
I Tested (New) Ollama Launch For Claude Code, Codex, OpenCode (No More Configs) Forget configuration headaches, Ollama launch is the new easy way to launch Claude Code, Codex, OpenCode, Moltbot, or…
Claude Code Masterclass Course
Every day, I'm working hard to build the ultimate Claude Code course, which demonstrates how to create workflows that coordinate multiple agents for complex development tasks. It's due for release soon.
It will take what you have learned from this article to the next level of complete automation.
New features are added to Claude Code daily, and keeping up is tough.
The course explores Agents, Hooks, advanced workflows, and productivity techniques that many developers may not be aware of.
Once you join, you'll receive all the updates as new features are rolled out.
This course will cover:
- Advanced subagent patterns and workflows
- Production-ready hook configurations
- MCP server integrations for external tools
- Team collaboration strategies
- Enterprise deployment patterns
- Real-world case studies from my consulting work
If you're interested in getting notified when the Claude Code course launches, click here to join the early access list →
( Currently, I have 3000+ already signed-up developers)
I'll share exclusive previews, early access pricing, and bonus materials with people on the list.
Let's Connect!
If you are new to my content, my name is Joe Njenga
Join thousands of other software engineers, AI engineers, and solopreneurs who read my content daily on Medium and on YouTube where I review the latest AI engineering tools and trends. If you are more curious about my projects and want to receive detailed guides and tutorials, join thousands of other AI enthusiasts in my weekly AI Software engineer newsletter
If you would like to connect directly, you can reach out here:
AI Integration Software Engineer (10+ Years Experience ) Software Engineer specializing in AI integration and automation. Expert in building AI agents, MCP servers, RAG…
Follow me on Medium | YouTube Channel | X | LinkedIn