What is Ollama?
Ollama is a free, open-source tool that makes running AI language models on your own computer as simple as typing a single command. Install it, run `ollama run llama3.2`, and within minutes you're chatting with a capable AI model — with no API key, no monthly bill, and no data leaving your machine.
Ollama handles everything: downloading the model, managing multiple versions, serving a local API, and providing a simple chat interface in the terminal. For developers, it also exposes an OpenAI-compatible API at `localhost:11434`, so any tool built for OpenAI can be pointed at your local models instead.
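Alongside the OpenAI-compatible endpoint, Ollama also has its own native REST API. As a minimal sketch (assuming Ollama is running on the default port 11434, with `llama3.2` already pulled), here's how a request to its `/api/generate` endpoint is built — the send step is commented out so the snippet stands alone:

```python
import json
import urllib.request

# Build a request for Ollama's native /api/generate endpoint.
payload = {
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": False,  # ask for one JSON object instead of a token stream
}
body = json.dumps(payload).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment to send the request while Ollama is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```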
Key models available
Browse the full library at ollama.com/library. Popular starting points:
| Model | Best for | Size | RAM needed |
|---|---|---|---|
| llama3.2 | General chat, writing | 2GB | 8GB |
| mistral | Balanced, fast | 4GB | 8GB |
| deepseek-r1:7b | Reasoning, math, coding | 4GB | 8GB |
| gemma3 | Google's efficient model | 5GB | 8GB |
| phi3 | Small, very efficient | 2.2GB | 4GB |
| codellama | Code completion and explanation | 4GB | 8GB |
| llava | Multimodal (understands images) | 4.5GB | 8GB |
The 7B (7-billion parameter) size of most models runs fine on 8GB RAM. Go up to 13B for better quality if you have 16GB.
The magic moment
Open your terminal and type:

```shell
ollama run llama3.2
```
Watch it download (a few minutes), then type "Hello" at the prompt. You're talking to a capable AI on your own hardware, completely offline. No account, no key, no charge. For someone who's only ever used ChatGPT, that moment of "this is running on my laptop" is genuinely surprising.
Step-by-step setup
- Go to ollama.com and download the installer for your OS
- Install and launch — Ollama runs as a background service
- Open Terminal (Mac/Linux) or Command Prompt (Windows)
- Pull and run your first model: `ollama run llama3.2`
- Chat in the terminal, or press `Ctrl+D` to exit to the command prompt
- To pull a model without running it: `ollama pull mistral`
- List your downloaded models: `ollama list`
- For a proper chat UI, install Open WebUI — a free ChatGPT-style browser interface that points at your local Ollama instance
Total setup: about 15 minutes.
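The same inventory that `ollama list` prints is also available programmatically, via Ollama's native `/api/tags` endpoint. A small sketch, assuming Ollama is running on the default port (the `installed_models` helper name is my own, not part of Ollama):

```python
import json
import urllib.request

def installed_models(host: str = "http://localhost:11434") -> list[str]:
    """Return the names of locally downloaded models via GET /api/tags."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

# Uncomment while Ollama is running:
# print(installed_models())
```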
Using Ollama with Open WebUI
The terminal interface works but isn't very comfortable for long conversations. Open WebUI is the most popular solution — it's a free, self-hosted chat interface that connects to Ollama and looks like ChatGPT:
```shell
# Requires Docker
docker run -d -p 3000:80 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
```
Then open http://localhost:3000. You can switch between models, save conversations, and use image uploads if you have a multimodal model like `llava`.
Using Ollama via API
Ollama exposes an OpenAI-compatible API at http://localhost:11434. This means any code using the OpenAI client library can be redirected to your local models by changing the base URL:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain recursion simply"}],
)
print(response.choices[0].message.content)
```
This is useful for building apps or testing locally before switching to a paid API.
Ollama vs LM Studio
| | Ollama | LM Studio |
|---|---|---|
| Interface | Terminal + API | Desktop GUI |
| Ease of use | Terminal comfort needed | Beginner-friendly |
| Scripting / API | Excellent | Good (also has API server) |
| Model management | CLI commands | Visual browser |
| Best for | Developers, automation | Non-technical users |
Both tools run the same underlying models. Choose Ollama if you're comfortable in a terminal and want an API. Choose LM Studio if you want to click through everything without typing commands.