What is Ollama?
Ollama is a free, open-source tool that makes running AI language models on your own computer as simple as typing a single command. Install it, run `ollama run llama3.2`, and within minutes you're chatting with a capable AI model — with no API key, no monthly bill, and no data leaving your machine.
Ollama handles everything: downloading the model, managing multiple versions, serving a local API, and providing a simple chat interface in the terminal. For developers, it also exposes an OpenAI-compatible API at `localhost:11434`, so any tool built for OpenAI can be pointed at your local models instead.
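Alongside the OpenAI-compatible endpoint, Ollama also has its own native REST API. As a minimal sketch (assuming Ollama is running on the default port 11434, with `llama3.2` already pulled), here's how a request to its `/api/generate` endpoint is built — the send step is commented out so the snippet stands alone:

```python
import json
import urllib.request

# Build a request for Ollama's native /api/generate endpoint.
payload = {
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": False,  # ask for one JSON object instead of a token stream
}
body = json.dumps(payload).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment to send the request while Ollama is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```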
Key models available
Browse the full library at ollama.com/library. Popular starting points:
| Model | Best for | Size | RAM needed |
|---|---|---|---|
| llama3.2 | General chat, writing | 2GB | 8GB |
| mistral | Balanced, fast | 4GB | 8GB |
| deepseek-r1:7b | Reasoning, math, coding | 4GB | 8GB |
| gemma3 | Google's efficient model | 5GB | 8GB |
| phi3 | Small, very efficient | 2.2GB | 4GB |
| codellama | Code completion and explanation | 4GB | 8GB |
| llava | Multimodal (understands images) | 4.5GB | 8GB |
The 7B (7-billion parameter) size of most models runs fine on 8GB RAM. Go up to 13B for better quality if you have 16GB.
The magic moment
Open your terminal and type:

```shell
ollama run llama3.2
```
Watch it download (a few minutes), then type "Hello" at the prompt. You're talking to a capable AI on your own hardware, completely offline. No account, no key, no charge. For someone who's only ever used ChatGPT, that moment of "this is running on my laptop" is genuinely surprising.
Step-by-step setup
- Go to ollama.com and download the installer for your OS
- Install and launch — Ollama runs as a background service
- Open Terminal (Mac/Linux) or Command Prompt (Windows)
- Pull and run your first model: `ollama run llama3.2`
- Chat in the terminal, or press `Ctrl+D` to exit to the command prompt
- To pull a model without running it: `ollama pull mistral`
- List your downloaded models: `ollama list`
- For a proper chat UI, install Open WebUI — a free ChatGPT-style browser interface that points at your local Ollama instance
Total setup: about 15 minutes.
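The same inventory that `ollama list` prints is also available programmatically, via Ollama's native `/api/tags` endpoint. A small sketch, assuming Ollama is running on the default port (the `installed_models` helper name is my own, not part of Ollama):

```python
import json
import urllib.request

def installed_models(host: str = "http://localhost:11434") -> list[str]:
    """Return the names of locally downloaded models via GET /api/tags."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

# Uncomment while Ollama is running:
# print(installed_models())
```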
Using Ollama with Open WebUI
The terminal interface works but isn't very comfortable for long conversations. Open WebUI is the most popular solution — it's a free, self-hosted chat interface that connects to Ollama and looks like ChatGPT:
```shell
# Requires Docker
docker run -d -p 3000:80 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
```
Then open http://localhost:3000. You can switch between models, save conversations, and use image uploads if you have a multimodal model like `llava`.
Using Ollama via API
Ollama exposes an OpenAI-compatible API at http://localhost:11434. This means any code using the OpenAI client library can be redirected to your local models by changing the base URL:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client but ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain recursion simply"}],
)
print(response.choices[0].message.content)
```
This is useful for building apps or testing locally before switching to a paid API.
Ollama vs LM Studio
| | Ollama | LM Studio |
|---|---|---|
| Interface | Terminal + API | Desktop GUI |
| Ease of use | Terminal comfort needed | Beginner-friendly |
| Scripting / API | Excellent | Good (also has API server) |
| Model management | CLI commands | Visual browser |
| Best for | Developers, automation | Non-technical users |
Both tools run the same underlying models. Choose Ollama if you're comfortable in a terminal and want an API. Choose LM Studio if you want to click through everything without typing commands.