SimpleAI

Ollama

Free tier · Updated 2026-04

Run powerful AI models on your own computer — free, private, offline.

🟡 Intermediate · 15 minutes to set up

What is Ollama?

Ollama is a free, open-source tool that makes running AI language models on your own computer as simple as typing a single command. Install it, run ollama run llama3.2, and within minutes you're chatting with a capable AI model — with no API key, no monthly bill, and no data leaving your machine.

Ollama handles everything: downloading the model, managing multiple versions, serving a local API, and providing a simple chat interface in the terminal. For developers, it also exposes an OpenAI-compatible API at localhost:11434 so any tool built for OpenAI can be pointed at your local models instead.

Key models available

Browse the full library at ollama.com/library. Popular starting points:

Model           | Best for                         | Size  | RAM needed
llama3.2        | General chat, writing            | 2GB   | 8GB
mistral         | Balanced, fast                   | 4GB   | 8GB
deepseek-r1:7b  | Reasoning, math, coding          | 4GB   | 8GB
gemma3          | Google's efficient model         | 5GB   | 8GB
phi3            | Small, very efficient            | 2.2GB | 4GB
codellama       | Code completion and explanation  | 4GB   | 8GB
llava           | Multimodal (understands images)  | 4.5GB | 8GB

The 7B (7-billion-parameter) versions of most models run fine on 8GB of RAM. Step up to a 13B variant for better quality if you have 16GB.
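As a quick sanity check before downloading, the table above can be encoded as a lookup. A minimal sketch — the sizes and RAM figures are copied from the table, and the helper name is made up for illustration:

```python
# RAM requirements from the model table above (all figures in GB).
MODELS = {
    "llama3.2":       {"size_gb": 2.0, "ram_gb": 8},
    "mistral":        {"size_gb": 4.0, "ram_gb": 8},
    "deepseek-r1:7b": {"size_gb": 4.0, "ram_gb": 8},
    "gemma3":         {"size_gb": 5.0, "ram_gb": 8},
    "phi3":           {"size_gb": 2.2, "ram_gb": 4},
    "codellama":      {"size_gb": 4.0, "ram_gb": 8},
    "llava":          {"size_gb": 4.5, "ram_gb": 8},
}

def models_that_fit(ram_gb: float) -> list[str]:
    """Return model names whose listed RAM requirement fits in ram_gb."""
    return [name for name, info in MODELS.items() if info["ram_gb"] <= ram_gb]

print(models_that_fit(4))  # on a 4GB machine, only phi3 makes the cut
```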

The magic moment

Open your terminal and type:

ollama run llama3.2

Watch it download (a few minutes), then type "Hello" at the prompt. You're talking to a capable AI on your own hardware, completely offline. No account, no key, no charge. For someone who's only ever used ChatGPT, that moment of "this is running on my laptop" is genuinely surprising.

Step-by-step setup

  1. Go to ollama.com and download the installer for your OS
  2. Install and launch — Ollama runs as a background service
  3. Open Terminal (Mac/Linux) or Command Prompt (Windows)
  4. Pull and run your first model:
    ollama run llama3.2
    
  5. Chat in the terminal, or press Ctrl+D to exit to the command prompt
  6. To pull a model without running it: ollama pull mistral
  7. List your downloaded models: ollama list
  8. For a proper chat UI, install Open WebUI — a free ChatGPT-style browser interface that points at your local Ollama instance

Total setup: about 15 minutes.
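Since Ollama runs as a background service, it can be useful to check programmatically that it's actually listening before pointing other tools at it. A small sketch using only the standard library — the helper name is made up, and the function simply returns False when nothing answers on Ollama's default port:

```python
import urllib.request
import urllib.error

def ollama_is_running(base_url: str = "http://localhost:11434") -> bool:
    """Return True if a local Ollama server answers on its default port.

    Any successful HTTP response from the root endpoint means the
    background service is up; connection errors mean it isn't.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_is_running())  # True once the service is installed and running
```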

Using Ollama with Open WebUI

The terminal interface works but isn't very comfortable for long conversations. Open WebUI is the most popular solution — it's a free, self-hosted chat interface that connects to Ollama and looks like ChatGPT:

# Requires Docker
docker run -d -p 3000:80 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000. You can switch between models, save conversations, and use image uploads if you have a multimodal model like llava.

Using Ollama via API

Ollama exposes an OpenAI-compatible API at http://localhost:11434. This means any code using the OpenAI client library can be redirected to your local models by changing the base URL:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # required but ignored
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain recursion simply"}]
)
print(response.choices[0].message.content)

This is useful for building apps or testing locally before switching to a paid API.
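Besides the OpenAI-compatible route, Ollama also serves its own native REST API on the same port (e.g. POST /api/generate). A sketch using only the standard library, assuming the server is running locally — setting "stream" to False requests a single JSON reply instead of streamed chunks:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's native /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send one generation request to a locally running Ollama server."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming reply carries the full text in "response".
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(generate("llama3.2", "Explain recursion simply"))
```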

Ollama vs LM Studio

                 | Ollama                   | LM Studio
Interface        | Terminal + API           | Desktop GUI
Ease of use      | Terminal comfort needed  | Beginner-friendly
Scripting / API  | Excellent                | Good (also has API server)
Model management | CLI commands             | Visual browser
Best for         | Developers, automation   | Non-technical users

Both tools run the same underlying models. Choose Ollama if you're comfortable in a terminal and want an API. Choose LM Studio if you want to click through everything without typing commands.