What's New in AI
Major releases and launches. Explained simply, without the hype.

Llama 4 Scout & Maverick. Meta's multimodal MoE models
Meta released two new open-weight models built on a Mixture-of-Experts architecture with native multimodal support — text and images in a single model. Scout has a 10 million token context window (the largest of any open model) and runs on a single server GPU. Maverick handles 1 million tokens and benchmarks alongside GPT-4o and Gemini Flash. Both are free to download.
Why it matters
The 10M context window in Scout is a genuine breakthrough for processing long documents, entire codebases, or extended conversations. Being free and open-weight makes it accessible to anyone.

Gemma 4. Google's most capable open model yet
Google released Gemma 4 as a family of four open-weight models (2B, 4B, 26B MoE, 31B Dense) under an Apache 2.0 licence — free for personal and commercial use. The 31B model ranks near the top of the open-model leaderboard, outperforming models many times its size on maths, science, and coding benchmarks. The smaller models are designed to run on phones and low-cost hardware.
Why it matters
Gemma 4 is the most powerful free, runnable model at its size. The 31B punches well above its weight, and the tiny models bring capable AI to devices without internet access.

Claude Opus 4.6 & Sonnet 4.6
Anthropic's latest generation raises the bar on reasoning, coding, and long-document work. Opus 4.6 sits at the top of most benchmarks for complex tasks. Sonnet 4.6 is the everyday workhorse: faster and more affordable while remaining highly capable.
Why it matters
The best all-round models for most professional workflows right now.

Claude Opus 4.5. Built for long autonomous work
Anthropic released Opus 4.5 with a focus on sustained, autonomous task completion: it can work independently for extended sessions across coding, research, and document analysis without losing context. It added native integrations with Chrome and Excel, letting it operate directly inside a browser or spreadsheet. Benchmark scores placed it at the top of the coding leaderboard with 80.9% on SWE-bench Verified.
Why it matters
Knowledge workers could now hand Claude a complex spreadsheet or live web session and have it work through the whole thing — not just answer questions about it.

Claude Sonnet 4.5. Agentic mid-tier model
Sonnet 4.5 was Anthropic's upgrade targeted at long-horizon agentic tasks — coding, workplace automation, and multi-step computer use. It can maintain reliable operation across 30+ step task chains and self-correct far more consistently than previous versions. Anthropic positioned it as the go-to model for developers building autonomous assistants.
Why it matters
Users building AI tools for real business workflows got a model that could finish multi-step jobs without getting stuck and asking for help at every turn.

Sora 2. AI video with audio and physics
OpenAI released Sora 2 with a standalone iOS app, adding synchronised audio — dialogue, sound effects, and ambient noise — for the first time. Physics consistency and motion quality improved dramatically over the original. A new 'Characters' feature allows a scanned real person to appear in any generated scene with accurate appearance and voice.
Why it matters
Sora 2 made short AI films with sound and realistic motion accessible from a phone, moving AI video from a novelty into a practical creative tool.

GPT-5. OpenAI's unified reasoning model
GPT-5 was OpenAI's first model trained natively on multiple modalities from the ground up, rather than bolting capabilities together. It integrated deep step-by-step reasoning directly into the main ChatGPT interface, automatically routing easy questions for speed and hard ones for deeper thinking. OpenAI made GPT-5 free for all ChatGPT users, with higher usage on paid plans.
Why it matters
For the first time, free ChatGPT users got genuine multi-step reasoning without having to switch between 'standard' and 'thinking' modes — it just works.

Claude Opus 4 & Sonnet 4. Top coding benchmark
Anthropic released two new flagship models simultaneously. Claude Opus 4 debuted as the top-ranked coding model on SWE-bench at 72.5%, capable of sustained multi-hour autonomous coding sessions. Both models introduced extended thinking with tool use — Claude can now reason step-by-step while simultaneously browsing the web or running code.
Why it matters
Claude became capable of tackling full software projects with far fewer errors — behaving more like a tireless expert collaborator than a Q&A tool.

Veo 3. AI video with native synchronised audio
Announced at Google I/O 2025, Veo 3 became the first AI video model to generate synchronised audio alongside video natively — including character dialogue, sound effects, and ambient noise from a single text prompt. No separate audio step required. DeepMind described it as the end of the 'silent film era' for AI-generated video.
Why it matters
For creators, the jump from silent AI video to fully voiced, sound-designed clips from one prompt removes the biggest friction in AI video production.

Gemini 2.5 Flash. Fast reasoning at low cost
Also unveiled at Google I/O, Gemini 2.5 Flash is a compact, fast 'thinking' model that reasons step-by-step before responding — the same approach as Pro, at a fraction of the cost. It targets applications where speed and affordability matter more than maximum capability: coding assistants, chatbots, and high-volume API use.
Why it matters
Developers building AI-powered products got a capable reasoning model they could afford to run at scale, not just in demos.

OpenAI Codex. Autonomous software engineering agent
OpenAI launched a new version of Codex as a fully autonomous cloud-based coding agent — distinct from the older code-completion product of the same name. Powered by a version of o3 optimised for software tasks, it runs in an isolated sandbox, writes full features, fixes bugs, runs tests, and opens pull requests without a developer watching. Available initially to paid ChatGPT subscribers.
Why it matters
Non-developers could describe a change in plain English and have an AI handle the full technical implementation end-to-end — not just suggest snippets.

Llama 4. Meta's multimodal open-source model
Meta released Llama 4 with native multimodal capabilities: it handles text, images, and documents. It is competitive with GPT-4o on benchmarks and free to download and run locally, making frontier-level AI accessible without API costs.
Why it matters
The most capable open-source model to date. Runnable on decent hardware via Ollama.

GPT-4.1. OpenAI's coding-focused release
OpenAI released GPT-4.1 with a focus on code generation, instruction-following, and long-context tasks. It outperforms GPT-4o on coding benchmarks and handles 1 million tokens of context.
Why it matters
Significant upgrade for developers using ChatGPT or the OpenAI API for code.

Gemini 2.5 Pro. Google's reasoning model
Gemini 2.5 Pro topped multiple benchmarks on release with a 1-million-token context window and a native reasoning mode. It is particularly strong at maths, science, and multi-step analysis.
Why it matters
Competitive with o3 and Claude 3.7 on hard reasoning tasks. Available in Gemini Advanced.

Grok 3. xAI's most powerful model yet
Elon Musk's xAI released Grok 3, trained on a 200,000-GPU cluster. It is strong at reasoning and STEM, includes a 'Think' mode for step-by-step reasoning, and is integrated into X Premium+.
Why it matters
First Grok model to seriously compete with GPT-4o and Claude on general tasks.

GPT-4.5. OpenAI's emotionally intelligent model
OpenAI described GPT-4.5 as its most 'human' model: better at nuanced conversation, emotional intelligence, and understanding context beyond the literal words. It is less focused on raw reasoning than o1.
Why it matters
Best ChatGPT model for writing, coaching, and conversational tasks.

Claude 3.7 Sonnet. Extended Thinking
Claude 3.7 introduced 'Extended Thinking', a reasoning mode where Claude works through problems step by step before responding, similar to OpenAI's o1. It dramatically improves performance on maths, coding, and complex analysis.
Why it matters
First Claude model with visible chain-of-thought reasoning. Major upgrade for hard problems.

DeepSeek R1. Open-source reasoning model
Chinese lab DeepSeek released R1, an open-source reasoning model that matched OpenAI's o1 on benchmarks at a fraction of the training cost. The weights are free to download. The release shocked the AI industry and briefly sent Nvidia's share price sharply lower.
Why it matters
Proved frontier reasoning is achievable open-source. Free to run via Ollama.

DeepSeek V3. Challenges GPT-4o
DeepSeek released V3, a 671-billion-parameter Mixture-of-Experts model trained for roughly $6 million, compared with hundreds of millions for comparable Western models. It matched GPT-4o on most benchmarks.
Why it matters
The moment the AI world realised frontier models don't require frontier budgets.

Sora. OpenAI's video model goes public
After a year of limited previews, OpenAI launched Sora publicly for ChatGPT Plus and Pro subscribers. It generates cinematic video clips of up to 20 seconds from text or image prompts and includes a Storyboard mode for longer narratives.
Why it matters
First time frontier-quality AI video became available to mainstream users.

o3 & o3-mini. OpenAI's best reasoning models
OpenAI's o3 set new records on the ARC-AGI benchmark, a test designed to be difficult for AI. o3-mini offers a faster, cheaper version of the same reasoning approach. Both use extended internal reasoning before responding.
Why it matters
The most capable reasoning models available, especially for maths and science.

Gemini 2.0 Flash. Fast and multimodal
Gemini 2.0 Flash launched as Google's fast, affordable frontier model with native multimodal output. It can generate text, images, and audio, and it brings significant performance improvements over 1.5 Flash.
Why it matters
Best option for Gemini-powered apps that need speed and multimodal capability.

Claude 3.5 Haiku. Fast and affordable
Anthropic released Claude 3.5 Haiku, its fastest, most affordable model. It outperforms the previous Claude 3 Opus on most tasks at a fraction of the cost and is designed for high-volume applications where speed matters.
Why it matters
Best Claude model for tasks that need rapid iteration: coding autocomplete, chat, search.

o1 Pro. OpenAI's most powerful reasoning model
OpenAI expanded the o1 family with o1 Pro, which uses more compute at inference to tackle harder problems. It is available to ChatGPT Pro subscribers and is particularly strong at frontier-level maths, science, and complex coding challenges.
Why it matters
The first model to pass professional-level qualifying exams across multiple domains.