The last six months in LLMs in five minutes

TL;DR

Over the past six months, large language models have progressed rapidly, particularly in coding ability, while the title of top model has changed hands repeatedly. Notable events include the rise and fall of leading models, major improvements in AI coding agents, and the emergence of new projects like OpenClaw.

During this period of rapid innovation and intense competition, the lead in overall performance and coding capability passed between several large language models (LLMs). That makes it an important reference point for understanding current AI capabilities and future trends.

In November 2025, the ‘best’ model was widely regarded as Claude Sonnet 4.5, released in September. It was quickly overtaken in turn by GPT-5.1, Gemini 3, GPT-5.1 Codex Max, and then Claude Opus 4.5, with Gemini 3 often producing the most impressive outputs, such as detailed pelican drawings. Over the same period, improvements in reinforcement learning techniques turned coding agents from basic tools into reliable, daily-use assistants, significantly reducing errors and increasing their practical utility.

Simultaneously, a new project called Warelay, later renamed OpenClaw, emerged in late November and had attracted rapid attention by February as a ‘personal AI assistant’ built on the Claw framework. Despite being less than three months old, the project drew widespread interest and even prompted hardware purchases, as users bought Mac Minis to run their Claws.

In parallel, model updates such as Gemini 3.1 Pro and Google’s Gemma 4 series showed notable improvements in AI-generated imagery and code, including highly detailed and animated pelican images. Chinese AI lab Zhipu AI (Z.ai) released GLM-5.1, a large open-weight model capable of complex tasks, including animated pelican scenes, though with some distortions.

Why It Matters

This period marks a pivotal shift in AI development: coding agents have become reliable enough for regular use, and each new model has tended to outperform its predecessors on benchmarks. The rapid succession of leading models reflects a highly competitive environment that is pushing the boundaries of what AI can achieve in both creative and practical applications. These advances are shaping AI deployment strategies, developer tools, and potentially the broader AI industry.

Background

The last six months began with what has been called the ‘November 2025 inflection point,’ a period when the top models changed hands multiple times, signaling a fast-evolving competitive landscape. Before this, progress was steady but less dramatic. A focus on reinforcement learning from verifiable rewards significantly improved coding assistance, turning AI from an experimental tool into a reliable productivity aid. The emergence of projects like OpenClaw exemplifies the trend toward accessible, specialized AI assistants that are rapidly gaining popularity, with users even buying dedicated hardware to run them.

“The last six months have seen a real acceleration in model performance and new project emergence, especially in coding agents.”

— Hacker News user

“The reinforcement learning improvements have made AI coding agents reliable enough for daily work, a significant leap forward.”

— AI researcher

“Mac Minis are now the new digital pets, running AI Claws that are both fascinating and commercially popular.”

— Drew Breunig

What Remains Unclear

While model rankings and capabilities are well-documented, the long-term stability of these performance shifts remains uncertain. The full impact of projects like OpenClaw on AI deployment and market dynamics is still developing. Additionally, the future trajectory of model improvements and whether current trends will accelerate or plateau are not yet clear.

What’s Next

Expect ongoing model updates and new AI projects to continue emerging, with further refinement of coding agents and possibly new benchmarks. Industry players will likely focus on scaling models further, integrating them into more practical applications, and addressing current limitations such as biases and robustness. Monitoring how these developments influence AI adoption in commercial and consumer sectors will be key.

Key Questions

What caused the rapid shifts in model performance over the past six months?

The combination of advances in reinforcement learning, increased compute power, and competitive model development drove the rapid performance changes among leading AI models.

How reliable are current AI coding agents for real-world tasks?

Recent improvements have made coding agents significantly more reliable, capable of handling daily tasks with minimal errors, though occasional mistakes still occur.

What is OpenClaw and why has it gained attention?

OpenClaw is a project developing personal AI assistants based on the Claw framework. It has gained attention due to its rapid development, accessibility, and popularity among users running it on consumer hardware like Mac Minis.

Are we likely to see more model dominance shifts soon?

Given the current competitive environment and rapid innovation pace, further shifts in model performance and rankings are expected in the near future.

Author

Deep Intellica Team

The U.S. Army Turns to Algorithms to Decide Who Moves up the Ranks.

How Claude Code works in large codebases

Nuclear startup Deep Fission says it’s going public, again, and I have questions

The Compute Reckoning: Anthropic Finally Admits What Customers Suspected for Ten Months

Debunking Myths About AI: 10 Facts You Should Know

Protecting Student Privacy With FERPA-Ready Records In Schools

Transforming Clients’ Dining Rooms With AI: My Experience And Outcomes

LM Studio Bionic: The AI Agent For Open Models