TL;DR

A recent Google whitepaper argues that in AI-driven software development the model accounts for only about 10% of system behavior. The remaining 90% or so comes from the harness (prompts, tools, rules, and context), so harness design and context engineering are where most of the control and quality come from.

The paper’s central claim is that harness design and context engineering, not the model, are the primary drivers of performance and reliability in AI-driven systems. That challenges the common focus on building larger or more advanced models and shifts attention toward system configuration and verification.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the biggest shift in software engineering is the move from writing code to expressing intent and trusting machines to interpret it. It reports that as of early 2026, 85% of professional developers use AI coding agents regularly, 51% use them daily, and about 41% of new code is AI-generated. The core message is that the model’s role is limited and that success depends largely on the harness: the prompts, tools, rules, and context around the model. Experiments cited in the paper show that changing the harness can improve agent performance more than changing the model. In one cited result, an agent moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 through harness changes alone, with the same model. The authors argue that most failures are configuration failures, such as a missing tool, a vague rule, or a noisy context, rather than model limitations.
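
The kind of configuration failure the authors describe (a missing tool, a vague rule) can often be caught before an agent ever runs. The following is a minimal sketch of such a preflight check. The `HarnessConfig` structure, the `tool:<name>` convention for referencing tools inside rules, and the list of vague words are all assumptions made for illustration; none of them come from the whitepaper or from a real framework.

```python
import re
from dataclasses import dataclass, field

# Hypothetical harness description; the field names are illustrative only.
@dataclass
class HarnessConfig:
    tools: set[str] = field(default_factory=set)
    rules: list[str] = field(default_factory=list)

# Words that tend to make a rule unenforceable (an assumed heuristic).
VAGUE_WORDS = {"appropriately", "properly", "as needed", "good", "clean"}

def lint_harness(cfg: HarnessConfig) -> list[str]:
    """Return human-readable problems found in the harness config."""
    problems = []
    for rule in cfg.rules:
        # Assumed convention: rules mention tools as "tool:<name>".
        for name in re.findall(r"tool:([a-z_]+)", rule):
            if name not in cfg.tools:
                problems.append(f"missing tool '{name}' referenced by rule: {rule}")
        lowered = rule.lower()
        if any(word in lowered for word in VAGUE_WORDS):
            problems.append(f"vague rule: {rule}")
    return problems

cfg = HarnessConfig(
    tools={"run_tests"},
    rules=[
        "Always call tool:run_tests before proposing a commit.",
        "Use tool:lint_code on changed files.",
        "Handle errors appropriately.",
    ],
)
problems = lint_harness(cfg)
for problem in problems:
    print(problem)
```

Here the second rule references a tool that was never registered and the third rule gives the agent nothing concrete to follow, so both are reported. A keyword heuristic like this is crude, but it shows how such failures can be detected without touching the model.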

At a glance
Report · When: published May 2026
The development: The new SDLC framework shifts the focus from model improvements to harness and context engineering, redefining best practices in AI software development.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.

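
The split between tests and evals can be sketched as a single CI gate. This is an illustration under stated assumptions: `slugify` stands in for ordinary deterministic code, `fake_agent_summary` is a stub in place of a real model call, and the 0.8 pass-rate threshold is arbitrary.

```python
def slugify(title: str) -> str:
    """Deterministic code: same input, same output, so a plain test suffices."""
    return "-".join(title.lower().split())

def fake_agent_summary(text: str) -> str:
    """Stub for a non-deterministic model call (an assumption for this sketch)."""
    return text.split(".")[0].strip() + "."

def test_slugify() -> bool:
    # Exact-match assertion: appropriate because the output is deterministic.
    return slugify("The Model Is Only 10%") == "the-model-is-only-10%"

def run_evals(cases, threshold: float = 0.8) -> bool:
    """Score free-form outputs against criteria and gate on a pass rate,
    because exact-match assertions do not fit non-deterministic output."""
    passed = 0
    for text, must_mention, max_words in cases:
        out = fake_agent_summary(text)
        ok = must_mention.lower() in out.lower() and len(out.split()) <= max_words
        passed += ok
    return passed / len(cases) >= threshold

EVAL_CASES = [
    ("The harness drives most agent behavior. The model is a small part.", "harness", 12),
    ("Evals verify non-deterministic output. Tests cover the rest.", "evals", 12),
    ("Context engineering improves first-pass success. It lowers token burn.", "context", 12),
]

def ci_gate() -> bool:
    # Both checks must pass before a change ships.
    return test_slugify() and run_evals(EVAL_CASES)

print("gate passed" if ci_gate() else "gate failed")
```

The design point is that the two checks answer different questions: the test asserts an exact value, while the eval asserts that enough outputs meet the criteria.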
The idea worth building your strategy around

Agent = Model + Harness

Model: ~10%
Harness: ~90% · prompts · tools · context · hooks · sandboxes · observability
The ~90% is your surface area, not the provider’s.

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.

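
The equation Agent = Model + Harness can be made concrete with a small sketch in which the same model behaves differently depending only on the harness around it. Everything here is hypothetical: the `Harness` and `Agent` classes, the `CALL <tool> <arg>` protocol, and `stub_model`, which replaces a real model so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # any text-in, text-out model

@dataclass
class Harness:
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    context: list[str] = field(default_factory=list)

@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        # The harness decides what the model sees: prompt, context, then the task.
        prompt = "\n".join([self.harness.system_prompt, *self.harness.context, task])
        reply = self.model(prompt)
        # Assumed protocol: the model asks for a tool as "CALL <tool> <arg>".
        if reply.startswith("CALL "):
            _, name, arg = reply.split(" ", 2)
            tool = self.harness.tools.get(name)
            if tool is None:
                return f"error: tool '{name}' is not configured"
            return tool(arg)
        return reply

def stub_model(prompt: str) -> str:
    """Stand-in for a real model; always requests the same tool on the task line."""
    return "CALL word_count " + prompt.splitlines()[-1]

bare = Agent(stub_model, Harness(system_prompt="You are a helpful agent."))
equipped = Agent(
    stub_model,
    Harness(
        system_prompt="You are a helpful agent.",
        tools={"word_count": lambda s: str(len(s.split()))},
    ),
)

task = "count the words in this sentence"
print(bare.run(task))      # fails: the tool is missing from the harness
print(equipped.run(task))  # succeeds: same model, different harness
```

The first agent fails and the second succeeds even though `stub_model` is identical in both, which is the configuration-failure pattern the quote describes.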
The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: low CapEx · high OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: high CapEx · low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing, with cheap models for the easy work.

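
The CapEx versus OpEx trade-off can be worked through with a simple linear cost model: cumulative cost equals upfront cost plus per-feature cost times the number of features. The figures below are invented cost units chosen only to show the shape of the crossover; they are not taken from the paper. The assumed 5× gap in per-feature cost sits inside the 3–10× range quoted above.

```python
import math

def cumulative_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Total spend after shipping a given number of features."""
    return capex + opex_per_feature * features

def break_even_features(capex_a: float, opex_a: float,
                        capex_b: float, opex_b: float) -> int:
    """Smallest feature count at which approach B costs no more than A.
    Assumes B has the higher upfront cost and the lower per-feature cost."""
    if opex_b >= opex_a:
        raise ValueError("B never catches up unless its per-feature cost is lower")
    return max(0, math.ceil((capex_b - capex_a) / (opex_a - opex_b)))

# Hypothetical cost units, for illustration only.
VIBE = {"capex": 0.0, "opex": 50.0}       # no setup, expensive fix-it loops
AGENTIC = {"capex": 400.0, "opex": 10.0}  # specs, evals, and context paid upfront

n = break_even_features(VIBE["capex"], VIBE["opex"],
                        AGENTIC["capex"], AGENTIC["opex"])
print(f"break-even at {n} features")

for features in (5, n, 30):
    vibe = cumulative_cost(VIBE["capex"], VIBE["opex"], features)
    agentic = cumulative_cost(AGENTIC["capex"], AGENTIC["opex"], features)
    print(f"{features:>3} features: vibe={vibe:.0f} agentic={agentic:.0f}")
```

With these assumed numbers the two approaches cost the same at 10 features. Below that, the upfront investment has not paid off; above it, the gap widens with every feature shipped.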
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR); verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Implications for AI Development Strategies

This changes where organizations should invest. Rather than chasing the latest model, the paper encourages teams to design effective harnesses and manage context well. It argues that this can improve performance, lower costs, and make systems more secure, and it cites experiments in which configuration changes outperformed model upgrades. If most of the control lies outside the model, the skills AI engineering requires change too, and so do development priorities.
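
One cost lever the paper names is intelligent model routing: sending easy work to cheap models and reserving stronger ones for hard tasks. A minimal sketch of a rule-based router follows. The tier names, prices, keyword list, and token threshold are all hypothetical and would need tuning against real workloads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # invented price, for illustration only

CHEAP = ModelTier("small-model", 0.1)
STRONG = ModelTier("large-model", 2.0)

# Assumed signals that a task needs the stronger model.
HARD_HINTS = ("refactor", "architecture", "migrate", "security", "concurrency")

def route(task: str, context_tokens: int, max_cheap_tokens: int = 4000) -> ModelTier:
    """Pick a model tier from simple, deterministic rules."""
    if context_tokens > max_cheap_tokens:
        return STRONG  # large contexts go to the stronger model
    if any(hint in task.lower() for hint in HARD_HINTS):
        return STRONG  # the task description signals difficulty
    return CHEAP       # default: cheap model for the easy work

for task, tokens in [
    ("Rename a variable in utils.py", 800),
    ("Refactor the auth module for a security review", 800),
    ("Fix a typo in the README", 9000),
]:
    print(f"{route(task, tokens).name}: {task}")
```

Because the rules are deterministic, the routing decision itself costs no tokens. A production router would more likely use measured task difficulty or a fallback on failure than a keyword list.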

Background on AI System Design and Evolving Practices

Historically, AI development focused on improving models—making them larger, more accurate, or more capable. However, recent trends, including the rise of AI coding agents, have shifted attention toward system configuration, verification, and context management. The whitepaper builds on ongoing debates about the real drivers of AI system performance, emphasizing that the model is only a small part of the overall system. Experiments from industry leaders demonstrate that adjustments to prompts, tools, and rules can yield performance gains comparable to or exceeding those from model improvements. This perspective aligns with the broader evolution toward ‘agentic engineering,’ where the system’s architecture and operational scaffolding are prioritized over raw model power.

“The model accounts for only about 10% of what determines behavior; the harness is 90%.”

— Addy Osmani

Unresolved Questions About Implementation and Impact

The whitepaper presents compelling evidence that harness design outweighs model quality, but it is unclear how this will shape long-term AI development strategies across industries. Best practices for scaling context engineering and harness management are still evolving, and it remains to be seen how far smaller organizations can adopt them. The effect on model research priorities and on the pace of model innovation is also not yet understood.

Next Steps for AI System Optimization and Research

Organizations are likely to shift their focus toward developing robust harnesses, improving context management, and establishing verification protocols. Future research may explore standardized frameworks for harness design and best practices for context engineering. Additionally, industry leaders and developers will test these concepts at scale, refining methodologies to maximize system performance while controlling costs and vulnerabilities. Monitoring how these practices influence AI development and deployment will be critical in the coming months.

Key Questions

Why is the focus shifting from models to harness and context?

The whitepaper emphasizes that system behavior is mostly determined by configuration and context, not the model itself, making harness design and context management the key to effective AI systems.

How can organizations improve AI performance according to the new framework?

By investing in harness development, context engineering, and verification processes, organizations can significantly enhance AI reliability and efficiency without always needing the latest models.

Does this mean model development is no longer important?

Model development remains valuable, but the whitepaper suggests it is only a small part of the overall system. The main gains come from system configuration and operational scaffolding.

What are the risks of focusing too much on harness design?

Overemphasizing harness and context could lead to complex, hard-to-maintain systems if not managed properly, and may divert attention from ongoing model improvements that still matter.

Will this approach reduce AI development costs?

Potentially, yes. The whitepaper argues that a disciplined, configuration-focused approach can lower long-term costs by reducing token waste, improving security, and decreasing maintenance burdens.

Source: ThorstenMeyerAI.com
