TL;DR

Perplexity Research published a June 1, 2026 proposal called Search as Code, arguing that AI agents should build retrieval pipelines in executable code rather than rely on fixed search endpoints. The company reports strong benchmark and token-efficiency gains, but those results are vendor-reported and the broader code-as-action idea predates Perplexity’s version.

Perplexity Research on June 1, 2026, published a proposal called Search as Code that would let AI agents write executable retrieval programs instead of sending repeated queries to a fixed search endpoint, a change the company says could cut token use and improve accuracy in long-running research tasks.

The confirmed development is Perplexity’s publication of the Search as Code, or SaC, approach. Perplexity describes SaC as a system in which a model acts as a control plane, writes Python code inside a secure sandbox, and uses an Agentic Search SDK made of smaller primitives for retrieval, ranking, filtering, fan-out and verification.

The company’s main claim is performance. In a CVE case study described in the source material, Perplexity said SaC identified and characterized more than 200 high-severity vulnerabilities with citations to vendor advisories and fix versions, reaching 100% accuracy while reducing token use from 288.7K tokens to 42.9K. Perplexity said rival systems in the same test scored below 25%.
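The two token counts are enough to check the roughly 85% reduction cited elsewhere in this article. A quick sketch in Python, using only the figures Perplexity reported; the variable names are illustrative:

```python
# Token counts as reported by Perplexity for the CVE case study.
baseline_tokens = 288_700
sac_tokens = 42_900

# Fractional reduction relative to the baseline run.
reduction = 1 - sac_tokens / baseline_tokens

# How many times more tokens the baseline used than the SaC run.
ratio = baseline_tokens / sac_tokens

print(f"reduction: {reduction:.1%}")
print(f"baseline / SaC: {ratio:.1f}x")
```

On those numbers the SaC run used about one-seventh of the baseline's tokens, which is consistent with the 85% figure.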

Perplexity also reported that SaC led on four of five broader benchmark tests, tied OpenAI on HLE, and outperformed the next-best system on its own WANDR benchmark by a factor of 2.5. Those figures are company-reported, and no independent verification is cited in the source material.

AI Dispatch · Infrastructure

Search as Code

Perplexity says agents shouldn’t call a search engine — they should program one, composing atomic primitives into a bespoke pipeline in a sandbox. The thesis is right. It’s also the search-shaped version of an idea the field has been converging on since 2024.

■ The old contract

Monolithic search

One fixed pipeline. The model tweaks query params and consumes whatever comes back — through the context window, every time.

model → query(params)

engine → fixed pipeline

return → full result set

repeat ×N serial round-trips

⚠ every intermediate result routed through model context

▲ Search as Code

Programmable primitives

The model writes code that orchestrates atomic search ops — fan-out, dedupe, verify — keeping bulk data out of the token stream.

sdk.search.web_many(queries)

filter()

dedupe()

sdk.llm.extract_many(schema)

→ verified records

✓ only the useful tokens reach the model
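The flow in the diagram can be sketched in a few lines of Python. This is a toy illustration, not Perplexity's SDK: the in-memory index, the placeholder CVE identifiers, the helper names and the regex standing in for the LLM extraction step are all assumptions made for the example. Only the general shape (fan-out, filter, dedupe, extract) comes from the source.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory index standing in for a real search backend.
# The CVE identifiers are placeholders, not real advisories.
FAKE_INDEX = {
    "examplelib advisory": [
        {"url": "https://example.com/a", "text": "CVE-0000-0001 fixed in 1.2.3"},
        {"url": "https://example.com/b", "text": "unrelated release notes"},
    ],
    "examplelib fix version": [
        {"url": "https://example.com/a", "text": "CVE-0000-0001 fixed in 1.2.3"},
        {"url": "https://example.com/c", "text": "CVE-0000-0002 fixed in 2.0.1"},
    ],
}

def web_many(queries):
    """Fan-out: run every query concurrently and flatten the hits."""
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda q: FAKE_INDEX.get(q, []), queries))
    return [hit for batch in batches for hit in batch]

def keep_relevant(hits, pattern):
    """Filter: drop hits whose text does not match the pattern."""
    return [h for h in hits if re.search(pattern, h["text"])]

def dedupe(hits):
    """Dedupe: keep the first hit seen for each URL."""
    seen, unique = set(), []
    for h in hits:
        if h["url"] not in seen:
            seen.add(h["url"])
            unique.append(h)
    return unique

RECORD = re.compile(r"(?P<cve>CVE-\d{4}-\d+) fixed in (?P<fix_version>[\d.]+)")

def extract_many(hits):
    """Stand-in for LLM extraction: pull structured fields with a regex."""
    records = []
    for h in hits:
        match = RECORD.search(h["text"])
        if match:
            records.append({**match.groupdict(), "source": h["url"]})
    return records

def run_pipeline(queries):
    hits = web_many(queries)
    hits = keep_relevant(hits, r"CVE-\d{4}-\d+")
    hits = dedupe(hits)
    return extract_many(hits)

# Only these compact records, not the raw hits, would reach the model.
records = run_pipeline(["examplelib advisory", "examplelib fix version"])
for record in records:
    print(record)
```

Four raw hits go in and two structured records come out, each carrying its source URL. The intermediate results never have to pass through a context window.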

100%: CVE case-study accuracy (SaC run)

−85%: token use vs. baseline (288.7K → 42.9K)

<25%: score for the rival systems tested

2.5×: SaC lead on Perplexity’s own WANDR bench

A convergent idea, not a cold start

“Let the model write code instead of emitting tool calls” has been building for two years. SaC is the search-specific instantiation.

2024: CodeAct (Wang et al., ICML)

2024–25: smolagents (Hugging Face)

2025: Code Mode (Cloudflare)

Nov 2025: Code exec + MCP (Anthropic)

Jun 2026: Search as Code (Perplexity)

The take

Directionally right, genuinely engineered — the rebuilt-from-atoms search stack is the part rivals can’t cheaply copy. But it’s a strong execution of an industry-wide idea, validated mostly on benchmarks Perplexity ran itself. The moat is the infrastructure and the tuning loops, not the architecture.

Sources: Perplexity Research, “Rethinking Search as Code Generation” (Jun 1 2026); CodeAct (Wang et al., ICML 2024); HF smolagents; Cloudflare Code Mode; Anthropic “Code execution with MCP” (Nov 2025). Figures as reported by Perplexity.

thorstenmeyerai.com

Search Infrastructure Becomes the Moat

The proposal matters because agentic search has different demands from human search. A person usually submits a query and scans results. An agent working for minutes or hours may need to issue many searches, refine thin areas, compare sources, remove duplicates and verify structured records without sending every intermediate result back through the model context window.
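One way to picture that loop is a short program that searches each subtopic, checks coverage, and re-queries only the thin areas before handing the model a compact summary. A minimal sketch follows; the corpus, the search() helper and the query suffixes are hypothetical stand-ins, not anything Perplexity has described:

```python
# Hypothetical corpus: query string -> source identifiers.
CORPUS = {
    "topic a": ["u1", "u2", "u3"],
    "topic b": ["u4"],
    "topic b overview": ["u4", "u5"],
    "topic b analysis": ["u6"],
}

def search(query):
    """Stand-in for a retrieval primitive; returns source identifiers."""
    return CORPUS.get(query, [])

def gather(topics, min_sources=3, variants=("overview", "analysis")):
    """Collect unique sources per topic, re-querying topics with thin coverage."""
    coverage = {}
    for topic in topics:
        sources = set(search(topic))
        for suffix in variants:
            if len(sources) >= min_sources:
                break  # enough evidence; stop spending queries on this topic
            sources |= set(search(f"{topic} {suffix}"))
        coverage[topic] = sorted(sources)
    return coverage

coverage = gather(["topic a", "topic b"])

# Only this compact summary would be passed back to the model.
summary = {topic: len(urls) for topic, urls in coverage.items()}
print(summary)
```

Here "topic a" is covered by its first query, while "topic b" needs two follow-up queries to reach the threshold. The refinement happens in code, and the model sees only the per-topic counts.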

If Perplexity’s approach holds up, the advantage would sit less in a single model prompt and more in the search stack beneath it. A system that exposes retrieval and ranking as programmable building blocks could give agents more control over evidence collection while reducing token costs. That would be relevant to research, security triage, due diligence, market intelligence and other tasks where source quality matters.

Code-First Agents Came Earlier

The contested point is novelty. The broader idea that language models should write executable code to coordinate tools has been building since at least 2024. Wang et al.’s CodeAct work, cited in the source material and available on arXiv at arxiv.org/abs/2402.01030, proposed executable Python as a unified action space for LLM agents and reported gains over more rigid action formats.
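The difference between a rigid action format and code as action can be shown with a toy example. This sketch is written in the spirit of that idea and is not CodeAct's implementation; the lookup() tool and both runner functions are hypothetical. Note that exec() with a trimmed namespace is not a security boundary, so a real system would need an isolated sandbox.

```python
import json

def lookup(term):
    """Hypothetical tool: return three fake hits for a term."""
    return [f"{term}-doc{i}" for i in range(3)]

TOOLS = {"lookup": lookup}

def run_json_action(action_json):
    """Rigid format: one tool and one argument set per model turn."""
    action = json.loads(action_json)
    return TOOLS[action["tool"]](**action["args"])

def run_code_action(source):
    """Code as action: the program composes tools with loops and logic.

    exec() with a trimmed namespace is NOT a secure sandbox; it is used
    here only to keep the illustration self-contained.
    """
    namespace = {"__builtins__": {"sorted": sorted}, **TOOLS}
    exec(source, namespace)
    return namespace.get("result")

# One structured call yields one raw result set.
one_call = run_json_action('{"tool": "lookup", "args": {"term": "alpha"}}')

# One program makes two calls, then filters and sorts in a single step.
program = """
hits = []
for term in ["alpha", "beta"]:
    hits += lookup(term)
result = sorted(h for h in hits if h.endswith("doc0"))
"""
composed = run_code_action(program)

print(one_call)
print(composed)
```

With the structured format, combining the two lookups would take at least two model turns plus a third to filter. The code action does all of it in one.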

Later projects and products, including Hugging Face smolagents, Cloudflare Code Mode and Anthropic’s code execution with MCP, also moved toward code-mediated agent workflows. Perplexity’s contribution appears more specific: applying that pattern to the internals of search and rebuilding retrieval as composable primitives rather than a monolithic endpoint.

Self-Run Benchmarks Limit Certainty

It is not yet clear whether Perplexity’s reported gains will hold under outside testing, on other search corpora, or with different evaluation rules. The WANDR benchmark is described as Perplexity’s own benchmark, which makes independent replication especially important before drawing broad conclusions.

It is also unclear how much of SaC will become a developer-facing product, what safety controls will govern sandboxed code execution, and whether competitors with large search or browsing systems can reproduce the same behavior with existing infrastructure.

Replication and Product Details

The next milestones are independent benchmark runs, clearer product packaging and evidence that SaC works outside Perplexity’s own stack. Developers will also watch whether Perplexity exposes the Agentic Search SDK broadly, how it prices high-volume agent retrieval, and how it handles sandbox security and auditability.

Key Questions

What did Perplexity announce?

Perplexity Research published a June 1, 2026 proposal called Search as Code, where an AI agent writes code to assemble search operations from smaller retrieval primitives.

Is Search as Code completely new?

The search-specific implementation appears to be Perplexity’s own work, but the broader idea of models using executable code to coordinate tools appeared earlier in research such as CodeAct and in later agent frameworks.

What results did Perplexity report?

Perplexity said SaC reached 100% accuracy in a CVE case study while cutting token use by 85%, and that it led on four of five broader benchmark tests. Those are Perplexity-reported figures.

Why does token use matter here?

Agentic research can generate large volumes of intermediate search results. Keeping irrelevant or duplicate material out of the model context window can lower cost, reduce latency and leave more room for useful evidence.

What has not been proven yet?

Outside replication has not been shown in the supplied material. It remains unclear whether the benchmark gains will carry over to unrelated tasks, open datasets or rival search systems.

Source: Thorsten Meyer AI

Search as Code: Perplexity Is Right About the Future — Just Not First to It

Author

Deep Intellica Team
