TL;DR
Perplexity Research published a June 1, 2026 proposal called Search as Code, arguing that AI agents should build retrieval pipelines in executable code rather than rely on fixed search endpoints. The company reports strong benchmark and token-efficiency gains, but those results are vendor-reported and the broader code-as-action idea predates Perplexity’s version.
Perplexity Research on June 1, 2026, published a proposal called Search as Code that would let AI agents write executable retrieval programs instead of sending repeated queries to a fixed search endpoint, a change the company says could cut token use and improve accuracy in long-running research tasks.
The confirmed development is Perplexity’s publication of the Search as Code, or SaC, approach. Perplexity describes SaC as a system in which a model acts as a control plane, writes Python code inside a secure sandbox, and uses an Agentic Search SDK made of smaller primitives for retrieval, ranking, filtering, fan-out and verification.
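To make that description concrete, here is a minimal Python sketch of the kind of retrieval program a model might write by composing small primitives. Every name in it (retrieve, fan_out, filter_docs, rank, verify, the Doc type and the in-memory corpus) is a hypothetical stand-in. The source material does not document the actual interface of Perplexity’s Agentic Search SDK, and the CVE identifiers and URLs below are made-up placeholders.

```python
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical stand-ins for SDK primitives. The real Agentic Search SDK
# interface is not described in the source material.

@dataclass(frozen=True)
class Doc:
    url: str
    text: str

# Made-up corpus with placeholder CVE identifiers and example domains.
CORPUS = [
    Doc("https://vendor.example/advisory-1", "CVE-0000-0001 high severity fixed in 2.4.1"),
    Doc("https://blog.example/post", "CVE-0000-0001 rumor, no patch listed"),
    Doc("https://vendor.example/advisory-2", "CVE-0000-0002 high severity fixed in 7.0.3"),
    Doc("https://news.example/item", "unrelated product launch"),
]

def retrieve(query: str) -> list[Doc]:
    """Retrieval primitive: documents that contain every query term."""
    terms = query.lower().split()
    return [d for d in CORPUS if all(t in d.text.lower() for t in terms)]

def fan_out(queries: list[str]) -> list[Doc]:
    """Fan-out primitive: run several queries, merge, and dedupe by URL."""
    seen: dict[str, Doc] = {}
    for q in queries:
        for d in retrieve(q):
            seen.setdefault(d.url, d)
    return list(seen.values())

def filter_docs(docs: list[Doc], keep: Callable[[Doc], bool]) -> list[Doc]:
    """Filtering primitive: keep documents that satisfy a predicate."""
    return [d for d in docs if keep(d)]

def rank(docs: list[Doc], score: Callable[[Doc], float]) -> list[Doc]:
    """Ranking primitive: highest score first."""
    return sorted(docs, key=score, reverse=True)

def verify(doc: Doc) -> bool:
    """Verification primitive: vendor-hosted page that names a fix version."""
    return doc.url.startswith("https://vendor.example/") and "fixed in" in doc.text

def pipeline(queries: list[str]) -> list[str]:
    """The 'program' a model might write: compose primitives, return only URLs."""
    docs = fan_out(queries)
    docs = filter_docs(docs, verify)
    docs = rank(docs, score=lambda d: d.text.count("high severity"))
    return [d.url for d in docs]

result = pipeline(["CVE-0000-0001", "CVE-0000-0002"])
print(result)
```

The point of the sketch is the shape: fan-out, filtering and ranking all happen inside the program, and only the short final list would be handed back to the model.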
The company’s main claim is performance. In a CVE case study described in the source material, Perplexity said SaC identified and characterized more than 200 high-severity vulnerabilities with citations to vendor advisories and fix versions, reaching 100% accuracy while reducing token use from 288.7K to 42.9K, a cut of about 85%. Perplexity said rival systems in the same test scored below 25%.
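The token figures in the case study can be checked with simple arithmetic. The short calculation below uses only the two numbers Perplexity reported; the variable names are illustrative.

```python
# Token counts as reported by Perplexity for the CVE case study.
baseline_tokens = 288_700
sac_tokens = 42_900

# Fraction of tokens saved relative to the baseline.
reduction = (baseline_tokens - sac_tokens) / baseline_tokens
# How many times larger the baseline is than the SaC run.
ratio = baseline_tokens / sac_tokens

print(f"reduction: {reduction:.1%}")  # reduction: 85.1%
print(f"ratio: {ratio:.1f}x")         # ratio: 6.7x
```

That matches the roughly 85% reduction cited for the case study, or about one-seventh of the baseline token count.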
Perplexity also reported that SaC led on four of five broader benchmark tests, tied OpenAI on HLE, and beat the next-best system on its own WANDR benchmark by a factor of 2.5. Those figures are company-reported and have not been independently verified.
Search as Code
Perplexity says agents shouldn’t call a search engine — they should program one, composing atomic primitives into a bespoke pipeline in a sandbox. The thesis is right. It’s also the search-shaped version of an idea the field has been converging on since 2024.
Directionally right, genuinely engineered — the rebuilt-from-atoms search stack is the part rivals can’t cheaply copy. But it’s a strong execution of an industry-wide idea, validated mostly on benchmarks Perplexity ran itself. The moat is the infrastructure and the tuning loops, not the architecture.
Search Infrastructure Becomes the Moat
The proposal matters because agentic search has different demands from human search. A person usually submits a query and scans results. An agent working for minutes or hours may need to issue many searches, refine thin areas, compare sources, remove duplicates and verify structured records without sending every intermediate result back through the model context window.
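A toy comparison shows why keeping intermediate results out of the context window matters. The sketch below is an assumption-laden illustration, not Perplexity’s method: the result data is invented, approx_tokens is a crude word-count proxy rather than a real tokenizer, and the function names are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Crude token proxy: whitespace-separated word count (not a real tokenizer)."""
    return len(text.split())

# Invented intermediate results from several hypothetical searches.
# The first two entries are duplicates of the same page.
raw_results = [
    ("https://a.example/1", "Vendor advisory: issue fixed in version 2.4.1 " * 20),
    ("https://a.example/1", "Vendor advisory: issue fixed in version 2.4.1 " * 20),
    ("https://b.example/2", "Forum thread with speculation and no patch details " * 20),
    ("https://c.example/3", "Vendor advisory: issue fixed in version 7.0.3 " * 20),
]

def naive_context(results: list[tuple[str, str]]) -> str:
    """Fixed-endpoint style: every raw result is passed back to the model."""
    return "\n".join(text for _, text in results)

def sandboxed_context(results: list[tuple[str, str]]) -> str:
    """Code-in-sandbox style: dedupe, filter and extract before returning."""
    unique: dict[str, str] = {}
    for url, text in results:
        unique.setdefault(url, text)  # drop duplicate URLs
    lines = []
    for url, text in unique.items():
        if "fixed in version" not in text:
            continue  # skip pages without a fix version
        version = text.split("fixed in version")[1].split()[0]
        lines.append(f"{url} fix={version}")
    return "\n".join(lines)

naive = naive_context(raw_results)
compact = sandboxed_context(raw_results)
print(approx_tokens(naive), approx_tokens(compact))
print(compact)
```

In this toy data the naive path hands the model hundreds of words, while the sandboxed path hands it two short lines that carry the source URL and the fix version.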
If Perplexity’s approach holds up, the advantage would sit less in a single model prompt and more in the search stack beneath it. A system that exposes retrieval and ranking as programmable building blocks could give agents more control over evidence collection while reducing token costs. That would be relevant to research, security triage, due diligence, market intelligence and other tasks where source quality matters.

Code-First Agents Came Earlier
The contested point is novelty. The broader idea that language models should write executable code to coordinate tools has been building since at least 2024. Wang et al.’s CodeAct work, cited in the source material and available on arXiv at arxiv.org/abs/2402.01030, proposed executable Python as a unified action space for LLM agents and reported gains over more rigid action formats.
Later projects and products, including Hugging Face smolagents, Cloudflare Code Mode and Anthropic’s code execution with MCP, also moved toward code-mediated agent workflows. Perplexity’s contribution appears more specific: applying that pattern to the internals of search and rebuilding retrieval as composable primitives rather than a monolithic endpoint.
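The difference between a fixed action format and code as action can be shown in a few lines. This is a simplified sketch of the general pattern, not the implementation used by CodeAct or any of the products named above: lookup is a hypothetical tool, the round-trip counts are a modeling assumption, and Python's exec is used only for illustration and is not a secure sandbox.

```python
def lookup(term: str) -> int:
    """Hypothetical tool: returns a made-up score (here, the term's length)."""
    return len(term)

TOOLS = {"lookup": lookup}

# Style 1: fixed action format. Each action is a single tool call, so the
# model is assumed to need one round trip per call before choosing the next.
json_actions = [
    {"tool": "lookup", "args": {"term": "alpha"}},
    {"tool": "lookup", "args": {"term": "be"}},
    {"tool": "lookup", "args": {"term": "epsilon"}},
]
json_results = [TOOLS[a["tool"]](**a["args"]) for a in json_actions]
json_round_trips = len(json_actions)

# Style 2: code as action. The model emits one program that loops over the
# tool, compares results and keeps only the answer.
code_action = """
hits = {t: lookup(t) for t in ["alpha", "be", "epsilon"]}
answer = max(hits, key=hits.get)
"""
namespace = {"lookup": lookup}
exec(code_action, namespace)  # illustration only: exec is NOT a secure sandbox
code_round_trips = 1

print(json_results, json_round_trips)
print(namespace["answer"], code_round_trips)
```

Both styles call the same tool three times. The code version does the looping and comparison in a single step and surfaces only the final answer.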
Self-Run Benchmarks Limit Certainty
It is not yet clear whether Perplexity’s reported gains will hold under outside testing, on other search corpora, or with different evaluation rules. The WANDR benchmark is described as Perplexity’s own benchmark, which makes independent replication especially important before drawing broad conclusions.
It is also unclear how much of SaC will become a developer-facing product, what safety controls will govern sandboxed code execution, and whether competitors with large search or browsing systems can reproduce the same behavior with existing infrastructure.
Replication and Product Details
The next milestones are independent benchmark runs, clearer product packaging and evidence that SaC works outside Perplexity’s own stack. Developers will also watch whether Perplexity exposes the Agentic Search SDK broadly, how it prices high-volume agent retrieval, and how it handles sandbox security and auditability.
Key Questions
What did Perplexity announce?
On June 1, 2026, Perplexity Research published a proposal called Search as Code, in which an AI agent writes code to assemble search operations from smaller retrieval primitives.
Is Search as Code completely new?
The search-specific implementation appears to be Perplexity’s own work, but the broader idea of models using executable code to coordinate tools appeared earlier in research such as CodeAct and in later agent frameworks.
What results did Perplexity report?
Perplexity said SaC reached 100% accuracy in a CVE case study while cutting token use by 85%, and that it led on four of five broader benchmark tests. Those are Perplexity-reported figures.
Why does token use matter here?
Agentic research can generate large volumes of intermediate search results. Keeping irrelevant or duplicate material out of the model context window can lower cost, reduce latency and leave more room for useful evidence.
What has not been proven yet?
Outside replication has not been shown in the supplied material. It remains unclear whether the benchmark gains will carry over to unrelated tasks, open datasets or rival search systems.
Source: Thorsten Meyer AI