Agentic Systems
2025
Lead builder

Deep Research Workflow

Multi-agent research system that routes work by query complexity and adds evaluator loops to improve quality while reducing LLM cost.

A production-style multi-agent research system built on async orchestration, intelligent routing, and automated evaluation loops to balance research quality against API cost.

Key Outcomes

7 specialized agents orchestrated via OpenAI Agents SDK
Adaptive workflow routing based on task complexity
Lower cost and latency without sacrificing answer quality


Problem and Context

Single-pass LLM research assistants tend to be expensive, brittle, and hard to trust for analytical work that spans multiple sources.

The challenge was to design a system that could vary its depth based on the prompt while still validating quality before delivery.


Approach and Architecture

The workflow uses an explicit router and planner to decide whether a request needs lightweight synthesis or a deeper multi-agent research path.
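A minimal sketch of that routing decision, assuming a keyword-and-length heuristic; the marker list and thresholds here are illustrative, not the project's actual rules:

```python
from dataclasses import dataclass

@dataclass
class RouteDecision:
    path: str    # "light" or "deep"
    reason: str

# Hypothetical signals; a real router might use an LLM classifier instead.
ANALYTICAL_MARKERS = ("compare", "analyze", "trade-off", "versus", "impact")

def route(query: str) -> RouteDecision:
    # Count analytical markers; marker-heavy or long queries take the deep path.
    hits = sum(m in query.lower() for m in ANALYTICAL_MARKERS)
    if hits > 0 or len(query.split()) > 25:
        return RouteDecision("deep", f"{hits} analytical marker(s)")
    return RouteDecision("light", "short factual query")
```

Simple lookups stay on the cheap synthesis path, while comparative or open-ended questions trigger the full multi-agent pipeline.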

Evaluation is treated as a first-class step rather than an afterthought, allowing the system to retry or refine when the answer quality is below threshold.
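The retry-on-low-score behavior can be sketched as a small loop; the interface (a generator that accepts feedback, an evaluator that returns a score and critique) is an assumption, not the project's actual API:

```python
def run_with_evaluation(generate, evaluate, threshold=0.8, max_retries=2):
    """Retry generation with evaluator feedback until the score clears
    the threshold or retries run out; return the best attempt seen."""
    feedback, best = None, None
    for attempt in range(max_retries + 1):
        answer = generate(feedback)
        score, feedback = evaluate(answer)
        if best is None or score > best[0]:
            best = (score, answer)
        if score >= threshold:
            return answer, score, attempt
    return best[1], best[0], max_retries

# Demo with stubbed stages: the first draft scores 0.5, the revision 0.9.
drafts = []
def generate(feedback):
    drafts.append(feedback)
    return f"draft-{len(drafts)}"

scores = iter([0.5, 0.9])
def evaluate(answer):
    return next(scores), "tighten citations"

answer, score, attempts_used = run_with_evaluation(generate, evaluate)
```

Returning the best attempt (rather than the last) means an exhausted retry budget still delivers the strongest answer produced.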


System Diagrams

Static diagrams included with the project to show architecture, workflow, and data movement at a glance.

Deep research architecture diagram
High-level agent topology for the deep research workflow.


Implementation Details

Async execution enables parallel search and evidence gathering, which materially reduces latency on the heavier research paths.
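The fan-out pattern is standard `asyncio.gather`; the source names and the sleep stand-in below are illustrative, not the project's real search calls:

```python
import asyncio

async def search(source: str, query: str) -> str:
    # Stand-in for a real search/LLM call against one evidence source.
    await asyncio.sleep(0.01)
    return f"{source}: evidence for '{query}'"

async def gather_evidence(query: str, sources: list[str]) -> list[str]:
    """Fan out one search task per source and await them concurrently,
    so total latency tracks the slowest source, not the sum of all."""
    tasks = [search(s, query) for s in sources]
    return await asyncio.gather(*tasks)

results = asyncio.run(gather_evidence("agent routing", ["web", "news", "papers"]))
```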

Pydantic schemas keep agent handoffs typed so downstream stages can reason over predictable payloads instead of parsing free-form text.
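A typed handoff might look like the following, assuming Pydantic v2; the field names are hypothetical, not the project's actual schemas:

```python
from pydantic import BaseModel, Field

class SearchResult(BaseModel):
    source: str
    snippet: str
    relevance: float = Field(ge=0.0, le=1.0)  # validated at the boundary

class ResearchPayload(BaseModel):
    """Typed handoff from the search stage to the writer stage."""
    query: str
    results: list[SearchResult]

# Raw agent output is validated once; downstream stages get typed fields.
payload = ResearchPayload.model_validate({
    "query": "agent routing",
    "results": [{"source": "web", "snippet": "routers cut cost", "relevance": 0.9}],
})
```

Validation failures surface at the stage boundary, where they are cheap to retry, instead of propagating malformed text downstream.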

Clarify -> Router -> Planner -> Parallel Search -> Writer -> Evaluator -> Email
Cost-aware workflow gating for simple vs complex queries
Email-ready delivery format for end users
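The staged flow above can be sketched as simple function composition over a shared context; the stage bodies here are illustrative stand-ins, not the real agents:

```python
from typing import Callable

Stage = Callable[[dict], dict]

def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right; each stage reads and extends a
    shared context dict before handing it to the next stage."""
    def run(ctx: dict) -> dict:
        for stage in stages:
            ctx = stage(ctx)
        return ctx
    return run

# Toy stand-ins for three of the stages listed above.
def clarify(ctx): return {**ctx, "clarified": ctx["query"].strip()}
def plan(ctx):    return {**ctx, "plan": [f"search: {ctx['clarified']}"]}
def write(ctx):   return {**ctx, "draft": f"Report on {ctx['clarified']}"}

report = pipeline(clarify, plan, write)({"query": "  agent routing  "})
```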


Results and Tradeoffs

The main improvement came from matching the workflow depth to the task instead of forcing every request through the same expensive pipeline.

The evaluation loop made the system more production-like because it surfaced quality as an operational concern rather than a manual review task.

30-50% API cost reduction on simple research queries
40-60% output quality improvement on complex analytical tasks
~15 second average response time on simple queries


Lessons and Next Steps

Agent systems become easier to evolve when their interfaces are explicit and typed. The orchestration layer matters as much as the prompts.

A future iteration would add richer telemetry per stage so routing and evaluation thresholds can be tuned from observed behavior instead of static heuristics.
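One lightweight shape for that telemetry is a per-stage latency recorder; this is a sketch of the idea, not an existing component of the project:

```python
import time
from contextlib import contextmanager

@contextmanager
def stage_timer(metrics: dict, stage: str):
    """Append wall-clock latency for one stage run to a shared metrics map,
    so routing and evaluation thresholds can be tuned from observed data."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics.setdefault(stage, []).append(time.perf_counter() - start)

metrics: dict[str, list[float]] = {}
with stage_timer(metrics, "router"):
    total = sum(range(1000))  # stand-in for real stage work
```

Accumulating a list per stage keeps the raw distribution available, so percentiles (not just means) can drive threshold tuning.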


Related Projects

Browse adjacent work from the same archive group or jump back to the project archive.

Back to archive