12-factor agents for production
We design agents like real systems: prompts you can version, context windows you can inspect, tools you can audit, and state you can replay.
No magic, just engineering discipline for LLM-native workflows.
Workflows for machines, not improv
We don't assign job titles to agents or make them click around pretend dashboards. We map intent to functions, remove glue steps, and enforce contracts in code.
Collapse the workflow
Example: summarising highlights from a source and saving them to a knowledge base. Done by hand, it takes six steps:
- 1. Export highlights
- 2. Paste into chat
- 3. Ask for summary
- 4. Copy response
- 5. Open notes app
- 6. Paste and tag
Contracts and error handling live in code, not in a prompt. The model speaks JSON, not small talk.
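As a rough sketch of what the collapsed version of this example might look like, the six manual steps can shrink to one function call whose output is checked in code. The `HighlightSummary` shape and the `summarise` and `save` callbacks are hypothetical names chosen for illustration; they stand in for whatever model client and knowledge-base API you actually use.

```typescript
// Hypothetical JSON contract for the model's reply; field names are illustrative.
interface HighlightSummary {
  title: string;
  summary: string;
  tags: string[];
}

// Validate untrusted model output in code rather than relying on the prompt.
function parseSummary(raw: string): HighlightSummary {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Model output must be a JSON object");
  }
  const { title, summary, tags } = data as Record<string, unknown>;
  if (typeof title !== "string" || typeof summary !== "string") {
    throw new Error("title and summary must be strings");
  }
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    throw new Error("tags must be an array of strings");
  }
  return { title, summary, tags };
}

// Assumed dependencies: a model call that returns raw text, and a note store.
type Summarise = (highlights: string[]) => Promise<string>;
type SaveNote = (note: HighlightSummary) => Promise<void>;

// One call replaces export, paste, ask, copy, open, and tag.
async function summariseAndSave(
  highlights: string[],
  summarise: Summarise,
  save: SaveNote
): Promise<HighlightSummary> {
  const note = parseSummary(await summarise(highlights));
  await save(note);
  return note;
}
```

Passing the model call and the store in as arguments keeps the contract check testable without a live model.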
Function Mapping Sprint
Collapse one workflow in 10 days.
You bring one broken agent workflow; we dismantle the roleplay and replace it with callable functions, JSON contracts, and live telemetry that proves the collapse.
Field Signal
Mine transcripts, tickets, and runbooks. Annotate every step that exists only for human cognition.
Function Index
Rewrite the flow as schemas + execution contracts. Route language inputs to deterministic functions.
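A minimal sketch of what routing to deterministic functions could look like, assuming the model has already turned the user's language into a structured call. The function names (`create_ticket`, `lookup_order`) and the `FunctionCall` shape are hypothetical placeholders, not an actual function index.

```typescript
// A handler is plain deterministic code: same arguments, same result.
type FunctionHandler = (args: Record<string, unknown>) => string;

// Assumed shape of the structured call the model emits.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

// The index maps an intent name to the code that fulfils it.
const functionIndex = new Map<string, FunctionHandler>([
  ["create_ticket", (args) => `ticket:${String(args.title)}`],
  ["lookup_order", (args) => `order:${String(args.orderId)}`],
]);

// Unknown names fail loudly instead of being improvised by the model.
function dispatch(call: FunctionCall): string {
  const handler = functionIndex.get(call.name);
  if (handler === undefined) {
    throw new Error(`Unknown function: ${call.name}`);
  }
  return handler(call.args);
}
```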
Collapse Proof
Ship a runnable demo + dashboard showing steps removed, latency dropped, and failure points eliminated.
Four pillars of agent design
The same patterns powering our voice stack and API routes show up in every engagement. Each pillar is an artifact you can review, test, and evolve.
Prompts
System prompts are versioned, named, and documented. A/B test agent behaviour, roll back changes, and tie outcomes back to prompt versions.
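To make the idea concrete, here is one possible shape for a versioned prompt registry with deterministic A/B assignment. Everything in it (`PromptVersion`, `getPrompt`, `assignVariant`, the prompt text, and the simple string hash) is an assumption made for illustration.

```typescript
// Each prompt is a named, versioned, documented record.
interface PromptVersion {
  name: string;
  version: string;
  description: string;
  text: string;
}

// Illustrative entries; real prompts would live under version control.
const PROMPTS: readonly PromptVersion[] = [
  { name: "support-agent", version: "v1", description: "Baseline tone", text: "You are a support agent." },
  { name: "support-agent", version: "v2", description: "Shorter replies", text: "You are a concise support agent." },
];

// Look up an exact version so a rollback is a one-line change.
function getPrompt(name: string, version: string): PromptVersion {
  const found = PROMPTS.find((p) => p.name === name && p.version === version);
  if (found === undefined) {
    throw new Error(`No prompt ${name}@${version}`);
  }
  return found;
}

// Deterministic A/B assignment: the same session always gets the same version,
// so outcomes can be tied back to the prompt that produced them.
function assignVariant(sessionId: string, versions: string[]): string {
  if (versions.length === 0) {
    throw new Error("At least one version is required");
  }
  let hash = 0;
  for (let i = 0; i < sessionId.length; i++) {
    hash = (hash * 31 + sessionId.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return versions[hash % versions.length];
}
```

Logging the `name` and `version` alongside each session is what lets you compare outcomes per prompt version later.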
system-prompts.ts
Context
We own the context window: which RAG documents, which history, which query. Observability into token budgets, trimming, and what the model actually saw.
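A small sketch of budgeted context assembly, under two stated assumptions: items arrive in priority order, and tokens are estimated at roughly four characters each (a crude heuristic, not a real tokenizer). The names `ContextItem`, `buildContext`, and `estimateTokens` are hypothetical.

```typescript
// A candidate piece of context and where it came from.
interface ContextItem {
  source: "rag" | "history" | "query";
  text: string;
}

// The report records what the model saw and what was trimmed.
interface ContextReport {
  included: ContextItem[];
  dropped: ContextItem[];
  tokensUsed: number;
}

// Assumption: about 4 characters per token. Swap in a real tokenizer in practice.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Walk items in priority order and keep each one that still fits the budget.
function buildContext(items: ContextItem[], budget: number): ContextReport {
  const included: ContextItem[] = [];
  const dropped: ContextItem[] = [];
  let tokensUsed = 0;
  for (const item of items) {
    const cost = estimateTokens(item.text);
    if (tokensUsed + cost <= budget) {
      included.push(item);
      tokensUsed += cost;
    } else {
      dropped.push(item);
    }
  }
  return { included, dropped, tokensUsed };
}
```

Returning the `dropped` list alongside `included` is what gives you observability: the trimming decision becomes data you can log and inspect.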
context-manager.ts
Tools
Tools are first-class, typed definitions with explicit parameters and user-facing messages. Every call is auditable.
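One way a typed, auditable tool could be sketched is shown below. The `ToolDefinition` shape, the `callTool` wrapper, the in-memory `auditLog`, and the `getWeather` example are all hypothetical; a production system would write audit entries to durable storage.

```typescript
// A tool declares its parameters, a user-facing message, and its behaviour.
interface ToolDefinition<P> {
  name: string;
  description: string;
  userMessage: string; // shown to the user while the tool runs
  validate: (input: unknown) => P; // turns untrusted input into typed params
  execute: (params: P) => string;
}

interface AuditEntry {
  tool: string;
  input: unknown;
  ok: boolean;
  detail: string;
}

// In-memory audit trail, for illustration only.
const auditLog: AuditEntry[] = [];

// Every call, successful or not, leaves an audit entry.
function callTool<P>(tool: ToolDefinition<P>, input: unknown): string {
  try {
    const result = tool.execute(tool.validate(input));
    auditLog.push({ tool: tool.name, input, ok: true, detail: result });
    return result;
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    auditLog.push({ tool: tool.name, input, ok: false, detail });
    throw err;
  }
}

// Example tool with a single explicit parameter.
const getWeather: ToolDefinition<{ city: string }> = {
  name: "get_weather",
  description: "Request a forecast for a city",
  userMessage: "Checking the weather...",
  validate: (input) => {
    if (typeof input !== "object" || input === null) {
      throw new Error("city is required");
    }
    const city = (input as Record<string, unknown>).city;
    if (typeof city !== "string" || city.length === 0) {
      throw new Error("city must be a non-empty string");
    }
    return { city };
  },
  execute: ({ city }) => `Forecast requested for ${city}`,
};
```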
tool-definitions.ts
State
Conversation state is managed via a reducer, so you can replay sessions, debug decisions, and plug in persistence.
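The reducer idea can be sketched as follows, assuming a small set of hypothetical event types. Because the reducer is a pure function, replaying a session is just folding its recorded events over the initial state.

```typescript
// Hypothetical events recorded during a session.
type ConversationEvent =
  | { type: "user_message"; text: string }
  | { type: "agent_message"; text: string }
  | { type: "tool_result"; tool: string; result: string };

interface ConversationState {
  messages: { role: "user" | "agent"; text: string }[];
  toolResults: Record<string, string>;
  turn: number;
}

const initialState: ConversationState = { messages: [], toolResults: {}, turn: 0 };

// Pure reducer: never mutates its input, always returns a new state.
function conversationReducer(
  state: ConversationState,
  event: ConversationEvent
): ConversationState {
  switch (event.type) {
    case "user_message":
      return {
        ...state,
        messages: [...state.messages, { role: "user", text: event.text }],
        turn: state.turn + 1,
      };
    case "agent_message":
      return {
        ...state,
        messages: [...state.messages, { role: "agent", text: event.text }],
      };
    case "tool_result":
      return {
        ...state,
        toolResults: { ...state.toolResults, [event.tool]: event.result },
      };
  }
}

// Replay: rebuild the state of any session from its event log.
function replay(events: ConversationEvent[]): ConversationState {
  return events.reduce<ConversationState>(conversationReducer, initialState);
}
```

Persistence then reduces to storing the event log: load the events, call `replay`, and you are back at the state the agent was in when it made a given decision.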
conversation-state.ts
How this shows up in client work
Whether we're building a realtime voice agent, a back-office copilot, or a mesh of task agents, we follow the same pattern: define prompts, curate context, expose tools, wire state, and plug into your observability stack.
You gain confidence that agents will behave consistently across channels, and your teams gain a shared vocabulary to reason about failures, regressions, and improvements.
Explore the 12-factor voice agent
The same principles power our Athena voice stack. In demos, we can walk through how prompts, context, tools, and state come together in realtime.