Approach

12-factor agents for real production constraints.

We design agents like real systems: prompts you can version, context windows you can inspect, tools you can audit, and state you can replay. No magic, just engineering discipline for LLM-native workflows.

Our consulting philosophy is machine-native—collapse the process instead of simulating it. Every workshop produces a runnable artifact wired into the real workflow so your teams see live impact, not a staged prototype.

Function mapping

Workflows for machines, not improv theatre

We don’t assign job titles to agents or make them click around pretend dashboards. We map intent to functions, remove glue steps, and enforce contracts in code so the model can’t wander off-script.

Treat the model as a callable function, not a teammate.
Flatten steps; fewer hops mean fewer failure points.
Prefer JSON contracts over UI clicks.
Expose tools directly—no proxy dashboards.
Define signatures and error paths for every tool.
Let humans speak freely, but keep the system typed and strict.
Drop the office metaphor; route inputs to execution.
Before → After

Collapse the workflow

Example: summarising highlights from a source and saving them to a knowledge base.

Human-mimic (6+ steps)
  • Export highlights
  • Paste into chat
  • Ask for a summary
  • Copy response
  • Open notes app
  • Paste and tag

Function-mapped (2 steps)
  • fetch_highlights(source)
  • write_summary(destination, tags)

Contracts and error handling live in code, not in a prompt. The model speaks JSON, not small talk.
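As a sketch, the two collapsed steps can be written as typed contracts. The function names come from the example above; the shapes, stubbed bodies, and error paths are illustrative assumptions, not a real client library:

```typescript
// Hypothetical contracts for the collapsed workflow. Bodies are stubbed;
// the point is the typed signature and the explicit error path.

interface Highlight {
  text: string;
  source: string;
}

interface SummaryResult {
  ok: boolean;
  error?: string; // explicit error path, not a retry-in-chat loop
}

// Step 1: deterministic fetch — no export/paste round trip.
function fetch_highlights(source: string): Highlight[] {
  // In production this would call the highlights API; stubbed here.
  return [{ text: "Agents are functions, not teammates.", source }];
}

// Step 2: write the summary straight to the destination, tagged.
function write_summary(
  destination: string,
  tags: string[],
  highlights: Highlight[]
): SummaryResult {
  if (highlights.length === 0) {
    return { ok: false, error: "no highlights to summarise" };
  }
  // Persistence is stubbed; destination and tags would drive the write.
  return { ok: true };
}
```

Because the contract lives in the signature, a failure surfaces as a typed error the caller can handle, rather than a model apologising in prose.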

Outcomes: lower latency, fewer retries, clearer governance. This is the backbone of how we design agents.

Consulting Philosophy

Collapse the process, don’t simulate it.

Machine-native initiatives only work when the living process moves inside the product. Instead of playacting a future workflow in slides, we wire prompts, tools, and governance into the actual systems your team already trusts. We refuse to LARP office life with bots—AI isn’t a teammate, it’s a function wired straight into the outcome.

Field data

Start with real transcripts, logs, and operator rituals. We encode that signal directly into prompts, RAG stores, and evaluation harnesses so the agent inherits institutional judgement.

Runnable deliverables

Every milestone ships a callable API, ops dashboard, or governance checklist—proof that the workflow already lives inside the machine before rollout scales.

Execution rules

  • Treat AI like a function call: define schemas, not fake job titles.
  • Flatten steps so nothing gets routed through pretend interfaces.
  • Keep language soft but contracts hard—JSON over UI every time.
  • Enforce behaviour in code so the model can’t drift into improv.
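A minimal sketch of “language soft, contracts hard”: the model can phrase its reply however it likes, but nothing executes unless the output parses into a known shape. The tool name and fields here are hypothetical:

```typescript
// Illustrative guard: validate model output against a schema before
// execution. Tool name and fields are made up for the example.

interface ToolCall {
  tool: "save_note";
  destination: string;
  tags: string[];
}

function parseToolCall(raw: string): ToolCall | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON — reject, don't improvise
  }
  const d = data as Record<string, unknown>;
  if (
    d?.tool === "save_note" &&
    typeof d.destination === "string" &&
    Array.isArray(d.tags) &&
    d.tags.every((t) => typeof t === "string")
  ) {
    return d as unknown as ToolCall;
  }
  return null; // wrong shape — surface a typed error, never guess
}
```

Anything that fails the parse becomes a structured error for retry or escalation, so drift is caught at the boundary instead of downstream.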
Drop the Simulation

Function Mapping Sprint: collapse one workflow in 10 days.

We attract the right partners by putting their “AI improv sketch” on blast. You bring one broken agent workflow; we dismantle the roleplay and replace it with callable functions, JSON contracts, and live telemetry that proves the collapse.

Sprint cadence

Days 0‑2 · Field Signal

Mine transcripts, tickets, and runbooks. Annotate every step that exists only for human cognition.

Days 3‑7 · Function Index

Rewrite the flow as schemas + execution contracts. Route language inputs to deterministic functions.

Days 8‑10 · Collapse Proof

Ship a runnable demo + dashboard showing steps removed, latency dropped, and failure points eliminated.

Steps removed: −78%

Latency drop: −9m

Mini case

A claims-handling “voice AI adjuster” went from seven pretend-human steps to three direct calls: fetch_policy → submit_adjustment_claim → schedule_inspection. No agents in Slack, no fake office tour.

Ops teams now watch JSON logs instead of listening to roleplay. Sales teams use the collapse dashboard in pitches—because nothing attracts enterprise buyers like proof the bureaucracy is gone.

Ready to see your agent improv collapse into a three-call machine? Book a Sprint and we’ll show you the before/after telemetry live on the workshop call.

Four pillars of our agent design

The same patterns powering our voice stack and `/api/assistant` route show up in every engagement. We treat each pillar as an artifact you can review, test, and evolve—not a hidden prompt in a route handler.

Prompts

System prompts are versioned, named, and documented (`system-prompts.ts`). That means you can A/B test agent behaviour, roll back changes, and tie specific outcomes back to prompt versions.
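In the spirit of that pattern, a versioned prompt registry can be as simple as a typed map. The version names and prompt text below are illustrative assumptions, not the actual contents of `system-prompts.ts`:

```typescript
// Illustrative versioned prompt registry: prompts are named constants,
// so outcomes can be tied back to the exact version that produced them.

const SYSTEM_PROMPTS = {
  "assistant-v1": "You are a concise product assistant. Answer from context only.",
  "assistant-v2": "You are a concise product assistant. Cite the document you used.",
} as const;

type PromptVersion = keyof typeof SYSTEM_PROMPTS;

function getPrompt(version: PromptVersion): string {
  return SYSTEM_PROMPTS[version];
}

// Stamp every outcome with the prompt version for later attribution.
function logOutcome(version: PromptVersion, outcome: string): string {
  return `[prompt=${version}] ${outcome}`;
}
```

Rolling back is a one-line change of version, and A/B tests compare keys rather than diffing prose buried in a route handler.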

Context

We own the context window (`context-manager.ts`): which RAG documents, which history, which query. You get observability into token budgets, trimming, and what the model actually saw.
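A rough sketch of owning the window: fit the newest history into a token budget and keep a record of exactly what survived trimming. Token counting is approximated by word count here; real tokenizers differ, and the shapes are assumptions rather than the `context-manager.ts` API:

```typescript
// Illustrative trimming pass: walk history newest-first, keep turns
// while the budget allows, and return what the model will actually see.

interface Turn {
  role: "user" | "assistant";
  text: string;
}

function trimToBudget(history: Turn[], budget: number): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    // Crude cost estimate: one "token" per whitespace-separated word.
    const cost = history[i].text.split(/\s+/).length;
    if (used + cost > budget) break;
    kept.unshift(history[i]); // preserve chronological order
    used += cost;
  }
  return kept;
}
```

Because trimming is an explicit function, the kept turns (and the budget spent) can be logged per request instead of inferred after the fact.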

Tools

Tools are first-class, typed definitions (`tool-definitions.ts`) with explicit parameters and user-facing messages. The agent can call `get_product_info`, `schedule_consultation`, or `check_pricing`, and every call is auditable.
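A hypothetical shape for such a definition, echoing the pattern rather than reproducing `tool-definitions.ts`: explicit parameters plus the user-facing message, so every call leaves a readable trail:

```typescript
// Illustrative first-class tool definition and audit helper.

interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: "string" | "number"; required: boolean }>;
  userMessage: string; // shown to the user while the tool runs
}

const checkPricing: ToolDefinition = {
  name: "check_pricing",
  description: "Look up current pricing for a product tier.",
  parameters: {
    tier: { type: "string", required: true },
    seats: { type: "number", required: false },
  },
  userMessage: "Checking pricing…",
};

// One line per call: tool name plus the exact arguments it received.
function auditCall(tool: ToolDefinition, args: Record<string, unknown>): string {
  return `${tool.name}(${JSON.stringify(args)})`;
}
```

The audit line is what a compliance reviewer reads; the typed parameters are what stops the model inventing arguments.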

State

Conversation state is managed via a reducer (`conversation-state.ts`), so you can replay sessions, debug decisions, and plug in persistence without changing the agent logic.
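A minimal reducer sketch in that spirit (action and state shapes are illustrative, not the `conversation-state.ts` types): because every transition is a pure function, a recorded session can be replayed action by action:

```typescript
// Illustrative conversation reducer: pure transitions, replayable log.

interface ConvState {
  messages: string[];
  toolCalls: number;
}

type Action =
  | { type: "message"; text: string }
  | { type: "tool_call"; name: string };

function reduce(state: ConvState, action: Action): ConvState {
  switch (action.type) {
    case "message":
      return { ...state, messages: [...state.messages, action.text] };
    case "tool_call":
      return { ...state, toolCalls: state.toolCalls + 1 };
  }
}

// Replay: fold the recorded actions to reconstruct any session state.
function replay(actions: Action[]): ConvState {
  return actions.reduce(reduce, { messages: [], toolCalls: 0 });
}
```

Persistence then becomes “store the action log”, and debugging a bad decision means replaying up to the action that caused it.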

Governance

The same ideas show up in Governance Hub and evaluation harnesses: compact, user-friendly error messages; retries with backoff; and logs that your risk and compliance teams can read.
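The retry-with-backoff idea can be sketched in a few lines; the attempt count, delays, and error message below are illustrative defaults, not a fixed policy:

```typescript
// Illustrative retry wrapper: exponential backoff, then a compact,
// user-friendly error while full detail stays in logs upstream.

async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch {
      if (i === attempts - 1) break;
      // Exponential backoff: 200ms, 400ms, 800ms…
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw new Error(`Operation failed after ${attempts} attempts`);
}
```

The compact message is what reaches the user; the per-attempt failures are what risk and compliance teams read in the logs.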

How this shows up in client work

Whether we are building a realtime voice agent, a back-office copilot, or a mesh of task agents, we follow the same pattern: define prompts, curate context, expose tools, wire state, and plug into your observability stack.

You gain confidence that agents will behave consistently across channels, and your teams gain a shared vocabulary to reason about failures, regressions, and improvements.

• Architecture diagrams and design docs for every agent surface.

• Evaluation harnesses that mirror real workflows, not toy prompts.

• Integration with existing observability tools (logs, traces, metrics).

• Clear hand-offs between human operators and agents.

See it in action

Explore the 12-factor voice agent.

The same principles power our Athena voice stack and `/api/assistant` endpoint. In demos, we can walk through how prompts, context, tools, and state come together in realtime.

• Live walkthrough of the voice pipeline (STT → RAG → LLM → TTS).

• Discussion of failure modes, guardrails, and evaluation patterns.

• Optional deep dive into how this would map onto your stack.