What counts as an agent
There's a lot of noise around 'AI agents' in service-business marketing. Most of what gets sold is a Zapier flow with a GPT step bolted on, doing the same job a templated email always did, just slower and with hallucination risk.
An agent, in our definition, owns a workflow end-to-end. It has an input contract, an output contract, a defined success metric, and a kill-switch. It writes back to systems of record. It is observable. It can be killed and replaced without a re-architecture.
Everything else is a chatbot.
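The definition above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the `Agent` class, its field names and the `AgentKilled` exception are all hypothetical, and the in-memory `log` list only stands in for real observability and write-back to a system of record.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class AgentKilled(RuntimeError):
    """Raised when a killed agent is asked to run (hypothetical name)."""


@dataclass
class Agent(Generic[In, Out]):
    name: str
    handler: Callable[[In], Out]            # the workflow the agent owns
    validate_input: Callable[[In], bool]    # input contract
    validate_output: Callable[[Out], bool]  # output contract
    success_metric: str                     # stated before any prompt exists
    killed: bool = False                    # kill-switch state
    log: List[Tuple[In, Out]] = field(default_factory=list)  # stand-in for tracing

    def kill(self) -> None:
        """Flip the kill-switch; later calls to run() are refused."""
        self.killed = True

    def run(self, payload: In) -> Out:
        if self.killed:
            raise AgentKilled(f"{self.name} is switched off")
        if not self.validate_input(payload):
            raise ValueError("input contract violated")
        result = self.handler(payload)
        if not self.validate_output(result):
            raise ValueError("output contract violated")
        self.log.append((payload, result))  # every run is recorded
        return result
```

Because the contracts, the metric and the switch sit outside the handler, the handler can be swapped for a different model or prompt without touching anything that calls `run()`.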
The six default agents
Aria reads every inbound enquiry, scores it against your ideal customer profile (ICP), and routes it to the right rep. She lives in Supabase, talks to Claude 3.5 over an SDK call, and writes back directly to HubSpot or whichever CRM you run.
Echo is the missed-call agent. The single highest-ROI agent we ship. Texts back within 9 seconds of a missed call. Books or hands off. She's recovered 40%+ of the revenue trades operators were losing to missed calls inside the first 30 days, every time.
Ledger is the reporting agent. Daily summary across ad accounts, CRM and call data, in Slack at 7am sharp. Replaces the Monday-morning manual pull that every operator we've audited was doing.
Pivot runs the long-cycle nurture. 60- to 180-day sequences with branching logic per buyer signal. GPT-4o + HubSpot. The agent that compounds the slowest but pays the most over a 12-month window.
Verity handles the review side. Post-job review request, sentiment watch and reputation defence in one loop. The agent that prevents the slow bleed of negative reviews piling up unanswered.
Forge is the content engine. Weekly insights drafts, briefs and SOPs from your brand voice and data. The least exciting agent and the one operators get most attached to once it's running.
Observability and eval
If you can't see every model call, score the output, and replay the failures, you're running an agent in production blind. We use Langfuse on every agent by default. Some clients prefer Helicone. That's fine; the pattern is the same.
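The see, score, replay pattern is tool-independent, so here is a sketch of it in plain Python. It assumes nothing about the Langfuse or Helicone APIs: `Trace`, `TraceStore` and their methods are hypothetical names, and an in-memory list stands in for a hosted trace store.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trace:
    agent: str
    prompt: str
    output: str
    score: Optional[float] = None  # filled in by a scorer later
    timestamp: float = 0.0


class TraceStore:
    """In-memory stand-in for an observability backend (hypothetical)."""

    def __init__(self) -> None:
        self.traces: List[Trace] = []

    def record(self, agent: str, prompt: str, output: str) -> Trace:
        """See: capture every model call as it happens."""
        trace = Trace(agent, prompt, output, timestamp=time.time())
        self.traces.append(trace)
        return trace

    def score_all(self, scorer: Callable[[Trace], float]) -> None:
        """Score: attach a numeric score to every recorded trace."""
        for trace in self.traces:
            trace.score = scorer(trace)

    def failures(self, threshold: float) -> List[Trace]:
        """Scored traces that fell below the threshold."""
        return [
            t for t in self.traces
            if t.score is not None and t.score < threshold
        ]

    def replay(self, model: Callable[[str], str], threshold: float) -> List[Trace]:
        """Replay: re-run each failed prompt through `model`.

        Returns fresh, unscored traces and leaves the store untouched.
        """
        return [
            Trace(t.agent, t.prompt, model(t.prompt), timestamp=time.time())
            for t in self.failures(threshold)
        ]
```

Replaying against a candidate prompt or model before it ships is what turns a pile of logs into a regression suite.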
Every agent has a success metric defined before the prompt is written. Aria's metric is 'percentage of routed leads that human reps confirm were correctly routed'. Echo's is 'percentage of missed-call text-backs that resulted in a booked job inside 24 hours'. Drift against those metrics gets caught in the weekly review.
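As a worked example, Aria's metric and a weekly drift check might look like the sketch below. The `RoutedLead` record, the function names and the 5-point tolerance are assumptions made for illustration, not figures we use with clients.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RoutedLead:
    lead_id: str
    confirmed_correct: bool  # did a human rep confirm the routing?


def routing_accuracy(leads: Iterable[RoutedLead]) -> float:
    """Percentage of routed leads that reps confirmed were correctly routed."""
    reviewed = list(leads)
    if not reviewed:
        raise ValueError("no reviewed leads to score")
    correct = sum(1 for lead in reviewed if lead.confirmed_correct)
    return 100.0 * correct / len(reviewed)


def has_drifted(current: float, baseline: float, tolerance: float = 5.0) -> bool:
    """True when this week's metric sits more than `tolerance` points
    below the baseline. The 5-point default is an illustrative assumption."""
    return baseline - current > tolerance


week = [
    RoutedLead("a", True),
    RoutedLead("b", True),
    RoutedLead("c", False),
    RoutedLead("d", True),
]
print(routing_accuracy(week))                     # 75.0
print(has_drifted(routing_accuracy(week), 90.0))  # True
```

Echo's metric has the same shape: swap the confirmed-routing flag for 'booked a job inside 24 hours'.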
Why we build model-agnostic
Claude 3.5 is the default we ship today. Tomorrow it might be Claude 4. Next year it might be something we haven't heard of yet. Every agent we build switches models through a config change, not a rewrite. The prompts are versioned in Git, the evals are reusable, the kill-switch criteria are model-independent.
Vendor lock-in kills both bargaining power and stack flexibility. We don't build it in.
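Switching models through config can be as simple as a lookup table. The sketch below is an assumption about shape only: the adapter functions are stubs that echo their input rather than calling any vendor SDK, and `PROVIDERS`, `AGENT_CONFIG` and `run_agent` are hypothetical names.

```python
from typing import Callable, Dict

# Stub adapters. In a real build each would wrap one vendor's SDK call;
# here they only tag the prompt so the routing is visible.
def call_claude(prompt: str) -> str:
    return f"[claude] {prompt}"


def call_gpt(prompt: str) -> str:
    return f"[gpt] {prompt}"


# Every adapter shares one signature, so agents never import a vendor directly.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "claude": call_claude,
    "gpt": call_gpt,
}

# The only place a model choice lives. Prompt versions point at Git-tracked files.
AGENT_CONFIG: Dict[str, Dict[str, str]] = {
    "aria": {"provider": "claude", "prompt_version": "v3"},
    "pivot": {"provider": "gpt", "prompt_version": "v1"},
}


def run_agent(agent: str, prompt: str,
              config: Dict[str, Dict[str, str]] = AGENT_CONFIG) -> str:
    """Look up the agent's provider in config and dispatch the prompt to it."""
    provider = config[agent]["provider"]
    return PROVIDERS[provider](prompt)


print(run_agent("aria", "score this lead"))  # [claude] score this lead

# Moving Aria to another model is a one-line config edit, not a rewrite.
swapped = {**AGENT_CONFIG, "aria": {"provider": "gpt", "prompt_version": "v3"}}
print(run_agent("aria", "score this lead", swapped))  # [gpt] score this lead
```

Adding a new vendor means writing one adapter and one config entry; the agents, evals and kill-switch criteria stay as they are.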
Ownership and IP
Every agent we build, the client owns. Day one. Prompts, evals, observability dashboards, Supabase rows, the lot. We sign IP transfer at contract signing.
Most AI consultancies hold the prompts back. We don't. The whole point is the system keeps running, and keeps being editable, after we're gone.