A B2B services company with 16,000+ HubSpot contacts and a multi-million-dollar revenue target needed its sales pipeline automated. The CEO was manually reviewing contacts, drafting outreach emails, and managing follow-ups across multiple reps. The ask: build an AI system that generates a prioritized task queue every morning that the CEO can review and approve in under 20 minutes.
Nothing sends without human approval. This was a non-negotiable constraint that shaped the entire architecture. Every outreach email, every follow-up, every cold intro sits in a queue until a human says go. The system recommends. The human decides.
OpenClaw (a self-hosted AI agent framework) serves as the brain. Python CLI tools serve as the hands. The pattern: OpenClaw thinks and plans, Python tools execute. The tools read HubSpot, score contacts, draft emails, and queue tasks in Airtable.
The execution surface is Airtable. The CEO controls everything through table views and status fields. Adding a new sales rep or changing ICP criteria requires zero code changes. n8n (a self-hosted workflow automation platform) polls Airtable every 60 seconds and sends approved emails from the correct inbox.
Config-driven architecture: every business rule lives in Airtable rows, not code.
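A hedged sketch of what the config-driven pattern looks like in code. The field names ("Rule", "Value", "Enabled") and the example rules are illustrative, not the client's actual schema; the point is that runtime behavior comes from rows, not constants.

```python
def load_rules(records: list[dict]) -> dict:
    """Collapse Airtable-style records into a {rule_name: value} config.

    Records mirror the Airtable API shape: each has a "fields" dict.
    Disabled rows are skipped, so toggling a rule is a checkbox, not a deploy.
    """
    rules = {}
    for rec in records:
        fields = rec.get("fields", {})
        if fields.get("Enabled", False):
            rules[fields["Rule"]] = fields["Value"]
    return rules

# Adding a rep or changing ICP criteria means editing rows, not code:
records = [
    {"fields": {"Rule": "daily_send_cap", "Value": 25, "Enabled": True}},
    {"fields": {"Rule": "min_deal_value", "Value": 5000, "Enabled": True}},
    {"fields": {"Rule": "legacy_rule", "Value": 1, "Enabled": False}},
]
print(load_rules(records))  # {'daily_send_cap': 25, 'min_deal_value': 5000}
```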
All 5 agents run in a single Docker container with shared tool libraries. Each agent has a distinct role and trigger schedule.
| Agent | Role | Trigger |
|---|---|---|
| Sales Agent | Morning scan, contact scoring 0-100, email drafting, task generation | 6 AM daily + midday |
| Reply Router | Classifies inbound sales emails into 8 types, drafts contextual replies | On new email |
| Deal Mover | Cadence-based follow-ups, stage velocity tracking, flags stalling deals | Hourly heartbeat |
| Sales Manager | Revenue vs target by service line, rep scorecards, weighted escalation | 7 AM daily Slack report |
| Lead Generator | ICP scoring, lead enrichment, 100+ leads/day for cold callers | Daily batch |
The system uses a dual-persona model. An AI sales assistant persona handles routine outreach; a second persona matches the CEO's writing voice for VIP and C-level contacts. Routing is automatic, based on contact tier and deal value.
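The routing rule is simple enough to sketch. The tier labels, threshold, and persona names below are assumptions for illustration; the source only says routing keys on contact tier and deal value.

```python
def pick_persona(tier: str, deal_value: float,
                 vip_threshold: float = 50_000) -> str:
    """Route VIP/C-level contacts (or big deals) to the CEO-voice persona;
    everyone else gets the assistant persona. Threshold is illustrative."""
    if tier in {"VIP", "C-level"} or deal_value >= vip_threshold:
        return "ceo_voice"
    return "assistant"

print(pick_persona("C-level", 10_000))   # ceo_voice
print(pick_persona("standard", 2_000))   # assistant
```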
Production systems reveal bugs that tests and staging never surface. These are the real failures from this build.
Three of five scoring components returned hardcoded or default values because the HubSpot fields they read were never populated. The priority queue looked functional: contacts had scores, tasks were ranked. But the rankings were essentially random. Found and fixed by auditing every scoring function against real CRM data.
The opt-out check was case-sensitive, but HubSpot stores opt-out flags inconsistently ("YES", "yes", "True"), so only one variant matched. Contacts who had opted out could have received email. Fixed with a normalized comparison plus 13 regression tests.
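The fix is a one-line normalization. A minimal sketch, assuming the flag arrives as a string or boolean; the exact set of truthy variants here is illustrative.

```python
TRUTHY = {"yes", "true", "y", "1"}

def has_opted_out(raw_flag) -> bool:
    """Normalize HubSpot's inconsistent opt-out values before comparing.

    "YES", "yes", "True", and boolean True all mean opted out.
    None (field never set) means no opt-out on record.
    """
    if raw_flag is None:
        return False
    return str(raw_flag).strip().lower() in TRUTHY

# Every variant now blocks outreach:
assert all(has_opted_out(v) for v in ["YES", "yes", "True", True, " y "])
assert not has_opted_out("no") and not has_opted_out(None)
```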
Contact names with apostrophes (O'Brien) or quotes broke Airtable filter formulas, or, worse, could inject formula logic. Found across 16 formula-building call sites in 7 files. Fixed with a shared escape utility.
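A sketch of what such a shared escape utility can look like, not the project's actual code. It assumes values are interpolated into double-quoted Airtable formula string literals, so escaping backslashes and double quotes makes an apostrophe harmless; verify against Airtable's formula grammar before relying on it.

```python
def escape_formula_value(value: str) -> str:
    """Escape a user-supplied string for interpolation into an Airtable
    filterByFormula double-quoted string literal.

    Backslashes and double quotes are escaped so the value cannot break
    out of the literal; a bare apostrophe (O'Brien) is then safe.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')

def name_filter(name: str) -> str:
    """Build a filter like {Name} = "O'Brien" with the value escaped."""
    return '{Name} = "' + escape_formula_value(name) + '"'

print(name_filter("O'Brien"))  # {Name} = "O'Brien"
```

The key design point is that every one of the 16 call sites goes through the one utility, so a future escaping bug gets fixed in exactly one place.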
The code assumed specific Airtable field names; the client's actual base used different ones. The result: 28 failing tests and a systematic remap across 4 core modules.
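One common way to make that remap a one-time cost is to route every field access through a single mapping, so the next rename touches config rather than 4 modules. The logical and column names below are invented for illustration.

```python
# Logical name used in code -> actual Airtable column name (illustrative).
FIELD_MAP = {
    "status": "Task Status",
    "assignee": "Sales Rep",
    "due": "Follow-Up Date",
}

def get_field(record_fields: dict, logical_name: str):
    """Read a field by its logical name; schema drift only changes FIELD_MAP."""
    return record_fields.get(FIELD_MAP[logical_name])

row = {"Task Status": "Approved", "Sales Rep": "Dana", "Follow-Up Date": "2024-05-01"}
print(get_field(row, "status"))  # Approved
```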
Kill switch: any agent can be paused independently. A master pause stops all outbound in under 60 seconds.
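A hedged sketch of how such a gate might look; the switch names are invented. The master pause is checked first, and because the switch state would be re-read from Airtable on each poll cycle, shutdown latency is bounded by the 60-second poll interval.

```python
def is_send_allowed(agent: str, switches: dict) -> bool:
    """Return False if the master pause or this agent's own pause is set.

    `switches` represents state re-read from Airtable each poll cycle,
    which is what bounds a full stop to the polling interval.
    """
    if switches.get("master_pause", False):
        return False
    return not switches.get(f"pause_{agent}", False)

switches = {"master_pause": False, "pause_reply_router": True}
print(is_send_allowed("sales_agent", switches))   # True
print(is_send_allowed("reply_router", switches))  # False
switches["master_pause"] = True
print(is_send_allowed("sales_agent", switches))   # False
```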
AI sales agents connect to HubSpot via the Private App API to read contacts, deals, and activity history, then generate prioritized outreach tasks. In this system, the agent scores every contact 0-100 based on deal value, recency, and ICP match, then drafts personalized emails queued for human approval.
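The shape of such a scoring function can be sketched as a weighted blend of the three signals. The weights, the 90-day recency decay, and the deal-value cap below are assumptions for illustration, not the production values.

```python
def score_contact(deal_value: float, days_since_touch: int,
                  icp_match: float, max_deal: float = 100_000) -> int:
    """Weighted 0-100 score from deal value, recency, and ICP fit (0-1).

    Weights (40/30/30) and the 90-day decay window are illustrative.
    """
    value_score = min(deal_value / max_deal, 1.0)          # clamp to 0-1
    recency_score = max(0.0, 1.0 - days_since_touch / 90)  # decays over 90 days
    score = 40 * value_score + 30 * recency_score + 30 * icp_match
    return round(score)

print(score_contact(deal_value=50_000, days_since_touch=9, icp_match=0.8))  # 71
```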
Yes, but with guardrails. This system requires human approval for all outreach except pure scheduling emails, which can auto-send if they pass 5 safety checks: scheduling keyword detected, under daily cap, within business hours, non-VIP contact, no escalation flag.
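The five checks compose naturally into a single gate where any failure falls back to the human queue. A minimal sketch: the keyword list, daily cap, and business-hours window are illustrative defaults, not the production values.

```python
from datetime import datetime

def can_auto_send(email: dict, sent_today: int, now: datetime,
                  daily_cap: int = 20) -> bool:
    """All five checks must pass before a scheduling email skips approval.

    Any failure means the email waits in the queue for a human.
    """
    checks = [
        any(k in email["body"].lower()
            for k in ("schedule", "calendar", "reschedule")),  # scheduling keyword
        sent_today < daily_cap,                                # under daily cap
        9 <= now.hour < 17,                                    # business hours
        not email.get("is_vip", False),                        # non-VIP contact
        not email.get("escalation_flag", False),               # no escalation
    ]
    return all(checks)

email = {"body": "Happy to reschedule. Does Tuesday work?", "is_vip": False}
print(can_auto_send(email, sent_today=3, now=datetime(2024, 5, 6, 10, 30)))  # True
```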
The core single-agent system shipped in 7 days. Expanding to 5 agents with reply routing, deal management, sales reporting, and lead generation took an additional 2 weeks. Total: 23 days from first commit to full deployment.
This system has 1,360 tests across 46 test files, a 2.2:1 test-to-code line ratio. Tests cover golden-path integration, per-rep routing, scheduling autonomy edge cases, guardrails enforcement, CAN-SPAM compliance, and audit logging.
Three layers: (1) human approval required for all non-scheduling outreach, (2) CAN-SPAM opt-out checking with normalized field comparison, (3) independent kill switches per agent with a master pause that stops all outbound within 60 seconds.
OpenClaw is a self-hosted AI agent framework that runs an agentic loop: think, plan, act, observe. It serves as the reasoning engine while Python CLI tools handle execution (CRM reads, email drafting, task queuing). It runs in Docker and uses Claude as the underlying LLM.
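The think-plan-act-observe loop can be sketched generically. This is not OpenClaw's actual API; `llm` here stands in for any callable that returns either a tool call or a final answer, and the dict protocol is invented for illustration.

```python
def agent_loop(llm, tools: dict, goal: str, max_steps: int = 10):
    """Minimal agentic loop: think/plan (LLM call), act (tool), observe.

    `llm(goal, observations)` returns {"tool": name, "args": {...}} to act,
    or {"done": result} to finish. Returns None if max_steps is exhausted.
    """
    observations = []
    for _ in range(max_steps):
        decision = llm(goal, observations)                    # think + plan
        if "done" in decision:
            return decision["done"]
        result = tools[decision["tool"]](**decision["args"])  # act
        observations.append(result)                           # observe
    return None

# Stub LLM: fetch once, then sum what it saw.
def fake_llm(goal, obs):
    return {"tool": "fetch", "args": {"n": 3}} if not obs else {"done": sum(obs[0])}

tools = {"fetch": lambda n: list(range(n))}
print(agent_loop(fake_llm, tools, "sum the numbers"))  # 3
```

The cap on `max_steps` matters in production: it keeps a confused agent from looping on tool calls indefinitely.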