Case study 03 · Autonomous SDR
Brimley
Live in production · Closed beta · Founder & full-stack AI engineer · brimley.ai ↗
A butler for outbound sales. You give it a brief and an ICP; it sources prospects, writes a personalised multi-step sequence per lead, sends from your own mailbox, classifies replies, and handles bounces, unsubscribes and pacing - autonomously, on your own infrastructure.
22 app modules
~30k LOC in apps/
147 HTML templates
49 test files
1 engineer
4mo to production
Why it's different
Most "AI agents" are demos. This is built like a system.
Outbound sales is the most labour-intensive function in B2B and the part most likely to be replaced by AI agents in the next 24 months. Existing tools each solve one piece of it. Brimley does the whole loop - sourcing, research, generation, sending, polling, classifying, suppressing - autonomously, on infrastructure the customer owns.
This isn't a wrapper around an API. It's a production-shaped multi-module Python system with a real state machine, idempotency on every external side effect, queryset-level tenant isolation with dedicated regression tests, cost-tracked AI calls stamped with token counts and USD at call time, and a hot-editable prompt layer with version-controlled defaults. Every interesting decision has a docstring explaining why.
Campaign state machine
Explicit transitions. Explicit failure paths.
Pipeline
Source → Schedule → Dispatch → Generate → Send → Poll → Classify.
Source
A background task calls the lead provider with the campaign's ICP filters, reveals emails (cached cross-org by provider ID), bills via the credit ledger, and persists Prospect + CampaignProspect rows.
Schedule
Scheduler stamps next_action_at on each row from the campaign's daily cap, sending hours, sending days, timezone, and deterministic ±25% jitter.
Dispatch
A dispatch_due_messages beat task wakes every minute, picks rows whose next_action_at has passed, prioritises by (current_step DESC, next_action_at ASC), and fires send_email.delay() per row, capped at the org-wide daily total.
Generate
Message generator builds a per-prospect email using campaign voice, sequence step, and research. The call is logged with token counts + USD cost.
Send
Verifier runs first. OutreachMessage is persisted, then delivered via the user's authenticated mailbox with an RFC 8058 List-Unsubscribe header injected. Suppressions short-circuit before the API call.
Poll & classify
A mailbox history poll every 5 minutes walks each mailbox since the last cursor, classifies inbound replies/bounces/OOO via an LLM-graded reply classifier, halts on a reply, and suppresses on a bounce.
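The cursor-based poll step can be sketched in a few lines. This is a minimal in-memory stand-in, not the real mailbox client: `Mailbox`, `poll_mailbox`, and the event names are illustrative; only the cursor-and-classify shape (walk events newer than `last_history_id`, act, advance the cursor) reflects the description above.

```python
from dataclasses import dataclass

@dataclass
class Mailbox:
    """Hypothetical stand-in: history is an append-only list of
    (history_id, event) tuples; last_history_id is the stored cursor."""
    history: list
    last_history_id: int = 0

def poll_mailbox(mailbox: Mailbox) -> list:
    """Walk events newer than the cursor, classify each, and advance
    the cursor, so a crashed poll simply resumes where it stopped."""
    actions = []
    for history_id, event in mailbox.history:
        if history_id <= mailbox.last_history_id:
            continue  # already handled on a previous poll
        if event == "reply":
            actions.append("halt_sequence")   # a human answered: stop sending
        elif event == "bounce":
            actions.append("suppress")        # never contact this address again
        mailbox.last_history_id = history_id
    return actions
```

A second poll over the same history returns nothing, which is what makes the 5-minute cadence safe to overlap with downtime or restarts.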
Things most demos hand-wave
Each tile is a decision a senior engineer would notice and ask about - and each has a defensible answer.
ETA-based dispatch
Pacing is a property of when sends are scheduled, not a task-queue rate-limit. Deterministic ordering, faithful distribution, no over/under-run. ±25% deterministic jitter prevents round-minute spam-flagging.
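Deterministic jitter is easy to get wrong with `random()`, because re-running the scheduler reshuffles the plan. A minimal sketch of the hash-based approach (function name and the choice of SHA-256 are assumptions; the ±25% bound is from the description above):

```python
import hashlib

def jittered_offset(base_seconds: float, key: str) -> float:
    """Deterministic ±25% jitter: the same key always yields the same
    offset, so rescheduling never moves an already-planned send."""
    digest = hashlib.sha256(key.encode()).digest()
    # Map the first 4 bytes of the digest to a fraction in [0, 1).
    fraction = int.from_bytes(digest[:4], "big") / 2**32
    # Scale into [0.75, 1.25) of the base interval.
    return base_seconds * (0.75 + fraction * 0.5)
```

Keying on something stable per prospect (e.g. its primary key) spreads sends off round-minute boundaries without any stored random state.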
Idempotency everywhere
Every lead reveal, mailbox send, and LLM call goes through IdempotencyRecord. A retry on a network blip can't double-send, double-bill, or double-spend.
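The IdempotencyRecord pattern reduces to "insert the key first, replay the stored result on a retry". A toy sketch with an in-memory dict (in production this would be a unique-constrained DB table; `run_once` and `_records` are illustrative names, not Brimley's real API):

```python
_records: dict = {}  # stand-in for a unique-keyed IdempotencyRecord table

def run_once(key: str, side_effect):
    """Execute side_effect at most once per key. A retry after a
    network blip finds the record and replays the stored result
    instead of double-sending or double-billing."""
    if key in _records:
        return _records[key]
    result = side_effect()   # first attempt: do the real work
    _records[key] = result   # persist before the caller can retry
    return result
```

The key is derived from the operation's natural identity (e.g. campaign + prospect + step), so the same logical send maps to the same record no matter how many workers race on it.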
Queryset-level tenant isolation
Explicit .for_organization(org) on every query - not a middleware thread-local that breaks the moment a background task runs. Dedicated CI test fails if a new view forgets it.
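The shape of the pattern, reduced to plain Python (a toy list-backed queryset, not Django; the point is that app code has no "all rows" path, only the org-scoped one):

```python
class ProspectQuerySet:
    """Toy queryset: the only read path app code is given is
    for_organization(), so forgetting the scope is a loud omission
    a CI regression test can catch, not a silent data leak."""
    def __init__(self, rows):
        self._rows = rows

    def for_organization(self, org_id):
        return [r for r in self._rows if r["org_id"] == org_id]

qs = ProspectQuerySet([
    {"org_id": 1, "email": "a@x.com"},
    {"org_id": 2, "email": "b@y.com"},
])
```

Because the scope is an explicit call rather than request-middleware state, it works identically inside background tasks, where there is no request and no thread-local to lean on.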
Cost-tracked AI calls
Every LLM call wrapped to stamp per-token pricing at call time, persist input/output/cache-read/cache-creation tokens, and persist USD cost. Historical accounting stays correct when pricing changes.
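Stamping cost at call time can be sketched like this. The pricing table and names are made up for illustration; the invariant is the real point: the USD figure is computed and persisted when the call happens, so later price changes never rewrite history.

```python
from dataclasses import dataclass

# Illustrative per-million-token USD rates; not real vendor pricing.
PRICING = {"model-x": {"input": 3.00, "output": 15.00}}

@dataclass
class AICallLog:
    model: str
    input_tokens: int
    output_tokens: int
    usd_cost: float  # frozen at call time

def log_call(model: str, input_tokens: int, output_tokens: int) -> AICallLog:
    """Compute and stamp the cost now, using today's price sheet."""
    p = PRICING[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return AICallLog(model, input_tokens, output_tokens, round(cost, 6))
```

The same record would also carry cache-read/cache-creation token counts in the full system; they are omitted here for brevity.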
Hot-editable prompts
Every system prompt lives in apps/ai/prompts/*.md AND in a PromptOverride DB table. Resolution: DB row wins if non-empty, otherwise the on-disk default. Iterate without deploys; version-control the canonical default.
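The resolution rule is one function. This sketch takes the DB overrides and the on-disk defaults as plain dicts (in the real system the defaults are read from apps/ai/prompts/*.md and the overrides from the PromptOverride table):

```python
def resolve_prompt(name: str, db_overrides: dict, disk_defaults: dict) -> str:
    """DB row wins if present and non-empty; otherwise fall back to
    the version-controlled on-disk default."""
    override = db_overrides.get(name, "")
    if override.strip():
        return override
    return disk_defaults[name]
```

The `.strip()` guard matters: an override row that was blanked out (or saved as whitespace) falls back to the default instead of silently sending an empty system prompt.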
RFC 8058 + suppression
Every send carries a List-Unsubscribe header. Bounces auto-suppress. Unsubscribes auto-suppress. Domain-level suppression is a first-class table, not a flag on a row.
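For reference, RFC 8058 one-click unsubscribe actually requires two headers, and mailbox providers only surface the native unsubscribe button when both are present. A minimal sketch (the URL and function name are illustrative):

```python
def unsubscribe_headers(unsub_url: str) -> dict:
    """RFC 8058 one-click unsubscribe: List-Unsubscribe carries the
    target, and List-Unsubscribe-Post signals that a bare POST to it
    (no login, no confirmation page) completes the opt-out."""
    return {
        "List-Unsubscribe": f"<{unsub_url}>",
        "List-Unsubscribe-Post": "List-Unsubscribe=One-Click",
    }
```

The POST endpoint behind the URL then writes to the same suppression tables the bounce handler uses, so both paths converge on one source of truth.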
Selected decisions
The patterns I'd point at in a deep-dive interview.
Structured AI output via typed schemas
Reply classification, ICP parsing, and several other AI calls validate the LLM's output against a typed schema before it ever reaches the rest of the system - no JSON-string brittleness.
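The validation boundary looks roughly like this. A stdlib-only sketch (the schema fields, category names, and function are assumptions; the real system may use a schema library, but the gate is the same: bad LLM output raises here, never downstream):

```python
import json
from dataclasses import dataclass

ALLOWED_CATEGORIES = {"reply", "bounce", "ooo", "unsubscribe"}

@dataclass(frozen=True)
class ReplyClassification:
    category: str
    confidence: float

def parse_classification(raw: str) -> ReplyClassification:
    """Validate the LLM's JSON against the schema before it reaches
    the rest of the system; reject rather than guess."""
    data = json.loads(raw)
    category = data.get("category")
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence out of range")
    return ReplyClassification(category, confidence)
```

Everything past this function can assume a well-typed value; there is no second place where a raw JSON string gets poked at.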
Mailbox history polling rather than push
Push notifications via Pub/Sub require a verified domain and a more invasive OAuth posture. History polling on a 5-minute cadence with the per-mailbox last_history_id cursor gives the same correctness with a simpler trust posture - and survives downtime gracefully.
LinkedIn companion as an MV3 extension, not headless automation
Headless LinkedIn automation gets accounts banned and violates LinkedIn's ToS. Running the automation inside the user's own authenticated Chrome session keeps it inside the terms-of-service envelope and avoids cookie-store/fingerprint problems.
Honest about boundaries
Mailbox OAuth scopes are send + modify only - Brimley does not read the inbox beyond the threads it sent on. OAuth tokens encrypted at rest with envelope encryption. HMAC-peppered indices on sent-message lookups.
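The HMAC-peppered index technique is small enough to show whole. A sketch with stdlib `hmac` (function name and pepper handling are illustrative; in practice the pepper lives in a secret store, not the database):

```python
import hashlib
import hmac

def peppered_index(message_id: str, pepper: bytes) -> str:
    """Deterministic lookup key for sent-message rows: equality
    queries still work, but the raw RFC 5322 Message-ID never
    appears in the indexed column, and without the pepper the
    digest can't be reversed or brute-forced from known IDs."""
    return hmac.new(pepper, message_id.encode(), hashlib.sha256).hexdigest()
```

Because HMAC is deterministic per pepper, the sender can still answer "have we seen this Message-ID?" with an indexed equality lookup, which is all the reply-threading path needs.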
Single-tenant by default, multi-tenant capable
Brimley is designed to be self-hosted on the customer's own Postgres + Redis. But the data model is fully org-scoped, so the same image runs in a managed multi-tenant mode. Same code; the difference is operational.
Real observability
Worker heartbeats. Incident ladder. Health dashboard. Error tracking + structured logging. A dry-run mode (OUTREACH_DRY_RUN). A maintenance mode that short-circuits sending tasks rather than dropping them.
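The dry-run and maintenance guards share one shape: check before the mailbox API call, not after. A sketch with the flags passed as parameters (in production they come from config - OUTREACH_DRY_RUN is the real flag named above; the maintenance flag and return values here are illustrative):

```python
def guarded_send(send_fn, message, *, dry_run: bool = False,
                 maintenance: bool = False) -> str:
    """Both guards short-circuit before any external side effect.
    Maintenance returns 'deferred' - the task's schedule is left
    untouched so the send happens later, rather than being dropped."""
    if dry_run:
        return "dry_run"    # log everything, send and bill nothing
    if maintenance:
        return "deferred"   # retried on the next dispatch tick
    return send_fn(message)
```

The distinction between the two outcomes is the whole point: a dry run is a deliberate no-op, while maintenance is a postponement that the ETA-based dispatcher will naturally pick up again.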
Stack
Web / API
Async / queue
Datastores
AI
Integrations
Security & ops