Continual Coder

Open source · MIT licence · Creator & sole author · github.com/daz-williams/continual-coder ↗

A minimal, self-improving coding harness for LLM agents - a rapid-prototyping system in a box. An agent fixes failing tests in a repo, and a refiner rewrites the agent's own long-term memory from each trajectory, so it accumulates codebase-specific knowledge and gets smarter with every task.

MIT

licence

stacks scaffolded

CLI to drive it all

cloud dependencies

What it is

Per-task coding assistants are excellent - and static. They start every task from zero. Continual Coder adds the outer loop: after each task, a refiner distils what the agent learned from its own trajectory into persistent, on-disk memory, so the next task starts smarter. Reset-free in spirit - memory survives across tasks, runs, and restarts.

It's a small, readable implementation of the idea behind self-improving agent harnesses (skill libraries, self-rewriting systems), pointed at the one domain with a clean, incorruptible verifier: code that must pass tests. The agent proposes edits, the verifier runs the tests, the loop retries until green - then the learning gets written down.

Everything runs against your own model endpoint using the standard chat-completions API - built for locally hosted open-weights models, so nothing leaves your machine. Each app lives in its own container with its own workspace, memory, and metrics.

The cc CLI

A prototyping system in a box - idea to running app, one command at a time.

cc new --wizard - interview to spec

A short Q&A (hard-capped at eight questions) turns your idea into a written spec, a project skeleton, and a first set of tests that are expected to fail - they're the target the build loop drives toward.
cc run - the self-improving loop

The agent proposes file edits, the verifier runs the tests, it retries until green - then the refiner distils what it learned into that app's memory so the next task starts smarter.
cc serve · cc share · cc summary - run it, demo it, measure it

Start the app's dev server, share a gated tunnel-based demo link, and check the metrics that matter: is the learning actually compounding?
cc task - keep iterating

Queue the next feature as a task and run it. Every phase is re-runnable by hand, so a messy bootstrap never means starting over.

Why it matters

It's the distilled version of how I build: agentic loops with a hard verifier, memory that compounds, local-first infrastructure, and container isolation as a default rather than an afterthought. And because it's MIT-licensed and a deliberately small codebase, you can read the whole thing in a sitting - it's my working style, in public.

Stack

PythonBash CLIContainer isolationLocal open-weights LLMsStandard chat-completions APITest-driven verifierPersistent agent memoryMIT licence

Next: Monkey. Human. Robot. → View on GitHub ↗