section3: An init system for an agent's container
Docker will restart a crashed container, but inside the container nothing is watching the processes. section3 is the small Go supervisor that keeps my AI agent's workspace running. It's now open source.
One container, many processes
My AI agent's workspace is a single dev container that hosts everything the agent needs to function: remote terminal access, a memory indexer that watches for file changes, a Telegram bot, a voice pipeline. Each one is a separate long-running process, and they share a filesystem, a network namespace, and a lifecycle.
For a while a shell script started them in the background, and that was the whole story. A crashed memory indexer stayed dead until I noticed searches returning stale results. The Telegram bot died with a network blip at 2am and the agent was unreachable until morning.
The textbook answer is one process per container, and it's the right answer for services that are actually independent. These processes aren't: they read and write the same workspace, and splitting them apart would have meant a compose file full of volume mounts pretending that one machine is six. What I wanted was an init system for the container itself, and containers don't come with one.
Why not an existing supervisor
The established options all carry weight that this job doesn't need. supervisord wants a Python runtime and INI files. s6 and runit are excellent and battle-tested, but they organize the world as service-directory trees with their own conventions to learn. pm2 assumes Node. systemd inside Docker is a fight you can win, but rarely one worth picking.
The actual job is small: start N processes from a config file, restart the ones that die, keep their logs readable, and let me inspect everything. That's under a thousand lines of Go. One static binary, one YAML file, zero runtime dependencies:
defaults:
  dir: /workspace
  restart: always
services:
  web:
    command: /usr/local/bin/my-web-server --port 8080
  worker:
    command: /usr/local/bin/my-worker --queue default
    restart: on-crash
    depends_on:
      - web
There was a second reason to write it instead of adopting one, and it's the same reason most of my infrastructure looks this way: the agent that operates the supervisor also maintains its code. A tool the agent can hold entirely in its head, source included, is a tool it can debug and extend without me.
The CLI is the daemon
Running section3 with no arguments starts the supervisor in the foreground. Every other invocation connects to the running daemon over a unix socket: status, start, stop, restart, reload, tail. There is no separate control binary and no client configuration, because the client knows exactly one thing, the socket path.
reload applies config edits without collateral damage: removed services stop, new services start, running services are left alone. A changed command takes effect on the next explicit restart of that service, which in practice is the behavior you want from a supervisor that's also managing things you didn't change.
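Those reload rules amount to a three-way diff between what is running and what the new config says. A minimal sketch, assuming a hypothetical Service type that carries only the command; the names diff and plan are mine, not section3's:

```go
package main

import (
	"fmt"
	"sort"
)

// Service is a hypothetical, minimal view of one config entry.
type Service struct{ Command string }

// plan is what a reload would do under the rules described above.
type plan struct {
	stop    []string // running, but gone from the new config
	start   []string // in the new config, not yet running
	pending []string // command changed; applies on the next explicit restart
}

func diff(running, next map[string]Service) plan {
	var p plan
	for name := range running {
		if _, ok := next[name]; !ok {
			p.stop = append(p.stop, name)
		}
	}
	for name, svc := range next {
		old, ok := running[name]
		switch {
		case !ok:
			p.start = append(p.start, name)
		case old.Command != svc.Command:
			p.pending = append(p.pending, name) // left running untouched for now
		}
	}
	// Map iteration order is random; sort so the plan is deterministic.
	sort.Strings(p.stop)
	sort.Strings(p.start)
	sort.Strings(p.pending)
	return p
}

func main() {
	running := map[string]Service{
		"web":    {Command: "my-web-server --port 8080"},
		"worker": {Command: "my-worker --queue default"},
		"old":    {Command: "legacy-job"},
	}
	next := map[string]Service{
		"web":    {Command: "my-web-server --port 9090"}, // edited
		"worker": {Command: "my-worker --queue default"}, // unchanged
		"bot":    {Command: "my-bot"},                     // new
	}
	p := diff(running, next)
	fmt.Println("stop:", p.stop, "start:", p.start, "restart later:", p.pending)
}
```

The unchanged worker appears in none of the three lists, which is the whole point: a reload never touches what you didn't edit.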
Log rotation without cooperation
This is the design decision I'd defend hardest. Log rotation schemes usually need the service to cooperate: reopen your files when logrotate's postrotate hook sends SIGHUP or SIGUSR1, or accept the small data loss of copytruncate. Both assume the service was written by someone who thought about log rotation. Most of mine weren't.
section3 never gives the service a log file at all. The child's stdout and stderr go into a pipe; the supervisor copies from the pipe into the file. Since the supervisor owns the file descriptor, it can rotate at any moment, mid-write, mid-run, without telling the service anything. The service can't hold the old file open, because it never had it.
The threshold is per service (log_max_size, 1MB by default), with five rotated generations kept. The supervisor's own log obeys the same rules.
Crash loops, and the part I got wrong
Restarts back off exponentially, 1s doubling to a 60s ceiling, so a service that dies instantly doesn't burn a CPU core in a restart loop. The subtle half of that feature is when to reset the backoff. Until this week, section3 reset it only on an explicit stop. A service that crash-looped one bad night and then ran cleanly for a month would still wait the full 60 seconds after its next crash, because nothing ever forgave it.
The documentation claimed otherwise. It said the backoff "resets on successful start," which is what the code should have done and didn't. We found the contradiction during a docs review, traced it to the code, and fixed the code to match the sentence: a run of at least 60 seconds now counts as recovery, and the next crash starts over at 1s. When the docs and the implementation disagree, sometimes the docs are the spec.
As a Docker entrypoint
The intended deployment is the one it was built for:
RUN curl -fsSL https://signalshell.com/install-section3 | sh
COPY section3.yml /workspace/section3.yml
ENTRYPOINT ["/usr/local/bin/section3"]
Add init: true to the compose service. section3 reaps its own direct children, but when a service forks and dies, the orphaned grandchildren reparent to PID 1, and Docker's small init process reaps those. The lifecycle maps cleanly: docker stop sends SIGTERM and section3 shuts every service down gracefully before exiting, and docker exec gives you status and tail from outside.
The operator is an agent
The detail that shaped the design more than any technical constraint: the primary operator of this supervisor is an AI agent. It deploys a service, runs section3 status to see it come up, tails the log when something is wrong, and restarts what it just rebuilt. For that to work reliably, the command surface has to be small enough that the agent never has to guess, and failures have to be loud. An invalid config value fails the reload with an error message instead of falling back to a default, because a silent fallback is the kind of thing that costs an agent (or a human) an hour of confused debugging.
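Loud failure is cheap to implement. A sketch using the two restart values that appear in the config earlier in this post; the real list of accepted values may be longer, and parseRestart is a hypothetical name:

```go
package main

import "fmt"

// parseRestart validates a restart policy after defaults have been merged in.
// Only the two values shown in this post are accepted; anything else is an
// error, never a silent fallback.
func parseRestart(v string) (string, error) {
	switch v {
	case "always", "on-crash":
		return v, nil
	default:
		return "", fmt.Errorf("invalid restart policy %q: want \"always\" or \"on-crash\"", v)
	}
}

func main() {
	// A one-letter typo that a lenient parser would quietly turn into a default.
	if _, err := parseRestart("alway"); err != nil {
		fmt.Println("reload failed:", err)
	}
}
```

The error names both the bad value and the accepted ones, so whoever reads it, agent or human, can fix the config without opening the source.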
Get it
section3 is MIT-licensed at github.com/tachikoma-ghost/section3. Binaries are minisign-signed, and section3 self update verifies the signature before replacing itself.
curl -fsSL https://signalshell.com/install-section3 | sh
Tachikoma, the agent that container exists for, runs entirely under it: the remote terminal, the memory indexer, the Telegram bot, and the voice pipeline from the start of this post are all section3 services.