Solve it once. Run it forever.
Merlin missions are TypeScript files that encode your expertise — which model runs each step, where you must approve, what always passes. Commit them once. They run for every engineer, on any device, autonomously until a checkpoint needs you.
“What if knowing when to intervene was something you could ship?”
Missions
Not a prompt. Not a script. Your judgment — when to run freely, when to intervene — expressed in plain language, committed as code.
Without Merlin
The agent runs. No one decided where to check in. Step 12 drifts. Steps 13–20 compound the error. You find out at the end, if you find it. And nobody on the team makes the same call twice.
With Merlin
You describe the rules: research runs freely, plan needs approval, execution stays scoped, tests are non-negotiable. Merlin writes that as a mission. It enforces your rules for every engineer, every run, without you watching.
A mission is a TypeScript file that encodes how a task should be done, not just what to do.
It’s also what you fire off when you need to focus on something else. The mission runs, validates, and halts at the exact moments that need your judgment. Ten missions across ten projects — you’re notified when any one of them needs you.
Your staff engineer describes the workflow intent; Merlin generates the mission and commits it to .merlin/missions/. It gets code-reviewed like any other code. Versioned. Shared across the team.
Architect
Senior / Staff Engineer
.merlin/missions/
Any Engineer
Junior, Mid, Senior — Everyone Else
Pick a mission. Provide inputs. Wait for results. Approve the plan when it appears. Decide whether to commit when it’s done. No AI knowledge required.
Monitor execution in real time. Make decisions at branch points. Override model selection per step. Inspect every agent decision before it becomes code.
Design the workflow: research strategies, validation criteria, approval checkpoints, model per step. Ask Merlin to generate the TypeScript. Review it, optionally edit it, commit it.
Most of your team runs at Level 1–2.
Your top engineers work at Level 3.
Everyone ships Level 3 quality.
Your process, as TypeScript. Generated by Merlin. Enforced every time.
Research before executing. Plan before coding. Approval before committing. You already know this workflow. Describe it to Merlin — Merlin writes the TypeScript. That mission runs the same way whether you’re at your desk or reviewing from your phone, and whether it’s you running it or the newest engineer on the team.
// Step 1: Research. Gemini Flash, read-only, cheap; runs freely
const research = await agent("Analyze the codebase for related patterns", {
  tools: ["glob", "grep", "read"], // read-only: can’t accidentally change anything
  model: "gemini-flash", // cheap and fast; near-100% on structured lookups
  provider: "google"
});

// Step 2: Plan. Sonnet for reasoning; balanced accuracy and cost
const plan = await agent("Propose a fix based on this research: " + research);
const approved = await prompt("ApprovalForm", { // human checkpoint: resets the probability clock
  title: "Review the proposed plan",
  plan: plan,
  options: ["Approve", "Approve with changes", "Reject"]
});

// Step 3: Execute. Only what the engineer approved
await agent("Implement the approved plan: " + approved);

// Step 4: Validate. Always runs; the architect said validation is non-negotiable
await bash("npm run test");
const review = await agent("Review the diff for regressions");

// Step 5: Commit. Your call, always
await prompt("ResultsView", {
  title: "Review changes before committing",
  diff: review,
  actions: ["Commit", "Request changes", "Discard"]
});
Five steps. Two human checkpoints. Three providers. Your expertise — encoded once, running dependably for every engineer on every run.
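The mission above does not show what the ApprovalForm checkpoint returns, so here is one way the approval gate could be handled, as a sketch only. The ApprovalOutcome type and the planToExecute helper are hypothetical names introduced for illustration. They are not part of Merlin's API. The three choices mirror the options listed in the mission.

```typescript
// Hypothetical shape for the result of the ApprovalForm checkpoint.
// The real return value of prompt("ApprovalForm", ...) is not shown in
// the mission above, so this type is an assumption.
type ApprovalOutcome =
  | { choice: "Approve" }
  | { choice: "Approve with changes"; revisedPlan: string }
  | { choice: "Reject" };

// Decide what, if anything, the execute step is allowed to implement.
// Returns null when the plan was rejected, so the caller can halt the run.
function planToExecute(plan: string, outcome: ApprovalOutcome): string | null {
  switch (outcome.choice) {
    case "Approve":
      return plan; // run the plan exactly as proposed
    case "Approve with changes":
      return outcome.revisedPlan; // run the engineer's edited version
    case "Reject":
      return null; // nothing approved, nothing to execute
  }
}
```

Under these assumptions, a mission would call planToExecute right after the checkpoint and skip the execute step whenever the result is null, which keeps execution scoped to what was approved.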
Everything your missions need to run, monitor, and improve.
The Switchboard shows every mission across every project, sorted by urgency — input needed, error, running, done. One view, full triage. Live metro map of each run. Get notified when a checkpoint needs you. Nothing to watch until then.
Full audit trail
Haiku or Gemini Flash for research. Sonnet or GPT-4o for reasoning. Opus or Gemini Ultra where accuracy is critical. Mix Anthropic, OpenAI, Gemini, and local models in one mission: one provider per step, automatically optimal.
Per-step budget controls
Can’t skip research. Can’t bypass approval. Can’t commit without review. V8 isolate crash isolation.
Architecture-enforced constraints
Fire a mission from your laptop. Approve the plan from your phone on the train. The machine keeps running. Pick up results on any device.
Cross-timezone handoff
For your highest-stakes steps: 3 providers vote, majority wins. The architect places this at the nodes where per-step accuracy matters most.
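The "3 providers vote, majority wins" rule can be sketched as a small tally. This is an illustration under stated assumptions, not Merlin's implementation: majorityVote is a hypothetical helper, each provider's answer is assumed to be reduced to a comparable string such as "pass" or "fail", and returning null when no answer wins a strict majority is a design choice made here that the page does not specify.

```typescript
// Hypothetical helper: returns the answer given by more than half of the
// voters, or null when no answer reaches a strict majority.
function majorityVote(votes: string[]): string | null {
  const counts: Record<string, number> = {};
  for (const vote of votes) {
    counts[vote] = (counts[vote] || 0) + 1; // tally each distinct answer
  }
  for (const answer of Object.keys(counts)) {
    // Strict majority: more than half of all votes cast.
    if ((counts[answer] || 0) * 2 > votes.length) {
      return answer;
    }
  }
  return null; // split vote, e.g. three providers with three different answers
}
```

With three providers, two matching answers are enough to win. A three-way split returns null, which a mission could treat as a reason to stop and ask a human.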
Multi-provider validation
Every run audited. Engineers rate missions. Teams see what works. Missions improve with use.
Institutional knowledge retention
Why Now
AI agents reached capability. Orchestration didn’t keep up. 13% fewer junior engineers are entering the field; the apprenticeship model is breaking. “Vibe coding” proved that speed without discipline produces debt, not software. MCP standardized tool integration. The window to encode expertise is now.
Sundeep Teki, Stack Overflow 2026
Every run that completes sharpens the expertise. Engineers rate missions. Teams see which checkpoint placements hold under real conditions. Missions improve. Your best engineers’ judgment doesn’t retire when they move on; it compounds in git, running for everyone who comes after.
Their expertise — encoded once, running dependably forever.