Code was never the hard part.
Engineering is.

Solve it once. Run it forever.

Merlin missions are TypeScript files that encode your expertise: which model runs each step, where you must approve, and which checks must always pass. Commit them once. They run for every engineer, on any device, autonomously until a checkpoint needs you.

Star on GitHub

You kick off an agent. Step 8 drifts. You context-switch to redirect it. You check again at step 15. You’ve lost two hours supervising instead of building. You’re not shipping with AI — you’re babysitting it.
Same tool, radically different outcomes, because every engineer makes different instinctive decisions about when to intervene, which model to trust, and what to validate. A chain of 20 steps, each 95% accurate, finishes without a single error only about 36% of the time (0.95^20 ≈ 0.36). The expertise is knowing where the human resets that clock. (AgileEngine, 2025)
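
The 36% figure comes from multiplying per-step accuracy across the whole chain. Here is a minimal sketch of that arithmetic, assuming steps fail independently (a simplification). The helper names are hypothetical and not part of Merlin.

```typescript
// Probability that every step in an unchecked chain succeeds,
// assuming each step fails independently of the others.
function chainSuccess(stepAccuracy: number, steps: number): number {
  return Math.pow(stepAccuracy, steps);
}

// Longest run of unchecked steps whose success probability
// stays at or above `target`.
function maxStepsBetweenCheckpoints(stepAccuracy: number, target: number): number {
  if (stepAccuracy <= 0 || stepAccuracy >= 1 || target <= 0 || target > 1) {
    throw new RangeError("expected 0 < stepAccuracy < 1 and 0 < target <= 1");
  }
  return Math.floor(Math.log(target) / Math.log(stepAccuracy));
}

const unchecked = chainSuccess(0.95, 20);             // about 0.358
const perSegment = chainSuccess(0.95, 5);             // about 0.774
const budget = maxStepsBetweenCheckpoints(0.95, 0.8); // 4
```

Under these assumptions, a checkpoint every five steps gives each unchecked stretch roughly a 77% chance of being clean. Keeping every stretch above 80% allows at most four steps between checkpoints.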

“What if knowing when to intervene was something you could ship?”

Missions

What Is a Mission?

Not a prompt. Not a script. Your judgment — when to run freely, when to intervene — expressed in plain language, committed as code.

Without Merlin

The agent runs. No one decided where to check in. Step 12 drifts. Steps 13–20 compound the error. You find out at the end — if you find it. And nobody on the team makes the same call twice.

With Merlin

You describe the rules: research runs freely, plan needs approval, execution stays scoped, tests are non-negotiable. Merlin writes that as a mission. It enforces your rules for every engineer, every run, without you watching.

A mission is a TypeScript file that encodes how a task should be done — not just what to do.

It’s also what you fire off when you need to focus on something else. The mission runs, validates, and halts at the exact moments that need your judgment. Ten missions across ten projects — you’re notified when any one of them needs you.

Your staff engineer describes the workflow intent; Merlin generates the mission and commits it to .merlin/missions/. It gets code-reviewed like any other code. Versioned. Shared across the team.

“Design once where the machine needs you.
It runs dependably forever.”
We call this Expertise as Code. Your knowledge of which model earns each step, where you must approve, and what must never be skipped — versioned in git, executable by anyone, improving with every run.

Architect

Senior / Staff Engineer

  • Describes the workflow: where to pause, which model, what must never be skipped
  • Asks Merlin to generate a mission — Merlin researches the codebase, proposes a plan, writes the TypeScript
  • Reviews the generated mission like a PR, optionally edits, commits to .merlin/missions/
  • Fires missions and works on other things while they run
  • Defines where humans must approve

Any Engineer

Junior, Mid, Senior — Everyone Else

  • Picks a mission, provides inputs
  • Gets the senior engineer’s checkpoint placement without needing to know why
  • Runs missions from phone, tablet, or desktop — approves at checkpoints, picks up results
  • Monitors progress in the Control Room from any device
  • Approves or edits the plan at defined checkpoints
  • Gets senior-quality output with guardrails baked in
  • No AI knowledge required

Complexity that fits the person

Level 1

Guided

Pick a mission. Provide inputs. Wait for results. Approve the plan when it appears. Decide whether to commit when it’s done. No AI knowledge required.

Level 2

Informed

Monitor execution in real time. Make decisions at branch points. Override model selection per step. Inspect every agent decision before it becomes code.

Level 3

Architect

Design the workflow: research strategies, validation criteria, approval checkpoints, model per step. Ask Merlin to generate the TypeScript. Review it, optionally edit it, commit it.

Most of your team runs at Level 1–2.
Your top engineers work at Level 3.
Everyone ships Level 3 quality.

Your process, as TypeScript. Generated by Merlin. Enforced every time.

Research before executing. Plan before coding. Approval before committing. You already know this workflow. Describe it to Merlin — Merlin writes the TypeScript. That mission runs the same way whether you’re at your desk or reviewing from your phone, and whether it’s you running it or the newest engineer on the team.

// Step 1: Research — Gemini Flash, read-only, cheap — runs freely
const research = await agent("Analyze the codebase for related patterns", {
  tools: ["glob", "grep", "read"],   // read-only — can’t accidentally change anything
  model: "gemini-flash",             // cheap and fast; near-100% on structured lookups
  provider: "google"
});

// Step 2: Plan — Sonnet for reasoning — balanced accuracy and cost
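const plan = await agent("Propose a fix based on this research: " + research, {
  model: "sonnet",                   // assumed identifier, following the "gemini-flash" pattern in Step 1
  provider: "anthropic"              // assumed provider name
});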
const approved = await prompt("ApprovalForm", {  // human checkpoint — resets the probability clock
  title: "Review the proposed plan",
  plan: plan,
  options: ["Approve", "Approve with changes", "Reject"]
});

// Step 3: Execute — only what the engineer approved
await agent("Implement the approved plan: " + approved);

// Step 4: Validate — always runs — the architect said validation is non-negotiable
await bash("npm run test");
const review = await agent("Review the diff for regressions");

// Step 5: Commit — your call — always
await prompt("ResultsView", {
  title: "Review changes before committing",
  diff: review,
  actions: ["Commit", "Request changes", "Discard"]
});

Five steps. Two human checkpoints. A model chosen for each step. Your expertise, encoded once, running dependably for every engineer on every run.

The Platform

Everything your missions need to run, monitor, and improve.

Control Room

The Switchboard shows every mission across every project, sorted by urgency — input needed, error, running, done. One view, full triage. Live metro map of each run. Get notified when a checkpoint needs you. Nothing to watch until then.

Full audit trail
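
The urgency ordering described for the Switchboard can be sketched as a plain sort. The status names, the `MissionRun` shape, and the `triage` function are assumptions for illustration, not Merlin's actual data model.

```typescript
// Hypothetical status values, listed from most to least urgent.
type MissionStatus = "input_needed" | "error" | "running" | "done";

interface MissionRun {
  id: string;
  project: string;
  status: MissionStatus;
}

// Lower number means the run is shown earlier.
const URGENCY: Record<MissionStatus, number> = {
  input_needed: 0,
  error: 1,
  running: 2,
  done: 3,
};

// Returns a new array sorted by urgency; the input is left untouched.
function triage(runs: readonly MissionRun[]): MissionRun[] {
  return runs.slice().sort((a, b) => URGENCY[a.status] - URGENCY[b.status]);
}

const sample: MissionRun[] = [
  { id: "docs-sync", project: "web", status: "done" },
  { id: "fix-login", project: "api", status: "running" },
  { id: "migrate-db", project: "api", status: "input_needed" },
  { id: "bump-deps", project: "web", status: "error" },
];

const ordered = triage(sample).map((run) => run.id);
// ordered: ["migrate-db", "bump-deps", "fix-login", "docs-sync"]
```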

Multi-Model

Haiku or Gemini Flash for research. Sonnet or GPT-4o for reasoning. Opus or Gemini Ultra where accuracy is critical. Mix Anthropic, OpenAI, Gemini, and local models in one mission — one provider per step, automatically optimal.

Per-step budget controls

Structural Guardrails

Can’t skip research. Can’t bypass approval. Can’t commit without review. Each mission runs in a V8 isolate, so a crash stays contained.

Architecture-enforced constraints
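
One way constraints like these could be enforced is to have each stage refuse to run until its prerequisite has completed. This is a sketch of the idea only. The stage names and the `GuardedRun` class are hypothetical and do not describe Merlin's internals.

```typescript
type Stage = "research" | "approval" | "execute" | "review" | "commit";

// Each stage lists the stages that must already be complete.
const REQUIRES: Record<Stage, Stage[]> = {
  research: [],
  approval: ["research"],
  execute: ["approval"],
  review: ["execute"],
  commit: ["review"],
};

class GuardedRun {
  private readonly completed: Stage[] = [];

  // Marks a stage complete, or throws if a prerequisite was skipped.
  complete(stage: Stage): void {
    for (const prerequisite of REQUIRES[stage]) {
      if (this.completed.indexOf(prerequisite) === -1) {
        throw new Error(`cannot run "${stage}" before "${prerequisite}"`);
      }
    }
    this.completed.push(stage);
  }
}

// Runs the stages in order and reports "ok" or the first violation.
function attempt(stages: Stage[]): string {
  const run = new GuardedRun();
  try {
    stages.forEach((stage) => run.complete(stage));
    return "ok";
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

const fullRun = attempt(["research", "approval", "execute", "review", "commit"]);
const skippedApproval = attempt(["research", "execute"]);
const skippedReview = attempt(["research", "approval", "execute", "commit"]);
```

Here `fullRun` is `"ok"`, while the other two attempts stop at the first skipped prerequisite. Because the check lives in the runner rather than in each mission, a mission author cannot leave it out by accident.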

Works Everywhere

Fire a mission from your laptop. Approve the plan from your phone on the train. The machine keeps running. Pick up results on any device.

Cross-timezone handoff

Round Table

For your highest-stakes steps: 3 providers vote, majority wins. The architect places this at the nodes where per-step accuracy matters most.

Multi-provider validation
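
A rough sketch of majority voting and why it helps, assuming the three providers err independently. Real models often share failure modes, so treat the resulting number as optimistic. `majority` and `majorityOfThreeAccuracy` are illustrative names, not Merlin APIs.

```typescript
// Returns the answer held by a strict majority of voters, or null if none exists.
function majority(votes: readonly string[]): string | null {
  for (const candidate of votes) {
    const count = votes.filter((vote) => vote === candidate).length;
    if (count > votes.length / 2) {
      return candidate;
    }
  }
  return null; // no strict majority: a natural place to escalate to a human
}

// Probability that at least two of three independent voters are right,
// when each is right with probability p.
function majorityOfThreeAccuracy(p: number): number {
  return Math.pow(p, 3) + 3 * Math.pow(p, 2) * (1 - p);
}

const verdict = majority(["approve", "approve", "reject"]); // "approve"
const split = majority(["a", "b", "c"]);                    // null
const boosted = majorityOfThreeAccuracy(0.95);              // about 0.993
```

Under the independence assumption, three 95%-accurate voters reach about 99.3% on a single step. The extra calls are costly, which is why the vote is reserved for the steps where per-step accuracy matters most.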

Knowledge Compounds

Every run audited. Engineers rate missions. Teams see what works. Missions improve with use.

Institutional knowledge retention

Why Now

AI agents reached capability. Orchestration didn’t keep up. 13% fewer junior engineers are entering the field, and the apprenticeship model is breaking. “Vibe coding” proved that speed without discipline produces debt, not software. MCP standardized tool integration. The window to encode expertise is now. (Sundeep Teki, Stack Overflow 2026)

Knowledge Compounds

Every run that completes sharpens the expertise. Engineers rate missions. Teams see which checkpoint placements hold under real conditions. Missions improve. Your best engineers’ judgment doesn’t retire when they move on — it compounds in git, running for everyone who comes after.

Every engineer ships at the level of your best engineer.

Their expertise — encoded once, running dependably forever.
