Code was never the hard part.
Engineering is.

Solve it once. Run it forever.

Merlin missions are TypeScript files that encode your expertise — which model runs each step, where you must approve, what always passes. Commit them once. They run for every engineer, on any device, autonomously until a checkpoint needs you.


You kick off an agent. Step 8 drifts. You context-switch to redirect it. You check again at step 15. You’ve lost two hours supervising instead of building. You’re not shipping with AI — you’re babysitting it.

Same tool. Radically different outcomes, because every engineer makes different instinctive decisions about when to intervene, which model to trust, and what to validate. A chain of 20 steps that are each 95% accurate succeeds end to end only about 36% of the time (0.95^20 ≈ 0.36). The expertise is knowing where the human resets that clock. (AgileEngine 2025)
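
The 36% figure assumes each step fails independently of the others. A minimal sketch of that arithmetic follows; the helper names (`chainSuccess`, `maxStepsBeforeCheckpoint`) are illustrative and not part of Merlin, and the 80% target is an arbitrary example value.

```typescript
// Probability that n independent steps all succeed, each with accuracy p.
function chainSuccess(p: number, n: number): number {
  return Math.pow(p, n);
}

// Longest unsupervised run that still meets a target success rate.
// Assumes 0 < p < 1 and 0 < target < 1.
function maxStepsBeforeCheckpoint(p: number, target: number): number {
  return Math.floor(Math.log(target) / Math.log(p));
}

const unattended = chainSuccess(0.95, 20); // about 0.358
const perSegment = chainSuccess(0.95, 5); // about 0.774
const stepsFor80 = maxStepsBeforeCheckpoint(0.95, 0.8); // 4
```

Under these assumptions, a checkpoint every four steps keeps each unsupervised stretch above 80%, which is one way to read "resetting the clock".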

“What if knowing when to intervene was something you could ship?”

Missions

What Is a Mission?

Not just guidance for an agent to interpret. A mission is executable control over how the work runs: which model handles each step, how contexts split and pass state, where humans interact, what always gets validated, when the system must stop, and what logic the workflow can express.

Without Merlin

The agent runs. No one decided where to check in. Step 12 drifts. Steps 13–20 compound the error. You find out at the end — if you find it. And nobody on the team makes the same call twice.

With Merlin

You describe the rules: research runs freely, plan needs approval, execution stays scoped, tests are non-negotiable. Merlin writes that as a mission. It enforces your rules for every engineer, every run, without you watching.

A mission is a TypeScript file that encodes how a task should be done — not just what to do.

It’s also what you fire off when you need to focus on something else. The mission runs, validates, and halts at the exact moments that need your judgment. Ten missions across ten projects — you’re notified when any one of them needs you.

Your staff engineer describes the workflow intent; Merlin generates the mission and commits it to .merlin/missions/. It gets code-reviewed like any other code. Versioned. Shared across the team.

“Design when the machine needs you once.
It runs dependably forever.”

We call this Expertise as Code. Your knowledge of which model earns each step, where you must approve, and what must never be skipped — versioned in git, executable by anyone, improving with every run.

Architect

Senior / Staff Engineer

Describes the workflow: where to pause, which model, what must never be skipped
Asks Merlin to generate a mission — Merlin researches the codebase, proposes a plan, writes the TypeScript
Reviews the generated mission like a PR, optionally edits, commits to .merlin/missions/
Fires missions and works on other things while they run
Defines where humans must approve

Any Engineer

Junior, Mid, Senior — Everyone Else

Picks a mission, provides inputs
Gets the senior engineer’s checkpoint placement without needing to know why
Runs missions from phone, tablet, or desktop — approves at checkpoints, picks up results
Monitors progress in the Control Room from any device
Approves or edits the plan at defined checkpoints
Gets senior-quality output with guardrails baked in
No AI knowledge required

Mission vs Skill

Skills are easy to write. Missions are easy to trust.

A skill is Markdown: quick to write, easy to tweak, weak on control. A mission is the inverse: more deliberate to author, but backed by full TypeScript and exact execution rules.

Easy authoring, weak control

Skill

Written in Markdown
Fast to write and easy to tweak
Good for reusable guidance or one focused capability
Lives inside one agent context
Depends on the agent’s interpretation in the moment
Cannot define checkpoints, transfer rules, or stop conditions exactly
Cannot express real control flow, loops, or integration logic as code
Cannot drive a parsed execution map or dynamic process UI

More effort, explicit control

Mission

Written as an executable TypeScript workflow
More deliberate to author and review
Defines sequence, approvals, validation, and stop conditions exactly
Controls multiple contexts and what transfers between them
Places the human in the loop deliberately, with custom React interactions
Uses real TypeScript control flow, loops, branching, and tool integrations
Can be parsed from its AST into execution maps and dynamic UI
Makes judgment repeatable instead of mood-dependent

A skill tells one agent something useful. A mission tells Merlin exactly how the whole job must run.

Skills optimize for ease. Missions optimize for control and reliability.

Complexity that fits the person

Level 1

Guided

Pick a mission. Provide inputs. Wait for results. Approve the plan when it appears. Decide whether to commit when it’s done. No AI knowledge required.

Level 2

Informed

Monitor execution in real time. Make decisions at branch points. Override model selection per step. Inspect every agent decision before it becomes code.

Level 3

Architect

Design the workflow: research strategies, validation criteria, approval checkpoints, model per step. Ask Merlin to generate the TypeScript. Review it, optionally edit it, commit it.

Most of your team runs at Level 1–2.
Your top engineers work at Level 3.
Everyone ships Level 3 quality.

Mission Code

Your process, as TypeScript.

A mission is the same workflow every time: research first, approve the plan, execute only what was approved, validate the result, then stop for a final human call.

Five steps. Two human approvals. One consistent standard for how work runs.

Step 1

Research

Read only. Gather context.

Step 2

Plan + approve

Draft the fix. Wait for approval.

Step 3

Execute

Ship only the approved plan.

Step 4

Validate

Run tests. Review the diff.

Step 5

Commit decision

Human decides what ships.

// Research with read-only tools
const research = await agent("Analyze related patterns in the codebase", {
  tools: ["glob", "grep", "read"],
  model: "gemini-flash",
  provider: "google"
});

// Draft a plan, then stop for approval
const plan = await agent(
  "Propose a fix using this research: " + research
);
const approved = await prompt("ApprovalForm", {
  title: "Review the proposed plan",
  plan: plan,
  options: ["Approve", "Approve with changes", "Reject"]
});

// Execute only the approved work
await agent("Implement the approved plan: " + approved);

// Validate with tests and a regression review
await bash("npm run test");
const review = await agent("Review the diff for regressions");

// Require a human decision before commit
await prompt("ResultsView", {
  title: "Review changes before committing",
  diff: review,
  actions: ["Commit", "Request changes", "Discard"]
});

Five steps. Two human checkpoints. One mission file that enforces the same engineering standard every run.
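
The snippet above does not show what happens when the reviewer picks "Reject" or when the tests fail. One way to make those stop conditions explicit is sketched below. Everything here is an assumption for illustration: the `MissionDeps` interface, the `Decision` and `Outcome` types, `runGatedFix`, `fakeDeps`, and the return shapes of `prompt` and `bash` are hypothetical stand-ins, not Merlin's actual API.

```typescript
// Hypothetical stand-ins for the agent/prompt/bash helpers used above.
type Decision = "Approve" | "Approve with changes" | "Reject";
type Outcome = "rejected" | "tests-failed" | "ready-for-review";

interface MissionDeps {
  agent: (task: string) => Promise<string>;
  prompt: (form: string, props: { title: string; plan: string }) => Promise<Decision>;
  bash: (cmd: string) => Promise<{ exitCode: number }>;
}

async function runGatedFix(deps: MissionDeps): Promise<Outcome> {
  const plan = await deps.agent("Propose a fix");
  const decision = await deps.prompt("ApprovalForm", {
    title: "Review the proposed plan",
    plan,
  });
  // Never execute without approval.
  if (decision === "Reject") return "rejected";

  await deps.agent("Implement the approved plan: " + plan);

  // Stop before the commit decision if validation fails.
  const tests = await deps.bash("npm run test");
  if (tests.exitCode !== 0) return "tests-failed";

  return "ready-for-review";
}

// In-memory fakes so the sketch runs without a real agent or shell.
function fakeDeps(decision: Decision, exitCode: number, log: string[]): MissionDeps {
  return {
    agent: async (task) => {
      log.push(task);
      return "patch the null check";
    },
    prompt: async () => decision,
    bash: async () => ({ exitCode }),
  };
}
```

With the fakes, a rejected plan records one agent call (the planning step) and nothing after it, which is the property the approval checkpoint is meant to guarantee.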

The Platform

Everything your missions need to run, monitor, and improve.

Control Room

The Switchboard shows every mission across every project, sorted by urgency — input needed, error, running, done. One view, full triage. Live metro map of each run. Get notified when a checkpoint needs you. Nothing to watch until then.

Full audit trail
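
The urgency ordering described for the Switchboard can be sketched as a simple ranked sort. The `MissionRun` shape, the status strings, and the `triage` function are assumptions made for this example, not the Control Room's real data model.

```typescript
type Status = "input-needed" | "error" | "running" | "done";

interface MissionRun {
  project: string;
  mission: string;
  status: Status;
}

// Lower rank means more urgent, following the order given in the text.
const URGENCY: Record<Status, number> = {
  "input-needed": 0,
  error: 1,
  running: 2,
  done: 3,
};

function triage(runs: MissionRun[]): MissionRun[] {
  // Copy first so the caller's array is left untouched.
  return runs.slice().sort((a, b) => URGENCY[a.status] - URGENCY[b.status]);
}

const board = triage([
  { project: "billing", mission: "fix-bug", status: "running" },
  { project: "web", mission: "upgrade-deps", status: "done" },
  { project: "api", mission: "fix-bug", status: "input-needed" },
  { project: "mobile", mission: "refactor", status: "error" },
]);
// board order by project: api, mobile, billing, web
```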

Multi-Model

Use GPT-5.4-mini or Gemini 2.5 Flash-Lite for research, GPT-5.4 or Gemini 2.5 Pro for planning, Claude Opus 4.7 where rigor matters, then Qwen3 or Gemma 4 locally for focused execution. Mix Anthropic, OpenAI, Gemini, and local models in one mission.

Per-step budget controls
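
A per-step routing table with a spend cap might look like the sketch below. The model names come from the text above; the step names, provider ids, the `maxUsd` field, the dollar amounts, and the `withinBudget` helper are illustrative assumptions, not Merlin's configuration format.

```typescript
type Step = "research" | "plan" | "review" | "execute";

interface StepModel {
  provider: string;
  model: string;
  maxUsd: number; // hypothetical per-step spend cap
}

// Provider ids and budget numbers are made up for the example.
const routing: Record<Step, StepModel> = {
  research: { provider: "google", model: "Gemini 2.5 Flash-Lite", maxUsd: 0.1 },
  plan: { provider: "openai", model: "GPT-5.4", maxUsd: 1.0 },
  review: { provider: "anthropic", model: "Claude Opus 4.7", maxUsd: 2.0 },
  execute: { provider: "local", model: "Qwen3", maxUsd: 0 },
};

// Refuse a call once it would push the step past its cap.
function withinBudget(step: Step, spentUsd: number, nextCallUsd: number): boolean {
  return spentUsd + nextCallUsd <= routing[step].maxUsd;
}

const canContinue = withinBudget("research", 0.05, 0.04); // true
const overCap = withinBudget("research", 0.08, 0.04); // false
```

Keeping the table in the mission file means a change of model or budget shows up in code review like any other diff.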

Structural Guardrails

Can’t skip research. Can’t bypass approval. Can’t commit without review. Missions execute in V8 isolates, so a crash stays contained.

Architecture-enforced constraints

Works Everywhere

Fire a mission from your laptop. Approve the plan from your phone on the train. The machine keeps running. Pick up results on any device.

Cross-timezone handoff

Round Table

For your highest-stakes steps: 3 providers vote, majority wins. The architect places this at the nodes where per-step accuracy matters most.

Multi-provider validation
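
The "3 providers vote, majority wins" rule can be sketched as a strict-majority tally. The `majority` function and the string verdicts are assumptions for illustration; how Merlin actually collects and compares provider answers is not described here.

```typescript
// Returns the answer backed by a strict majority of voters, or null if there is none.
function majority(votes: string[]): string | null {
  const counts: Record<string, number> = {};
  for (const v of votes) {
    counts[v] = (counts[v] || 0) + 1;
  }
  for (const answer of Object.keys(counts)) {
    if ((counts[answer] || 0) > votes.length / 2) return answer;
  }
  return null;
}

const verdict = majority(["pass", "pass", "fail"]); // "pass"
const split = majority(["pass", "fail", "needs-info"]); // null
```

Returning null on a three-way split leaves room for the mission to escalate to a human instead of picking a winner arbitrarily.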

Knowledge Compounds

Every run audited. Engineers rate missions. Teams see what works. Missions improve with use.

Institutional knowledge retention

Why Now

AI agents reached capability. Orchestration didn’t keep up. 13% fewer junior engineers are entering the field, and the apprenticeship model is breaking. “Vibe coding” proved that speed without discipline produces debt, not software. MCP standardized tool integration. The window to encode expertise is now. (Sundeep Teki, Stack Overflow 2026)

Knowledge Compounds

Every run that completes sharpens the expertise. Engineers rate missions. Teams see which checkpoint placements hold under real conditions. Missions improve. Your best engineers’ judgment doesn’t retire when they move on — it compounds in git, running for everyone who comes after.

Every engineer ships at the level of your best engineer.

Their expertise — encoded once, running dependably forever.

