If you've been exploring AI agents recently, you've probably hit the same wall I did: getting agents to produce output is easy. Getting them to run entire processes end-to-end, with no human in the loop, is the real challenge.

I spent two weeks building VoxYZ Agent World, a system in which 6 AI agents autonomously run a website from within a pixel-art office. This isn't about prompt engineering or clever workflows. It's about the unglamorous infrastructure work that bridges "my agents can respond" and "my agents can execute."

Here's what I learned, what broke, and how to fix it.


The Starting Point: You Have OpenClaw, Now What?

Let me start with what I already had working.

OpenClaw is the brain; it enables Claude to use tools, browse the web, manage files, and run scheduled tasks. You can assign cron jobs to agents for daily tweets, hourly intelligence scans, or periodic research reports. That's powerful.

The tech stack is deliberately simple:

  • OpenClaw (on a VPS): the agents' brain for running roundtable discussions, cron jobs, and deep research
  • Next.js + Vercel: website frontend plus an API layer
  • Supabase: single source of truth for all state (proposals, missions, events, memories)

The agents have distinct roles:

  • Minion makes decisions
  • Sage analyzes strategy
  • Scout gathers intelligence
  • Quill writes content
  • Xalt manages social media
  • Observer does quality checks

OpenClaw's cron jobs ensure they "show up for work" every day. The roundtable feature lets them discuss, vote, and reach consensus.

The issue is that everything the agents produce (drafted tweets, analysis reports, content pieces) stays in OpenClaw's output layer. Nothing converts those outputs into real actions, and nothing signals "done" so the next step can fire.

Between "agents can produce output" and "agents can run things end-to-end," there's a full execute → feedback → re-trigger loop missing.

That's what this article is about.

What a Closed Loop Actually Looks Like

Before discussing pitfalls, let's pin down what "closed loop" means, so we're solving the right problem.

A truly unattended agent system needs this cycle running continuously:

1. Agent proposes an idea (Proposal)
     ↓
2. Auto-approval check (Auto-Approve)
     ↓
3. Create mission + steps (Mission + Steps)
     ↓
4. Worker claims and executes (Worker)
     ↓
5. Emit event (Event)
     ↓
6. Trigger new reactions (Trigger / Reaction)
     ↓
7. Back to step one
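As a sketch, the stages above can be written down as the status lifecycles that the tables later in this article use (the function name `onAutoApprove` is illustrative, not from the codebase):

```typescript
// Status lifecycles for each layer of the loop, mirroring the table
// statuses listed later in the article.
type ProposalStatus = "pending" | "accepted" | "rejected";
type MissionStatus = "approved" | "running" | "succeeded" | "failed";
type StepStatus = "queued" | "running" | "succeeded" | "failed";

// Steps 2-3 of the cycle: a pending proposal that clears auto-approval
// becomes a mission whose steps start out queued.
function onAutoApprove(
  proposal: ProposalStatus,
): { mission: MissionStatus; step: StepStatus } | null {
  if (proposal !== "pending") return null; // only pending proposals are eligible
  return { mission: "approved", step: "queued" };
}
```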

Pitfall 1: Two Places Fighting Over Work

The problem: My VPS had OpenClaw workers claiming and executing tasks. At the exact same time, Vercel had a heartbeat cron running mission-worker, also trying to claim the same tasks.

Both were querying the same database table, grabbing the same execution step, and executing independently. No coordination. Pure race condition.

Occasionally, a step would get tagged with conflicting statuses by both sides. The system would grind to a halt, confused about what was actually done.
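The failure mode is easy to reproduce in miniature. Claiming was effectively "read, then write," so two workers that both read before either wrote would both claim the same step. The sketch below is a pure in-memory illustration (not the project's code); at the database level, the equivalent guard is a conditional UPDATE that only succeeds while the status is still queued:

```typescript
// In-memory model of a mission step. claimedBy is an array so the
// double-claim bug is visible.
type Step = { id: number; status: "queued" | "running"; claimedBy: string[] };

// Read phase: both workers can observe the same queued step.
function readPhase(steps: Step[]): Step | undefined {
  return steps.find((s) => s.status === "queued");
}

// Unsafe write: no guard, so the second writer silently claims too.
function writePhase(step: Step, worker: string): void {
  step.status = "running";
  step.claimedBy.push(worker);
}

// Guarded write: claim succeeds only if the status is STILL "queued",
// the moral equivalent of UPDATE ... SET status='running' WHERE status='queued'.
function guardedWrite(step: Step, worker: string): boolean {
  if (step.status !== "queued") return false; // the loser of the race is rejected
  step.status = "running";
  step.claimedBy.push(worker);
  return true;
}
```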

The fix: Cut one executor completely.

I made the VPS the sole executor. Vercel now runs only the lightweight control plane—evaluating triggers, processing the reaction queue, and cleaning up stuck tasks.

The change was minimal. I removed the runMissionWorker call from the heartbeat route:

// Heartbeat now does only 4 things:
const triggerResult = await evaluateTriggers(sb, 4_000);
const reactionResult = await processReactionQueue(sb, 3_000);
const learningResult = await promoteInsights(sb);
const staleResult = await recoverStaleSteps(sb);

Bonus benefit: This saved on Vercel Pro costs. The heartbeat doesn't need Vercel's cron anymore — one line in a crontab on the VPS does the job:

*/5 * * * * curl -s -H "Authorization: Bearer $KEY" https://yoursite.com/api/ops/heartbeat

Every 5 minutes, like clockwork.

Pitfall 2: Triggered But Nobody Picked It Up

The problem: I wrote 4 trigger rules:

  • Auto-analyze when a tweet goes viral
  • Auto-diagnose when a mission fails
  • Auto-review when content gets published
  • Auto-promote when an insight matures

During testing, I observed that the triggers correctly detected conditions and generated proposals. But those proposals sat forever at pending status — they never became missions, never generated executable steps.

Why? Triggers inserted rows directly into the ops_mission_proposals table, but the normal approval flow is:

insert proposal → evaluate auto-approve → if approved, create mission + steps

Triggers were skipping the last two steps entirely.

The fix: Extract a shared function — createProposalAndMaybeAutoApprove. Every path that creates a proposal (API, triggers, reactions) must call this one function.

Here's the structure:

// proposal-service.ts — the single entry point for all proposal creation
export async function createProposalAndMaybeAutoApprove(
  sb: SupabaseClient,
  input: ProposalServiceInput,  // includes source: 'api' | 'trigger' | 'reaction'
): Promise<ProposalServiceResult> {
  // 1. Check daily limit
  // 2. Check Cap Gates (explained below)
  // 3. Insert proposal
  // 4. Emit event
  // 5. Evaluate auto-approve
  // 6. If approved → create mission + steps
  // 7. Return result
}

After this change, triggers just return a proposal template. The evaluator calls the service:

// trigger-evaluator.ts
if (outcome.fired && outcome.proposal) {
  await createProposalAndMaybeAutoApprove(sb, {
    ...outcome.proposal,
    source: 'trigger',
  });
}

One function to rule them all. Any future check (rate limiting, blocklists, new caps) means changing exactly one file.

Pitfall 3: Queue Keeps Growing When Quota Is Full

This was the sneakiest bug. Everything looked fine on the surface: no errors in the logs. But the number of queued steps in the database kept growing, day after day.

The reason: the tweet quota was full, but proposals were still being approved, spawning new missions and queued steps. The VPS worker saw the full quota and simply skipped execution. It didn't claim the steps and didn't mark them failed. The next day, another batch arrived.

The queue kept growing.

The fix: Cap Gates — reject at the proposal entry point. Don't let it generate queued steps in the first place.

Here's how it works:

// The gate system inside proposal-service.ts
const STEP_KIND_GATES: Record<string, StepKindGate> = {
  write_content: checkWriteContentGate,  // Check daily content cap
  post_tweet:    checkPostTweetGate,     // Check tweet quota
  deploy:        checkDeployGate,        // Check deploy policy
};

Each step kind has its own gate. If the tweet quota is full, the proposal gets rejected immediately with a clear reason. A warning event gets emitted. No queued step means no buildup.
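How the gate map gets applied isn't shown above. A minimal sketch of the dispatch (my assumption about proposal-service internals, with gates simplified to synchronous functions) might look like:

```typescript
// A gate either passes or rejects with a human-readable reason.
type GateResult = { ok: true } | { ok: false; reason: string };
type StepKindGate = () => GateResult;

// Check every step kind a proposal would create. The first failing gate
// rejects the whole proposal, before any queued step can exist.
function checkGates(
  stepKinds: string[],
  gates: Record<string, StepKindGate>,
): GateResult {
  for (const kind of stepKinds) {
    const gate = gates[kind];
    if (!gate) continue; // step kinds without a gate pass by default
    const result = gate();
    if (!result.ok) return result; // reject at the entry point
  }
  return { ok: true };
}
```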

Here's the post_tweet gate in detail:

async function checkPostTweetGate(sb: SupabaseClient) {
  const autopost = await getOpsPolicyJson(sb, 'x_autopost', {});
  if (autopost.enabled === false) {
    return { ok: false, reason: 'x_autopost disabled' };
  }

  const quota = await getOpsPolicyJson(sb, 'x_daily_quota', {});
  const limit = Number(quota.limit ?? 10);
  
  const { count } = await sb
    .from('ops_tweet_drafts')
    .select('id', { count: 'exact', head: true })
    .eq('status', 'posted')
    .gte('posted_at', startOfTodayUtcIso());
  if ((count ?? 0) >= limit) {
    return { 
      ok: false, 
      reason: `Daily tweet quota reached (${count}/${limit})` 
    };
  }
  
  return { ok: true };
}

Key principle: Reject at the gate, don't pile up in the queue. Rejected proposals get recorded for auditing. They don't silently disappear.

Making It Alive: Triggers + Reaction Matrix

With those three pitfalls fixed, the loop works. But the system is just an "error-free assembly line," not yet a "responsive team."

Here's how I made it feel alive.

Triggers

I built 4 trigger rules. Each detects a specific condition and returns a proposal template:

| Trigger Condition             | Action                           | Cooldown / Timeframe |
| ----------------------------- | -------------------------------- | -------------------- |
| Tweet engagement > 5% growth  | Analyzes why it went viral       | 2 hours              |
| Mission failed                | Sage diagnoses root cause        | 1 hour               |
| New content published         | Observer reviews quality         | 2 hours              |
| Insight gets multiple upvotes | Auto-promote to permanent memory | 4 hours              |

Triggers only detect; they don't touch the database directly. They hand proposal templates to the proposal service. All cap gates and auto-approve logic apply automatically.

Cooldown matters. Without it, one viral tweet would trigger an analysis on every heartbeat cycle (every 5 minutes). That would overwhelm the system.
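A cooldown guard like this can be a few lines. This sketch (an assumption about the evaluator's internals, with the timestamp storage left out) checks whether enough time has passed since the trigger last fired:

```typescript
// A trigger may fire if it has never fired, or if its last firing is
// older than the configured cooldown.
function cooldownElapsed(
  lastFiredAt: string | null, // ISO timestamp of the previous firing, if any
  cooldownMinutes: number,
  now: Date = new Date(),
): boolean {
  if (!lastFiredAt) return true; // never fired before
  const elapsedMs = now.getTime() - new Date(lastFiredAt).getTime();
  return elapsedMs >= cooldownMinutes * 60 * 1000;
}
```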

Reaction Matrix

This is the most interesting part — spontaneous inter-agent interaction.

There's a reaction_matrix stored in the ops_policy table:

{
  "patterns": [
    { 
      "source": "twitter-alt", 
      "tags": ["tweet","posted"], 
      "target": "growth",
      "type": "analyze", 
      "probability": 0.3, 
      "cooldown": 120 
    },
    { 
      "source": "*", 
      "tags": ["mission:failed"], 
      "target": "brain",
      "type": "diagnose", 
      "probability": 1.0, 
      "cooldown": 60 
    }
  ]
}

Translation:

  • When Xalt posts a tweet: 30% chance Growth will analyze its performance
  • When any mission fails: 100% chance Sage will diagnose what went wrong

Probability isn't a bug; it's a feature. 100% determinism makes the system feel robotic. Add randomness, and it feels more like a real team: sometimes someone responds, sometimes they don't.
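Matching an event against the matrix comes down to three checks: source, tags, and a probability roll. Here's a sketch (my reconstruction, not the project's code) with the random source injected so it can be tested deterministically:

```typescript
// One pattern from the reaction_matrix policy.
type ReactionPattern = {
  source: string;      // emitting agent; "*" matches anyone
  tags: string[];      // every tag must be present on the event
  target: string;      // agent that reacts
  type: string;        // kind of reaction (analyze, diagnose, ...)
  probability: number; // 0..1 chance the reaction actually fires
  cooldown: number;    // minutes (enforced elsewhere)
};

type AgentEvent = { source: string; tags: string[] };

// Returns the first pattern that matches the event AND wins its dice roll.
function matchReaction(
  event: AgentEvent,
  patterns: ReactionPattern[],
  rng: () => number = Math.random,
): ReactionPattern | null {
  for (const p of patterns) {
    const sourceOk = p.source === "*" || p.source === event.source;
    const tagsOk = p.tags.every((t) => event.tags.includes(t));
    if (sourceOk && tagsOk && rng() < p.probability) return p;
  }
  return null;
}
```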

Self-Healing: Systems Will Get Stuck

VPS restarts. Network blips. API timeouts. Steps get stuck in running status with nobody actually processing them.

The heartbeat includes recoverStaleSteps to handle this:

// 30 minutes with no progress → mark failed → check if mission should be finalized
const STALE_THRESHOLD_MS = 30 * 60 * 1000;

const { data: stale } = await sb
  .from('ops_mission_steps')
  .select('id, mission_id')
  .eq('status', 'running')
  .lt('reserved_at', staleThreshold);
for (const step of stale ?? []) {
  await sb
    .from('ops_mission_steps')
    .update({
      status: 'failed',
      last_error: 'Stale: no progress for 30 minutes',
    })
    .eq('id', step.id);
    
  await maybeFinalizeMissionIfDone(sb, step.mission_id);
}

maybeFinalizeMissionIfDone checks all steps in the mission:

  • Any step failed → the whole mission fails
  • All steps completed → the mission succeeds

No more "one step succeeded, so the whole mission gets marked as success."
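The decision rule inside maybeFinalizeMissionIfDone can be expressed as a pure function over the step statuses (a sketch of the rule described above, not the actual implementation):

```typescript
type StepStatus = "queued" | "running" | "succeeded" | "failed";

// Decide the mission outcome from ALL of its steps.
// null means the mission is still in flight and must not be finalized yet.
function missionOutcome(steps: StepStatus[]): "succeeded" | "failed" | null {
  if (steps.some((s) => s === "failed")) return "failed"; // any failure sinks the mission
  if (steps.every((s) => s === "succeeded")) return "succeeded"; // everything done
  return null; // queued or running steps remain
}
```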

The Full Architecture

Three layers with clear responsibilities:

OpenClaw (VPS): Think + Execute
The brain and the hands. Runs roundtable discussions, executes all mission steps, and performs deep research.

Vercel: Approve + Monitor
The control plane. Evaluates triggers, processes reactions, promotes insights, and recovers stale tasks.

Supabase: All State
The shared cortex. Every piece of state — proposals, missions, events, memories — lives here.

What You Can Take Home

If you have OpenClaw + Vercel + Supabase, here's a minimum viable closed-loop checklist:

1. Database Tables (Supabase)

You need at least these:

| Name                    | Purpose                                                      |
| ----------------------- | ---------------------------------------------------------------- |
| `ops_mission_proposals` | Store proposals (pending / accepted / rejected)                  |
| `ops_missions`          | Store missions (approved / running / succeeded / failed)         |
| `ops_mission_steps`     | Store execution steps (queued / running / succeeded / failed)    |
| `ops_agent_events`      | Store event stream (all agent actions)                           |
| `ops_policy`            | Store policies (e.g., auto_approve, x_daily_quota, etc. as JSON) |
| `ops_trigger_rules`     | Store trigger rules                                              |
| `ops_agent_reactions`   | Store reaction queue                                             |
| `ops_action_runs`       | Store execution logs                                             |

2. Proposal Service (One File)

Put proposal creation + cap gates + auto-approve + mission creation in one function. All sources (API, triggers, reactions) call it.

This is the hub of the entire loop.

3. Policy-Driven Configuration (ops_policy table)

Don't hardcode limits. Every behavior toggle lives in the ops_policy table:

// auto_approve: which step kinds are allowed to auto-pass
{ 
  "enabled": true, 
  "allowed_step_kinds": ["draft_tweet","crawl","analyze","write_content"] 
}

// x_daily_quota: daily tweet cap
{ "limit": 8 }

// worker_policy: whether Vercel executes steps (set false = VPS only)
{ "enabled": false }

Adjust policies anytime without redeploying code.
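The getOpsPolicyJson helper used by the gates earlier isn't shown. A minimal sketch of the idea, here reading from an in-memory Map standing in for the ops_policy table, with the caller's default merged underneath:

```typescript
type PolicyJson = Record<string, unknown>;

// Look up a policy by key and merge it over the fallback, so a missing
// row (or a missing field) degrades gracefully to the default.
function readPolicyJson(
  policies: Map<string, PolicyJson>, // stand-in for the ops_policy table
  key: string,
  fallback: PolicyJson,
): PolicyJson {
  const row = policies.get(key);
  return row ? { ...fallback, ...row } : fallback;
}
```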

4. Heartbeat (One API Route + One Crontab Line)

Create a /api/ops/heartbeat route on Vercel. Add a crontab entry on your VPS to run it every 5 minutes.

Inside it runs:

  • Trigger evaluation
  • Reaction queue processing
  • Insight promotion
  • Stale task cleanup

5. VPS Worker Contract

Each step kind maps to a worker. After completing a step, the worker calls maybeFinalizeMissionIfDone to check whether the entire mission should be finalized.

Never mark a mission as succeeded just because one step finished.
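The contract can be sketched as a dispatch table plus an unconditional finalization check (illustrative names and shapes, not the project's code):

```typescript
// Each step kind maps to an executor function.
type StepExecutor = (payload: unknown) => "succeeded" | "failed";

// Run one step, then ALWAYS invoke the mission-level finalization check.
// The mission's status is decided by looking at all steps, never this one alone.
function runStep(
  kind: string,
  payload: unknown,
  executors: Record<string, StepExecutor>,
  finalize: () => void, // stand-in for maybeFinalizeMissionIfDone
): "succeeded" | "failed" {
  const executor = executors[kind];
  const status = executor ? executor(payload) : "failed"; // unknown kinds fail loudly
  finalize();
  return status;
}
```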

Two-Week Timeline

Here's how long each phase actually took:

| Phase                   | Time Taken   | What Got Done                                             |
| ----------------------- | ------------ | --------------------------------------------------------- |
| Infrastructure          | Pre-existing | OpenClaw VPS + Vercel + Supabase (already set up)         |
| Proposals + Approval    | 3 days       | Proposals API + auto-approve + policy table               |
| Execution Engine        | 2 days       | mission-worker + 8 step executors                         |
| Triggers + Reactions    | 2 days       | 4 trigger types + reaction matrix                         |
| Loop Unification        | 1 day        | proposal-service + cap gates + fix three pitfalls         |
| Affect System + Visuals | 2 days       | Affect rewrite + idle behavior + pixel office integration |
| Seed + Go Live          | Half day     | Migrations + seed policies + crontab                      |

Excluding pre-existing infrastructure, the core closed loop (propose → execute → feedback → re-trigger) took about one week to wire up.

Final Thoughts

These 6 agents now autonomously operate voxyz.space every day. I'm still optimizing the system daily: tuning policies, expanding trigger rules, and improving how agents collaborate.

It's far from perfect. Inter-agent collaboration is still basic. "Free will" is mostly simulated through probability-based non-determinism. But the system genuinely runs. It genuinely doesn't need someone watching it 24/7.