May 14, 2026
The Bug Investigation Workflow That Changed How I Think About Debugging
Most debugging workflows have a hidden problem: your local environment is lying to you.
Sonu Yadav
You have packages installed that aren't in the repo. Environment variables set from a project three months ago. Lock files in weird states. A .env you modified and forgot about. When you reproduce a bug locally, you're not reproducing it in the clean state your users and CI are hitting — you're reproducing it in your personal development mess.
There's a better way. Here's the workflow.
The core idea: ephemeral sandboxes for every bug
Instead of investigating bugs in your local environment, you spin up a fresh ephemeral sandbox — a crabbox — that starts from a known clean state. The bug gets reproduced in that sandbox. The fix gets applied and verified in that sandbox. Your local machine never gets touched.
Traditional debugging:
Bug reported → reproduce locally → local env is polluted →
"works on my machine" → fix it locally → push → CI fails →
debug why CI fails when it worked locally
Crabbox debugging:
Bug reported → spin up clean sandbox → reproduce in sandbox →
fix in sandbox → verify fix in sandbox → confidence is real →
commit and push
The workflow below comes from a real bug investigation: a backup volatile filter issue that left Cargo.lock out of backups while including live files that should have been skipped.
# Bug E2E: reproduce the exact failure
# Crabbox: tbx_01kr5xt9vf5pas5ee4aefrp3am
# Result: Cargo.lock missing, 3 live volatile files included ← confirms the bug
# Fix E2E: verify the fix works
# Crabbox: tbx_01kr5y3e1kbtt6chbypfdydbgs
# Result: included_workspace=2 omitted_live_files=4 skippedVolatileCount=4 json=parseable
Two sandboxes. Two IDs. Reproducible, auditable, and completely isolated from the local state.
Why this works with 10 parallel sessions
The other half of the workflow: you don't run one session at a time. You run ten.
Session 1: Investigating authentication bug
Session 2: Reproducing the backup issue
Session 3: Testing a hypothesis about the race condition
Session 4: Verifying the proposed fix for session 2
...
Session 10: Running regression tests for the week's fixes
Each session is its own clean environment. They can't pollute each other. There's no "wait for CI" blocking your local investigation. You're debugging in parallel at a scale that isn't possible in a single local environment.
The tool making this possible is crabbox.sh — ephemeral sandboxes designed specifically for this workflow.
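To make the "ten at once" idea concrete, here is a minimal sketch of fanning out several investigations concurrently. It does not call any real crabbox API: investigate is a hypothetical stand-in that only simulates isolated work, and each call keeps its own scratch state to mirror the "sessions can't pollute each other" property.

```typescript
// Hypothetical model of one debugging session; not a crabbox API.
type Investigation = { session: number; task: string };
type Outcome = { session: number; task: string; scratch: string[] };

// Stand-in for "run this investigation in its own fresh sandbox".
async function investigate(inv: Investigation): Promise<Outcome> {
  const scratch: string[] = []; // per-session state, never shared
  scratch.push(`started: ${inv.task}`);
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated work
  scratch.push(`finished: ${inv.task}`);
  return { session: inv.session, task: inv.task, scratch };
}

const investigations: Investigation[] = [
  { session: 1, task: "Investigating authentication bug" },
  { session: 2, task: "Reproducing the backup issue" },
  { session: 3, task: "Testing a hypothesis about the race condition" },
  { session: 4, task: "Verifying the proposed fix for session 2" },
  { session: 10, task: "Running regression tests for the week's fixes" },
];

async function main(): Promise<Outcome[]> {
  // Promise.all starts every session before waiting on any of them.
  const outcomes = await Promise.all(investigations.map(investigate));
  for (const o of outcomes) {
    console.log(`Session ${o.session}: ${o.scratch.join(" -> ")}`);
  }
  return outcomes;
}

const done = main();
```

The point of the sketch is the shape, not the stub: no session waits on another, and nothing one session writes is visible to the rest.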
A bug, reproduced and fixed
Here's the actual bug that sparked this article. The backup system was doing two wrong things:
Wrong behavior 1: Cargo.lock was missing from backups
(a lock file that SHOULD be backed up was being skipped)
Wrong behavior 2: Live volatile files were being included
agents/<agentId>/sessions/*.jsonl
cron/runs/*.jsonl
session-delivery-queue/*.json
(files that SHOULD be skipped were getting backed up)
The root cause: the volatile-skipping logic was too broad. It was matching *.lock and tmp patterns across the entire workspace instead of scoping the rule correctly. Cargo.lock got caught by the lock file rule when it shouldn't have been. The live session files weren't being caught because the paths weren't registered.
The core of the fix is the volatile filter itself:
// Before: too broad, catches Cargo.lock incorrectly
const isVolatile = (path: string) =>
  path.endsWith('.lock') || path.includes('/tmp/');
// After: scoped to state-owned live files only
const VOLATILE_LIVE_PATHS = [
  /agents\/[^/]+\/sessions\/.*\.jsonl$/,
  /cron\/runs\/.*\.jsonl$/,
  /session-delivery-queue\/.*\.json$/,
];
const isVolatile = (path: string, isStateOwned: boolean) => {
  if (!isStateOwned) return false; // only skip state-owned files
  return VOLATILE_LIVE_PATHS.some(pattern => pattern.test(path));
};
The distinction between "state-owned live files" and "arbitrary workspace lock files" is the entire fix. Cargo.lock is a workspace file, not state-owned. It should be in backups. agents/abc123/sessions/current.jsonl is a live session file that changes constantly and doesn't belong in backups.
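A quick way to see that distinction is to run the scoped filter over a few sample paths. The sketch below repeats the "after" version of isVolatile so it runs on its own. The isStateOwned flags are supplied by hand here, and agents/abc123/config.json is a made-up path added only to show that a state-owned file outside the live patterns is still backed up.

```typescript
// Patterns copied from the fix in this article.
const VOLATILE_LIVE_PATHS: RegExp[] = [
  /agents\/[^/]+\/sessions\/.*\.jsonl$/,
  /cron\/runs\/.*\.jsonl$/,
  /session-delivery-queue\/.*\.json$/,
];

const isVolatile = (path: string, isStateOwned: boolean): boolean => {
  if (!isStateOwned) return false; // workspace files are never skipped
  return VOLATILE_LIVE_PATHS.some((pattern) => pattern.test(path));
};

// [path, isStateOwned] pairs; the ownership flags are assumptions for the demo.
const cases: Array<[string, boolean]> = [
  ["Cargo.lock", false],
  ["agents/abc123/sessions/current.jsonl", true],
  ["cron/runs/2026-05-14.jsonl", true],
  ["session-delivery-queue/pending.json", true],
  ["agents/abc123/config.json", true], // hypothetical: state-owned but not a live file
];

for (const [path, isStateOwned] of cases) {
  const verdict = isVolatile(path, isStateOwned) ? "skip" : "back up";
  console.log(`${path} -> ${verdict}`);
}
```

Cargo.lock and the hypothetical config file come out as "back up"; the three live paths come out as "skip".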
Verifying in the bug sandbox
Before writing a single line of fix code, verify the bug exists in a clean environment:
# Spin up a fresh crabbox
crabbox create tbx_bug_investigation
# In the sandbox, run the backup command
backup create --json
# Check the output
# Expected (broken): Cargo.lock missing, live files included
# Actual: matches expected, bug confirmed
The sandbox output that confirmed the bug:
included_workspace: [... list without Cargo.lock ...]
included_live_files: [
"agents/abc123/sessions/current.jsonl", ← should be skipped
"cron/runs/2026-05-14.jsonl", ← should be skipped
"session-delivery-queue/pending.json" ← should be skipped
]
Bug confirmed in a clean environment. Now you can fix it with confidence that you're fixing the real issue, not a symptom of your local state.
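If you'd rather script that confirmation than eyeball it, a check along these lines would do. This is a sketch under assumptions: the BackupManifest shape is inferred from the two fields shown in the output above, describeBugSymptoms is a hypothetical helper, and Cargo.toml is a placeholder because the real workspace list is elided.

```typescript
// Assumed shape, based only on the two fields visible in the sandbox output.
interface BackupManifest {
  included_workspace: string[];
  included_live_files: string[];
}

// Hypothetical helper: lists which of the two known symptoms are present.
function describeBugSymptoms(manifest: BackupManifest): string[] {
  const symptoms: string[] = [];
  const hasCargoLock = manifest.included_workspace.some(
    (p) => p === "Cargo.lock" || p.endsWith("/Cargo.lock"),
  );
  if (!hasCargoLock) {
    symptoms.push("Cargo.lock missing from backup");
  }
  if (manifest.included_live_files.length > 0) {
    symptoms.push(`${manifest.included_live_files.length} live volatile files included`);
  }
  return symptoms;
}

const bugManifest: BackupManifest = {
  included_workspace: ["Cargo.toml"], // placeholder; the real list is elided above
  included_live_files: [
    "agents/abc123/sessions/current.jsonl",
    "cron/runs/2026-05-14.jsonl",
    "session-delivery-queue/pending.json",
  ],
};

console.log(describeBugSymptoms(bugManifest));
```

An empty result means neither symptom is present, which is what the same check should report in the fix sandbox.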
Verifying in the fix sandbox
After applying the fix, spin up a second fresh sandbox:
# New crabbox with the fix applied
crabbox create tbx_fix_verification
# Run the same backup command
backup create --json
# Check the output
The output that confirmed the fix worked:
included_workspace=2 ← Cargo.lock now included
omitted_live_files=4 ← live files correctly excluded
skippedVolatileCount=4 ← volatile filter working
json=parseable ← no JSON pollution from skip logs
included_workspace=2 confirms Cargo.lock is back. omitted_live_files=4 confirms the live session files are being skipped. skippedVolatileCount=4 confirms the volatile filter is applying correctly. json=parseable catches a secondary bug that was fixed alongside the main one: the --json output was being polluted with verbose skip logs.
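Both checks are easy to automate. The sketch below is illustrative only: parseSummary and isParseableJson are hypothetical helpers, and the "skipping volatile file" log line is an invented example of the kind of pollution described, not the tool's actual output.

```typescript
// Hypothetical helper: turn "key=value key=value" into a lookup table.
function parseSummary(line: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const token of line.trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) result[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return result;
}

// True only when the whole of stdout is one valid JSON document.
function isParseableJson(stdout: string): boolean {
  try {
    JSON.parse(stdout);
    return true;
  } catch {
    return false;
  }
}

const summary = parseSummary(
  "included_workspace=2 omitted_live_files=4 skippedVolatileCount=4 json=parseable",
);
console.log(summary.skippedVolatileCount); // "4"

// Invented log line standing in for the skip-log pollution.
const polluted = 'skipping volatile file: cron/runs/2026-05-14.jsonl\n{"ok":true}';
const clean = '{"ok":true}';
console.log(isParseableJson(polluted)); // false
console.log(isParseableJson(clean)); // true
```

A single stray log line ahead of the JSON is enough to make JSON.parse throw, which is why the parseable check is worth keeping in the verification output.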
The local tests that run after sandbox verification
Sandbox verification proves the fix works end-to-end. Unit tests prove it works in detail and will keep working:
pnpm test \
src/infra/backup-volatile-filter.test.ts \
src/infra/backup-create.test.ts \
src/commands/backup.test.ts
Three test files covering the specific components touched:
backup-volatile-filter.test.ts → tests the isVolatile logic
includes: Cargo.lock not skipped
includes: *.lock workspace files not skipped
includes: state-owned live files are skipped
backup-create.test.ts → tests the backup creation end-to-end
includes: json output is clean (no log pollution)
includes: volatile counts are accurate
backup.test.ts → integration tests for the backup command
includes: regression tests for both bugs
Then the sanity checks:
# No whitespace issues, no merge artifacts
git diff --check
# Format check
oxfmt --check
The PR structure that comes out of this workflow
The investigation produces a PR that tells the whole story:
Branch: pr-72251-fix
What changed:
- Scoped volatile skipping to state-owned live files only
(not arbitrary workspace *.lock/tmp files)
- Added current live paths to volatile skip list:
agents/<agentId>/sessions/*.jsonl
cron/runs/*.jsonl
session-delivery-queue/*.json
- Fixed backup create --json output pollution from skip logs
- Added regression tests
- Updated docs and changelog
Proof:
Bug sandbox (tbx_01kr5xt9vf5pas5ee4aefrp3am):
- Cargo.lock missing ← confirmed
- 3 live volatile files included ← confirmed
Fix sandbox (tbx_01kr5y3e1kbtt6chbypfdydbgs):
- included_workspace=2
- omitted_live_files=4
- skippedVolatileCount=4
- json=parseable ← all confirmed
The reviewer can see exactly what was broken and exactly what the fix produces, and can spin up the same sandboxes to verify independently. The IDs are permanent: tbx_01kr5xt9vf5pas5ee4aefrp3am will always reproduce that specific bug state.
Why this changes debugging
The traditional debugging loop has several compounding problems:
Problem 1: Local environment pollution
→ "it works on my machine" is a symptom
Problem 2: Sequential investigation
→ while testing hypothesis A, can't test hypothesis B
Problem 3: Non-reproducible state
→ "it worked yesterday" can't be verified
Problem 4: Fix confidence is local
→ CI still fails 30% of the time after "local fix"
The crabbox workflow addresses all four:
Clean environment → no pollution
Parallel sessions → 10 hypotheses at once
Persistent sandbox IDs → reproducible forever
E2E sandbox verification → CI confidence before push
The shift in mindset is: your local machine is for writing code. Sandboxes are for verifying behavior. Never mix the two.
Get started
# Install crabbox
# See: http://crabbox.sh
# For your next bug:
# 1. Spin up a bug sandbox, reproduce the issue
crabbox create tbx_bug
# 2. Apply your fix
# 3. Spin up a fix sandbox, verify the fix
crabbox create tbx_fix
# 4. Run your unit tests locally
# 5. Push with confidence
The 10-parallel-session workflow sounds like overkill until you've used it. Then debugging from a single local terminal feels like working with one hand tied behind your back.
Your local environment is lying to you. Stop debugging in it.