This one change made my debugging sessions dramatically better.
Here's what I used to do: paste the buggy code, say "fix this," and hope for the best. Sometimes it worked. More often I got a patch that looked right but didn't actually address the root cause, or worse, introduced something new.
The shift that changed everything:
Write a failing test first. Fix second.
The workflow that actually works:
- Give Claude the full picture: logs, relevant code, what you expected vs. what happened
- Ask it to write a test that reproduces the bug
- Run the test. If it passes, Claude doesn't understand the bug yet; go back to step 1
- Once it fails, ask for 2–3 different fix approaches
- Let the tests pick the winner
Real scenario: Payment double-charge bug
Users were occasionally getting charged twice for the same transaction. Support tickets were piling up. The instinct? "Claude, fix the double charge bug."
Bad move.
Step 1: Give full context:
"Here's our payment processor webhook handler and the Stripe logs. Some users are seeing duplicate charges.
Expected: one charge per order.
Actual: sometimes two charges appear 2–3 seconds apart with the same order ID."
Step 2: Ask for a failing test, not a fix:
"Write a test that reproduces this. Simulate two webhook events arriving simultaneously for the same order. Don't fix anything yet."
Claude wrote this:
```java
@Test
void testDuplicateWebhookDoesNotDoubleCharge() throws Exception {
    String orderId = "order_123";

    // Simulate two payment.success webhooks arriving at the same time
    ExecutorService executor = Executors.newFixedThreadPool(2);
    CountDownLatch latch = new CountDownLatch(1);

    Callable<Void> webhookTask = () -> {
        latch.await(); // both threads wait, then fire simultaneously
        paymentWebhookHandler.process(orderId);
        return null;
    };

    Future<Void> first = executor.submit(webhookTask);
    Future<Void> second = executor.submit(webhookTask);
    latch.countDown(); // release both at the same time
    first.get();       // wait for both to complete
    second.get();

    List<Charge> charges = chargeRepository.findByOrderId(orderId);
    assertEquals(1, charges.size()); // ← This failed. Returned 2.
}
```

Test failed. Bug confirmed. Claude actually understood the problem.
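Under the hood, this is a classic check-then-act race: both webhook handlers read "no charge yet" before either one writes. A minimal, self-contained sketch (in-memory stand-ins and hypothetical names, not the real handler) that makes the race deterministic by forcing both threads past the duplicate check before either inserts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class CheckThenActRaceDemo {
    static final List<String> charges = new ArrayList<>();
    static final CountDownLatch bothChecked = new CountDownLatch(2);

    // Naive check-then-act: the window between the check and the
    // insert is exactly where the duplicate charge sneaks in.
    static void process(String orderId) throws InterruptedException {
        boolean alreadyCharged;
        synchronized (charges) {
            alreadyCharged = charges.contains(orderId); // both threads see "false"
        }
        bothChecked.countDown();
        bothChecked.await(); // force both threads past the check before any write
        if (!alreadyCharged) {
            synchronized (charges) {
                charges.add(orderId); // both threads reach this line
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Void> task = () -> { process("order_123"); return null; };
        for (Future<Void> f : pool.invokeAll(List.of(task, task))) f.get();
        pool.shutdown();
        System.out.println("charges recorded: " + charges.size()); // prints 2
    }
}
```

The latch here is only to make the demo reproducible; in production the same interleaving happens whenever two webhooks land within the check-to-write window.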
Step 3: Now ask for fixes:
"Good. Now give me 2–3 approaches to make this test pass."
Claude returned three options:
Option A: Database-level unique constraint + idempotency key (most robust):
```java
// In PaymentWebhookHandler.java
@Transactional
public void process(String orderId) {
    if (chargeRepository.existsByOrderId(orderId)) {
        log.info("Duplicate webhook ignored for order: {}", orderId);
        return;
    }
    // amount comes from the webhook payload; the schema has a UNIQUE
    // constraint on the order_id column as the final backstop
    Charge charge = new Charge(orderId, amount);
    chargeRepository.save(charge); // throws on duplicate, transaction rolls back
}
```

Option B: Redis distributed lock with TTL (fast, low DB pressure):
```java
@Autowired
private RedissonClient redisson;

public void process(String orderId) throws InterruptedException {
    RLock lock = redisson.getLock("payment-lock:" + orderId);
    try {
        if (lock.tryLock(0, 10, TimeUnit.SECONDS)) {
            if (!chargeRepository.existsByOrderId(orderId)) {
                chargeRepository.save(new Charge(orderId, amount));
            }
        }
    } finally {
        if (lock.isHeldByCurrentThread()) lock.unlock();
    }
}
```

Option C: Webhook deduplication table (simplest, no extra infra):
```java
@Transactional
public void process(String orderId) {
    // INSERT INTO webhook_log (order_id) VALUES (?) ON CONFLICT DO NOTHING
    boolean inserted = webhookLogRepository.insertIfAbsent(orderId);
    if (!inserted) {
        log.warn("Webhook already processed for order: {}", orderId);
        return;
    }
    chargeRepository.save(new Charge(orderId, amount));
}
```

Tests ran against all three. Option A passed cleanly and handled edge cases the others missed: the database constraint acted as the final safety net even if application logic had a gap.
What "fix the bug" would have given: probably a simple `if (charged) return;` check, which breaks the moment two threads hit it at the same millisecond.
What the test-first approach gave: a race condition fix with proof it works, plus a regression test that lives in the codebase forever.
When you frame it this way, Claude stops being an autocomplete for fixes and starts acting like a proper debugging partner: one that has to prove it understands the problem before touching anything.
Less guesswork. Better fixes. And you walk away with regression coverage that catches it if it ever comes back.
AI shouldn't guess fixes. It should prove them.