In modern distributed systems, failures are not exceptions; they are expected behavior.
Yet many Java applications still implement retry logic using manual try/catch blocks, nested conditionals, and ad-hoc exception handling.
It works... until it doesn't.
If your Spring Boot service depends on an external API via Feign, and you're manually deciding which exceptions are retryable, it's time to level up.
This is where Resilience4j becomes essential.
The Real Problem: Retry Logic Scattered Across the Codebase
A common pattern looks like this:
try {
    return feignClient.call();
} catch (FeignException ex) {
    if (ex.status() == 503 || ex.status() == 504) {
        // retry logic here
    } else {
        throw ex;
    }
}
Now multiply that across multiple services.
Problems:
- Retry rules are duplicated
- Business logic gets polluted
- No centralized configuration
- No metrics
- Hard to maintain
- Hard to evolve
Senior engineers will almost always push back on this approach, and for good reason.
What the Market Uses Today
The most common production stack in the Spring ecosystem:
- Spring Boot 3
- Spring Cloud OpenFeign
- Resilience4j
- Micrometer for metrics
This setup allows you to:
- Configure retry centrally
- Separate infrastructure from business logic
- Apply fallback cleanly
- Add observability automatically
Retry vs Fallback: Understanding the Difference
Before writing code, understand the responsibilities:
Retry
Used when failure is transient.
Examples:
- Timeout
- Connection reset
- 502, 503, 504
- 429 (with backoff)
Retry assumes the system might succeed if attempted again.
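The list above can be written down as a small classification helper. This is a minimal plain-Java sketch, not Resilience4j or Feign API: the class name `TransientFailures` and both method names are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Set;

/** Hypothetical helper (not part of Resilience4j or Feign) that encodes the list above. */
final class TransientFailures {

    // Status codes the article treats as transient; 429 should only be retried with backoff.
    private static final Set<Integer> RETRYABLE_STATUSES = Set.of(429, 502, 503, 504);

    private TransientFailures() {}

    /** True when the HTTP status suggests a later attempt might succeed. */
    static boolean isRetryableStatus(int status) {
        return RETRYABLE_STATUSES.contains(status);
    }

    /** True for I/O-level failures such as timeouts and connection resets. */
    static boolean isRetryableException(Throwable error) {
        // SocketTimeoutException and the SocketException raised on a
        // connection reset both extend IOException.
        return error instanceof IOException;
    }

    public static void main(String[] args) {
        System.out.println(isRetryableStatus(503)); // true
        System.out.println(isRetryableStatus(404)); // false
        System.out.println(isRetryableException(new SocketTimeoutException("read timed out"))); // true
    }
}
```

Keeping this decision in one place is the point: whatever mechanism performs the retry, the question "is this failure transient?" gets a single answer.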
Fallback
Used when retry still fails, or when you want controlled degradation.
Examples:
- Return default classification
- Use cached data
- Send message to a queue
- Return partial response
Fallback is not retry.
Fallback is your safety net.
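To make the division of responsibilities concrete, here is a rough plain-Java sketch of "retry first, fall back only when retries are exhausted". It is an illustration under simplifying assumptions (no waiting between attempts, every RuntimeException treated as retryable), and the class `RetryThenFallback` and its `call` method are hypothetical names, not library API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical plain-Java illustration of the retry/fallback split. */
final class RetryThenFallback {

    private RetryThenFallback() {}

    /**
     * Calls the supplier up to maxAttempts times; if every attempt throws,
     * hands the last error to the fallback instead of propagating it.
     */
    static <T> T call(Supplier<T> action, int maxAttempts, Function<Throwable, T> fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get(); // success: the fallback is never consulted
            } catch (RuntimeException ex) {
                lastError = ex; // a real policy would also wait and check retryability here
            }
        }
        return fallback.apply(lastError); // retries exhausted: degrade in a controlled way
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        String result = call(
                () -> { calls.incrementAndGet(); throw new IllegalStateException("503"); },
                3,
                error -> "UNKNOWN");
        System.out.println(result + " after " + calls.get() + " attempts"); // UNKNOWN after 3 attempts
    }
}
```

Note that the fallback receives the last error, so it can still log or count the failure rather than hide it.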
Step 1: Add Resilience4j to Spring Boot 3
Dependency (Gradle example):
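implementation 'io.github.resilience4j:resilience4j-spring-boot3'
implementation 'org.springframework.cloud:spring-cloud-starter-openfeign'
implementation 'org.springframework.boot:spring-boot-starter-aop'
This is the most common combination used in production systems. The spring-boot-starter-aop dependency is included because the Resilience4j annotations are applied through Spring AOP. Versions are omitted here on the assumption that they come from your dependency management setup.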
Step 2: Annotate Your Service with Retry
Instead of manual retry logic, do this:
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;

@Service
public class ClassificationService {

    private final ExternalFeignClient feignClient;

    public ClassificationService(ExternalFeignClient feignClient) {
        this.feignClient = feignClient;
    }

    // Retries use the "externalServiceRetry" instance; the fallback runs once retries are exhausted.
    @Retry(name = "externalServiceRetry", fallbackMethod = "fallbackClassification")
    public Classification classify(String id) {
        ExternalResponse response = feignClient.getData(id);
        return Classification.from(response);
    }

    // Same parameters as classify, plus the exception that triggered the fallback.
    private Classification fallbackClassification(String id, Throwable throwable) {
        // Controlled degradation
        return Classification.UNKNOWN;
    }
}
No try/catch.
No duplicated retry logic.
Business logic remains clean.
Step 3: Define What Is Retryable (The Right Way)
In application.yml:
resilience4j:
  retry:
    instances:
      externalServiceRetry:
        max-attempts: 3
        wait-duration: 500ms
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - feign.RetryableException
          - java.net.SocketTimeoutException
          - java.io.IOException
        ignore-exceptions:
          - feign.FeignException$BadRequest
          - feign.FeignException$Unauthorized
          - feign.FeignException$Forbidden
          - feign.FeignException$NotFound
This is key.
You define retry behavior in configuration, not inside business methods.
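It helps to work out what the backoff settings in that YAML mean in practice. The sketch below only reproduces the arithmetic, assuming the wait before retry n is wait-duration multiplied by multiplier^(n-1) and that max-attempts counts the first call. `BackoffSchedule` is a hypothetical helper, not Resilience4j code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reproduces the backoff arithmetic only. */
final class BackoffSchedule {

    private BackoffSchedule() {}

    /**
     * Returns the waits between attempts: initial * multiplier^(n-1) before retry n.
     * maxAttempts counts the first call, so there are maxAttempts - 1 waits.
     */
    static List<Duration> waits(Duration initial, double multiplier, int maxAttempts) {
        List<Duration> result = new ArrayList<>();
        double millis = initial.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            result.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier; // each subsequent wait grows by the multiplier
        }
        return result;
    }

    public static void main(String[] args) {
        // Values from the YAML above: wait-duration 500ms, multiplier 2, max-attempts 3.
        System.out.println(waits(Duration.ofMillis(500), 2.0, 3)); // [PT0.5S, PT1S]
    }
}
```

Under those assumptions, three attempts add at most 1.5 seconds of waiting on top of the call durations themselves, which is worth checking against the caller's own timeout.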
Advanced: Retry Only for Specific HTTP Status Codes
Sometimes you need more control.
For example, retry only for:
- 429
- 502
- 503
- 504
Then use a custom RetryConfig:
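import feign.FeignException;
import io.github.resilience4j.retry.RetryConfig;
import java.io.IOException;
import java.time.Duration;
import org.springframework.context.annotation.Bean;

// Declared inside a @Configuration class
@Bean
public RetryConfig retryConfig() {
    return RetryConfig.custom()
            .maxAttempts(3)
            .waitDuration(Duration.ofMillis(400))
            .retryOnException(ex -> {
                // Retry only on rate limiting and gateway/availability errors
                if (ex instanceof FeignException fe) {
                    int status = fe.status();
                    return status == 429 ||
                           status == 502 ||
                           status == 503 ||
                           status == 504;
                }
                // Non-HTTP failures: retry only I/O problems
                return ex instanceof IOException;
            })
            .build();
}
This is much cleaner than spreading status checks everywhere.
Keep in mind that the Spring Boot starter builds its retry instances from application.yml, so a standalone RetryConfig bean is not applied to @Retry methods by itself. Register it with the RetryRegistry, or supply the same predicate as a class through the retry-exception-predicate property.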
Why This Is Better Than Manual Retry
Using Resilience4j gives you:
1. Centralized Policy
Retry rules live in one place.
2. Metrics Out of the Box
With Spring Boot Actuator and Micrometer on the classpath, you automatically get retry success/failure metrics.
3. Clean Business Code
Your service focuses only on logic.
4. Production-Ready Observability
Integrates easily with Micrometer, Prometheus, and Grafana.
5. Easy Evolution
Need to change retry from 3 attempts to 5? Just change the config.
Common Mistakes to Avoid
🚫 Retrying 4xx errors (other than 429)
Retrying invalid requests wastes resources.
🚫 No backoff strategy
Instant retries can overload a struggling service.
🚫 Using fallback to hide systemic problems
Fallback is controlled degradation, not silent failure.
🚫 Forgetting circuit breakers
Retry alone is not enough in unstable systems.
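To see why the last point matters, consider what a circuit breaker adds on top of retry: once a dependency keeps failing, calls are rejected immediately instead of being sent (and retried) again. The following is a deliberately simplified plain-Java sketch. `SimpleBreaker` is a hypothetical class, it is not thread-safe, and it never closes again, whereas Resilience4j's own CircuitBreaker handles recovery and concurrency.

```java
import java.util.function.Supplier;

/** Hypothetical, deliberately simplified breaker for illustration only. */
final class SimpleBreaker {

    private final int failureThreshold;
    private int consecutiveFailures = 0;

    SimpleBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** Open means: stop calling the dependency. */
    boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    <T> T call(Supplier<T> action) {
        if (isOpen()) {
            // Fail fast: do not send more traffic to a dependency that keeps failing.
            throw new IllegalStateException("circuit open");
        }
        try {
            T value = action.get();
            consecutiveFailures = 0; // any success resets the count
            return value;
        } catch (RuntimeException ex) {
            consecutiveFailures++;
            throw ex;
        }
    }

    public static void main(String[] args) {
        SimpleBreaker breaker = new SimpleBreaker(2);
        int[] upstreamCalls = {0};
        for (int i = 0; i < 5; i++) {
            try {
                breaker.call(() -> { upstreamCalls[0]++; throw new IllegalStateException("503"); });
            } catch (IllegalStateException expected) {
                // both upstream failures and fast rejections surface here in this sketch
            }
        }
        System.out.println("upstream calls: " + upstreamCalls[0]); // upstream calls: 2
    }
}
```

Five calls were attempted, but the struggling dependency only saw two of them. Retry without this kind of guard would have sent all five, multiplied by the retry count.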
Final Thoughts
Manual retry logic might work in small projects.
But in real-world distributed systems, it becomes technical debt.
Resilience4j with Spring Boot 3 and OpenFeign gives you:
- Structured resilience
- Configurable retry behavior
- Safe fallbacks
- Production-grade patterns
Resilience is not about preventing failure.
It's about designing systems that behave predictably when failure happens.
And that's what separates junior implementations from production-ready architecture.