Spring Boot Fault Tolerance and Resilience
- Service goes down: run multiple instances of the service.
- Microservice is slow: what if an external service it calls is itself slow? Even one slow service affects other, otherwise independent services.
Even if the target service is fast, it becomes slow because of another slow service. Why: threads. The web server creates a thread for each request; threads waiting on the slow service pile up and consume all the resources, so every request becomes slow. Solution: timeouts.
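The thread problem above can be reproduced without any web server. The following is a minimal sketch in plain Java, assuming a fixed-size pool stands in for the server's request threads and a latch stands in for the slow service; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo: a shared request pool starved by calls to a slow dependency. */
class SharedPoolStarvation {

    /**
     * Returns true if a fast request could NOT finish within waitMillis
     * because every worker thread was stuck waiting on the slow service.
     */
    static boolean fastRequestIsStarved(int poolSize, long waitMillis) {
        ExecutorService requestThreads = Executors.newFixedThreadPool(poolSize);
        CountDownLatch slowServiceResponds = new CountDownLatch(1);
        try {
            // Occupy every worker thread with a request that blocks on the slow service.
            for (int i = 0; i < poolSize; i++) {
                requestThreads.submit(() -> {
                    slowServiceResponds.await();
                    return "slow";
                });
            }
            // This request never touches the slow service, but it has no thread to run on.
            Future<String> fast = requestThreads.submit(() -> "fast");
            try {
                fast.get(waitMillis, TimeUnit.MILLISECONDS);
                return false; // it got a thread in time
            } catch (TimeoutException e) {
                return true;  // still queued behind the blocked threads
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            slowServiceResponds.countDown(); // release the blocked threads
            requestThreads.shutdown();
        }
    }
}
```

Calling `SharedPoolStarvation.fastRequestIsStarved(2, 200)` blocks both threads and then shows that the fast request is still waiting after 200 ms.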
1) Timeouts, 2) Circuit breaker pattern, 3) Bulkhead pattern
1. Timeouts: set timeouts on RestTemplate (any service that calls another service can have timeouts).
Configure a ClientHttpRequestFactory with the timeout and pass it to the RestTemplate bean.
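A possible shape for that bean is sketched below. It is a configuration fragment, not a standalone program: it needs Spring Web on the classpath. The choice of SimpleClientHttpRequestFactory, the class name RestTemplateConfig and the 3000 ms values are assumptions for illustration.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory requestFactory = new SimpleClientHttpRequestFactory();
        // Illustrative values: give up after 3 seconds instead of holding the thread.
        requestFactory.setConnectTimeout(3000); // ms allowed to establish the connection
        requestFactory.setReadTimeout(3000);    // ms allowed to wait for response data
        return new RestTemplate(requestFactory);
    }
}
```

Any RestTemplate call made through this bean then fails with an exception once the timeout passes, which frees the request thread.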
What if requests arrive faster than timeouts free up threads? Eventually we run into the same issue.
2. Circuit Breaker Pattern: Instead of blindly sending requests to a slow service, detect that it is slow, hold off on sending requests, and only send one once in a while to check whether it has recovered. The breaker looks for faults and breaks the circuit. Unlike a fuse, it can be reset.
Where to set up a circuit breaker? In the calling service, i.e. any service that calls another service, around the outgoing call.
What to do when a circuit breaks? Before that, let’s answer another question.
When does a circuit trip? One option: when the last n consecutive requests time out. Not so good: if every alternate request times out, the circuit never breaks!
Better: count failures over a window. Parameters: the last n requests to consider, how many of those n must fail for the circuit to break, and the timeout duration.
When does a circuit un-trip? Parameter: how long to wait after the circuit trips before trying again (sleep window).
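The trip and un-trip rules above can be sketched in plain Java. This is a minimal illustration of the parameters, not how Hystrix is implemented; the class name, method names and the injectable clock are assumptions. For simplicity, every request is allowed through once the sleep window has passed, until an outcome is recorded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Minimal sliding-window circuit breaker; all names are illustrative, not a library API. */
class SimpleCircuitBreaker {
    private final int windowSize;         // last n requests to consider
    private final int failureThreshold;   // how many of those n must fail to trip
    private final long sleepWindowMillis; // how long to stay open before trying again
    private final LongSupplier clock;     // time source in milliseconds (injectable for tests)

    private final Deque<Boolean> recentOutcomes = new ArrayDeque<>(); // true = failed
    private int failuresInWindow = 0;
    private boolean open = false;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int windowSize, int failureThreshold,
                         long sleepWindowMillis, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    /** True if the caller may send the request; false means "go to the fallback". */
    synchronized boolean allowRequest() {
        if (!open) {
            return true;
        }
        // Circuit is open: only allow a trial request after the sleep window.
        return clock.getAsLong() - openedAt >= sleepWindowMillis;
    }

    /** Reports the outcome of a request; a timeout counts as a failure. */
    synchronized void record(boolean failed) {
        if (open) {
            if (failed) {
                openedAt = clock.getAsLong(); // trial failed: restart the sleep window
            } else {
                open = false;                 // trial succeeded: close and start fresh
                recentOutcomes.clear();
                failuresInWindow = 0;
            }
            return;
        }
        recentOutcomes.addLast(failed);
        if (failed) {
            failuresInWindow++;
        }
        if (recentOutcomes.size() > windowSize) {
            boolean evictedWasFailure = recentOutcomes.removeFirst();
            if (evictedWasFailure) {
                failuresInWindow--;
            }
        }
        if (failuresInWindow >= failureThreshold) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

With a window of 4 and a threshold of 2, the sequence fail, success, fail trips the circuit, which the "n consecutive failures" rule would miss.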
Let’s come back to the question: “What to do when a circuit breaks and a request comes in? What to respond with?”
Fallback: 1. Throw Error, 2. Default response, 3. Send the cached responses
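Options 2 and 3 can be combined: return the last good response if there is one, otherwise a default. Below is a minimal sketch in plain Java; the class name CachedFallback and its single-entry "cache" are assumptions made to keep the example short.

```java
import java.util.function.Supplier;

/** Illustrative fallback chain: cached response first, then a default. Names are hypothetical. */
class CachedFallback<T> {
    private final T defaultResponse;
    private volatile T lastGoodResponse; // null until the first successful call

    CachedFallback(T defaultResponse) {
        this.defaultResponse = defaultResponse;
    }

    /** Runs the remote call; on failure returns the cached response, or the default if none exists. */
    T call(Supplier<T> remoteCall) {
        try {
            T response = remoteCall.get();
            lastGoodResponse = response; // remember it for later failures
            return response;
        } catch (RuntimeException e) {
            T cached = lastGoodResponse;
            return cached != null ? cached : defaultResponse;
        }
    }
}
```

Option 1 (throw an error) corresponds to having no fallback at all: the exception simply reaches the caller.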
Why circuit breakers? Failing fast, Fallback functionality, Automatic recovery
Annotations: add @EnableCircuitBreaker to the application class. Add @HystrixCommand to the methods that need circuit breakers. Configure Hystrix behaviour, i.e. the parameters and the fallback mechanism.
Create a method called getFallbackCatalog() that returns a default response.
How does Hystrix work? It wraps the API class in a proxy class, and the proxy contains the circuit breaker logic.
Problem with the Hystrix proxy: we return the fallback’s hardcoded response even if one API returned a response and only the other did not.
Solution: make the fallback mechanism more granular. If the method calls 2 APIs and one of them returns data while the other fails, the valid data from the first API must still be returned.
1 method, 1 fallback: move the different API calls into separate methods and add a Hystrix command and a fallback to each. The main API method doesn’t need the Hystrix command or fallback anymore. Is that it?
Not yet: now the fallback doesn’t get picked up at all. Why? Because of the proxy class. The proxy wraps the API class and only intercepts calls that come from outside; a call from one method to another in the same class goes directly to the instance and bypasses the proxy.
Solution continued: move those methods out into separate bean classes. The API method then calls not another method in the same class but a method on another, autowired instance. Since it is another instance, the Hystrix proxy intercepts the call and applies the fallback. So create 2 additional classes, each with its fallback, and autowire them into this class.
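The self-invocation problem can be shown with a JDK dynamic proxy standing in for the Hystrix proxy. This is a sketch of the mechanism only: Hystrix applies its logic through Spring AOP, and every name below (FallbackProxy, CatalogApi, RatingApi and so on) is hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

interface CatalogApi {
    String getCatalog();
    String getRating();
}

interface RatingApi {
    String getRating();
}

/** Both methods in one class: getCatalog() calls this.getRating(), which the proxy never sees. */
class SelfCallingCatalog implements CatalogApi {
    public String getCatalog() {
        return "catalog + " + getRating();
    }
    public String getRating() {
        throw new IllegalStateException("rating service timed out");
    }
}

/** The failing call moved to its own class so that it can be proxied separately. */
class RatingClient implements RatingApi {
    public String getRating() {
        throw new IllegalStateException("rating service timed out");
    }
}

/** Calls the rating client through an injected reference (what autowiring would supply). */
class DelegatingCatalog {
    private final RatingApi ratings;

    DelegatingCatalog(RatingApi ratings) {
        this.ratings = ratings;
    }

    String getCatalog() {
        return "catalog + " + ratings.getRating();
    }
}

final class FallbackProxy {
    /** Wraps target in a proxy that returns fallbackValue when methodWithFallback throws. */
    static <T> T wrap(Class<T> iface, T target, String methodWithFallback, Object fallbackValue) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                if (method.getName().equals(methodWithFallback)) {
                    return fallbackValue; // fallback applies only to the intercepted method
                }
                throw e.getCause();       // no fallback registered for this method
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

Wrapping `SelfCallingCatalog` with a fallback on `getRating` still lets the exception escape from `getCatalog()`, because the inner call never passes through the proxy. Wrapping `RatingClient` and passing that proxy to `DelegatingCatalog` makes the fallback value appear in the catalog result.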
Hystrix Config Parameters:
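As a hedged illustration, the trip and un-trip parameters discussed earlier map roughly onto the Hystrix command properties below. This is a fragment that needs Spring Cloud Netflix Hystrix on the classpath; the class name CatalogClient, the URL and all numeric values are assumptions, not settings taken from these notes.

```java
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class CatalogClient {

    private final RestTemplate restTemplate;

    public CatalogClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @HystrixCommand(
        fallbackMethod = "getFallbackCatalog",
        commandProperties = {
            // Timeout duration for a single call.
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "2000"),
            // Minimum number of requests in the rolling window before the circuit can trip.
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "5"),
            // Percentage of failed requests at which the circuit trips.
            @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
            // Sleep window: how long the circuit stays open before a trial request.
            @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "5000")
        })
    public String getCatalog(String userId) {
        // Hypothetical URL, for illustration only.
        return restTemplate.getForObject("http://catalog-service/catalog/{userId}", String.class, userId);
    }

    // Fallback with the same parameters as the original method.
    public String getFallbackCatalog(String userId) {
        return "No catalog available";
    }
}
```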
Hystrix Dashboard: add the dependency; add @EnableHystrixDashboard; in application.properties, expose the endpoint that the Hystrix dashboard streams from.
3. Bulkhead Pattern: Isolate the thread pools used for different microservices so that one slow service doesn’t cause problems for the others.
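The isolation can be sketched in plain Java with one thread pool per downstream service. This is a minimal illustration under assumed names; a latch again stands in for the slow service.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo: one pool per downstream service, so a slow one cannot starve the other. */
class BulkheadPools {

    /** Saturates the slow service's pool, then returns the fast service's response. */
    static String callFastWhileSlowIsSaturated(int poolSizePerService, long waitMillis) {
        ExecutorService slowServicePool = Executors.newFixedThreadPool(poolSizePerService);
        ExecutorService fastServicePool = Executors.newFixedThreadPool(poolSizePerService);
        CountDownLatch slowServiceResponds = new CountDownLatch(1);
        try {
            // Block every thread that is reserved for the slow service.
            for (int i = 0; i < poolSizePerService; i++) {
                slowServicePool.submit(() -> {
                    slowServiceResponds.await();
                    return "slow";
                });
            }
            // The fast service has its own threads, so this call is unaffected.
            Future<String> fast = fastServicePool.submit(() -> "fast");
            return fast.get(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException("fast call did not complete", e);
        } finally {
            slowServiceResponds.countDown(); // release the blocked threads
            slowServicePool.shutdown();
            fastServicePool.shutdown();
        }
    }
}
```

Calls to the slow service can still queue up behind its own pool, but the damage stays inside that compartment.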