When we access external resources from our code, especially over a network connection, it’s quite likely that something will go wrong. More than 20 years ago, L. Peter Deutsch and James Gosling described what they called the eight fallacies of distributed computing. Although all eight are important, the first one is the most relevant for this article: the belief that the network is reliable. You can read more about this here.
What this means is that we can see errors that are not caused by a malfunction in the system we call, but by a faulty connection between the two systems. This can cause serious problems in our own systems if we don’t handle it correctly.
Polly
How can we handle this as .NET developers? We’re going to rely on a great library called Polly. According to its authors, Polly is “a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. Polly targets .NET 4.0, .NET 4.5 and .NET Standard 1.1.”
This means that Polly will help us handle whatever we define as a transient error. In its simplest form, we create the definition of a policy and then execute an action in the context of that policy. Let’s take a look at some of those policies to see how the library works.
Retries
With this policy, we're telling Polly to retry the action it was executing whenever it hits an error. The first thing we need to do is create the retry policy:
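// A is the result type of the action being executed
var policy = Policy<A>
    .Handle<RetryException>()
    .WaitAndRetry(
        3, // maximum number of retries
        // wait before each retry, growing exponentially with the attempt number
        retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
        // callback invoked before each retry
        (result, timeSpan, retryCount, context) =>
        {
            Console.WriteLine($"Retrying for {retryCount} time, on time {timeSpan} for exception {result.Exception.Message}");
        }
    );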
As you can see, we're telling Polly to wait a certain amount of time after each error and then try the action again: up to 3 retries, waiting 2, 4 and 8 seconds respectively.
To execute the action in the context of this policy, you need to do something like this:
var executionCount = 0;
var result = policy.Execute(() =>
{
    executionCount++;
    Console.WriteLine($"Executing action for {executionCount} time");
    throw new RetryException($"Throwing for execution {executionCount}");
    // Never reached in this demo; a real action would return its result here
    return 2;
});
In this case, we're always forcing the error for demo purposes, but the action could just as well be an HTTP call or a database query that fails.
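To make the wait times concrete, here is a minimal sketch that lists the delays produced by the same formula used in the retry policy above. It does not use Polly at all; `BackoffSketch` and `Delays` are hypothetical names introduced only for this illustration, and the attempt number is assumed to start at 1, as it does in the `retryAttempt` delegate.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSketch
{
    // Hypothetical helper (not part of Polly): returns the wait before each retry,
    // using the same formula as the policy above: 2^retryAttempt seconds.
    public static List<TimeSpan> Delays(int retryCount)
    {
        var delays = new List<TimeSpan>();
        for (var retryAttempt = 1; retryAttempt <= retryCount; retryAttempt++)
        {
            delays.Add(TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));
        }
        return delays;
    }
}
```

Calling `BackoffSketch.Delays(3)` yields waits of 2, 4 and 8 seconds, so an action that fails every time keeps the caller busy for at least 14 seconds before the exception finally surfaces.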
Timeout
With this policy, we make sure the caller never has to wait longer than a certain amount of time. It is very useful when whatever we’re calling doesn’t have a built-in timeout mechanism.
In this case, given a timeout policy configured with a limit of 2.5 seconds, executing an action under it would look something like:
var cancellationToken = new CancellationToken();
await policy.ExecuteAsync(async (ct) =>
{
    Console.WriteLine($"Start Execution at {DateTime.UtcNow.ToLongTimeString()}");
    await Task.Delay(10000, ct);
    Console.WriteLine($"Finish Execution at {DateTime.UtcNow.ToLongTimeString()}");
}, cancellationToken);
If we execute this code, we’ll see that we never wait longer than the 2.5 seconds defined in the policy.
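The snippet above uses a `policy` variable without showing how it was built. A possible definition, sketched here under the assumption that the optimistic strategy is used (the delegate cooperates by honouring the cancellation token), could look like the following. `TimeoutSketch` and `TimesOutAsync` are hypothetical names for this illustration; the timeout is a parameter so the sketch can be tried with shorter values than 2.5 seconds.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

public static class TimeoutSketch
{
    // Hypothetical helper: returns true when the policy cuts the slow action short.
    public static async Task<bool> TimesOutAsync(TimeSpan timeout)
    {
        // Optimistic strategy: Polly cancels the token it hands to the delegate.
        var policy = Policy.TimeoutAsync(timeout, TimeoutStrategy.Optimistic);
        try
        {
            await policy.ExecuteAsync(
                async ct => await Task.Delay(10000, ct), // simulated slow call
                CancellationToken.None);
            return false;
        }
        catch (TimeoutRejectedException)
        {
            // Polly signals the timeout to the caller with this exception.
            return true;
        }
    }
}
```

With `TimeoutSketch.TimesOutAsync(TimeSpan.FromSeconds(2.5))` the simulated 10-second call is abandoned and the caller receives a `TimeoutRejectedException`, which is why the "Finish Execution" message would not be expected in a run like the one above.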
Cache
The cache policy lets us return results from a cache when they are available there. This avoids calls to the downstream system and improves overall performance.
In this case, the definition of the policy can be something like:
var policy = Policy
    .Cache(
        memoryCacheProvider,
        new SlidingTtl(TimeSpan.FromMinutes(5)),
        onCacheGet: (context, key) =>
        {
            Console.WriteLine($"Get {key} from cache");
        },
        onCachePut: (context, key) =>
        {
            Console.WriteLine($"Put on {key}");
        },
        onCacheMiss: (context, key) =>
        {
            Console.WriteLine($"Miss {key}");
        },
        onCacheGetError: (context, key, exception) =>
        {
            Console.WriteLine($"Error getting {key}");
        },
        onCachePutError: (context, key, exception) =>
        {
            Console.WriteLine($"Error putting {key}");
        });
And an example of use:
for (int i = 0; i < 10; i++)
{
    var result = policy.ExecuteAndCapture(context => GetSomething(), new Context("KeyForSomething"));
    Console.WriteLine($"Result is {result.Result}");
}
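The policy definition above relies on a `memoryCacheProvider` that is not shown. As a rough sketch, and assuming the Polly.Caching.Memory package backed by `Microsoft.Extensions.Caching.Memory` (the original may well use a different provider), it could be created as below. `CacheSketch` and `CountDownstreamCalls` are hypothetical names, and the counter stands in for the real downstream call.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;
using Polly;
using Polly.Caching;
using Polly.Caching.Memory;

public static class CacheSketch
{
    // Hypothetical helper: runs the same cached action several times and
    // reports how often the underlying delegate was really invoked.
    public static int CountDownstreamCalls(int executions)
    {
        using var memoryCache = new MemoryCache(new MemoryCacheOptions());
        var memoryCacheProvider = new MemoryCacheProvider(memoryCache);
        var policy = Policy.Cache(memoryCacheProvider, new SlidingTtl(TimeSpan.FromMinutes(5)));

        var downstreamCalls = 0;
        for (var i = 0; i < executions; i++)
        {
            // The Context's operation key is what Polly uses as the cache key.
            policy.Execute(context =>
            {
                downstreamCalls++;
                return "something";
            }, new Context("KeyForSomething"));
        }
        return downstreamCalls;
    }
}
```

Because every execution shares the key "KeyForSomething", only the first one should reach the delegate; the rest are served from the cache until the sliding expiration lapses.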
Fallback
This policy lets us define a substitute value to return when the action fails, so the caller still gets something usable. A typical example is a movie recommendation service that returns a list of default movies when the real recommendation call fails.
In this case, the policy would be something like:
return Policy<User>
    .Handle<FallbackException>()
    .Fallback(() => new User("defaultUser"));
And the usage would be something like:
var result = policy.Execute(() => throw new FallbackException());
Console.WriteLine($"Username is {result.Name}");
In this case we’ll see that, although the call always fails, the result variable always holds a value: the default user.
Circuit Breaker
The Circuit Breaker pattern was popularized by Michael Nygard in his amazing book Release It!. What the Circuit Breaker provides is a way to fail fast: after the operation fails a certain number of times, the circuit opens and further calls return an error immediately, without reaching the downstream service. Once we’re in this state, we try to call the downstream service again after a certain time. If that call succeeds, the circuit closes and we call the downstream service normally from then on. If it fails, we return an error and wait for the same interval before trying again.
In this case, we’d define the policy in this way:
Action<Exception, TimeSpan> onBreak = (exception, timespan) =>
{
    Console.WriteLine($"Breaking because of {exception.Message} after {timespan.TotalSeconds} seconds");
};
Action onReset = () =>
{
    Console.WriteLine("It's running again!");
};
var breakerPolicy = Policy
    .Handle<BreakerException>()
    .CircuitBreaker(1, TimeSpan.FromSeconds(10), onBreak, onReset);
And we could use it in this way:
for (int i = 0; i < 20; i++)
{
    Console.WriteLine($"Let's call the downstream service at {DateTime.Now.ToLongTimeString()}");
    try
    {
        breakerPolicy.Execute(ctx => SimulateCallToDownstreamService(ctx),
            new Dictionary<string, object>() {{"id", i}});
    }
    catch (Exception ex)
    {
        Console.WriteLine($"The downstream service threw an exception: {ex.Message}");
    }
    await Task.Delay(TimeSpan.FromSeconds(2));
}
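To see the open state in isolation, here is a small sketch with a delegate that always fails. It assumes an `InvalidOperationException` in place of the article's `BreakerException`, and `BreakerSketch` and `Run` are hypothetical names; the breaker settings mirror the ones above (one failure allowed, 10 seconds open).

```csharp
using System;
using Polly;
using Polly.CircuitBreaker;

public static class BreakerSketch
{
    // Hypothetical helper: calls a always-failing action several times in a row and
    // reports how many calls reached it versus how many the breaker rejected.
    public static (int DelegateRuns, int RejectedCalls, CircuitState State) Run(int calls)
    {
        var breaker = Policy
            .Handle<InvalidOperationException>()
            .CircuitBreaker(1, TimeSpan.FromSeconds(10));

        var delegateRuns = 0;
        var rejectedCalls = 0;
        for (var i = 0; i < calls; i++)
        {
            try
            {
                breaker.Execute(() =>
                {
                    delegateRuns++;
                    throw new InvalidOperationException("downstream failure");
                });
            }
            catch (BrokenCircuitException)
            {
                // Circuit is open: Polly fails fast without running the delegate.
                rejectedCalls++;
            }
            catch (InvalidOperationException)
            {
                // The failure that trips the breaker still reaches the caller.
            }
        }
        return (delegateRuns, rejectedCalls, breaker.CircuitState);
    }
}
```

Run back to back, only the first call reaches the failing delegate; the following ones are rejected with a `BrokenCircuitException` until the 10-second break elapses, which is the fail-fast behaviour described above.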
Wrap policy
These policies are useful on their own, but they become even more powerful when combined. In the following example we wrap a retry inside a fallback: the call is retried and, if it keeps failing after x retries, we return a fallback value.
var wrapPolicy = Policy.Wrap(
    FallbackPolicy.GetGenericPolicy(),
    RetryPolicy.GetPolicy<User>());
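The helper classes used above are not shown in the article, so the following is only a self-contained sketch of the same idea with policies defined inline. It assumes a `string` result and an `InvalidOperationException` as the handled error; `WrapSketch` and `Run` are hypothetical names.

```csharp
using System;
using Polly;

public static class WrapSketch
{
    // Hypothetical helper: executes an always-failing action under fallback + retry.
    public static (string Result, int Attempts) Run()
    {
        var attempts = 0;

        // Outer policy: supplies a default value once everything inside has failed.
        var fallback = Policy<string>
            .Handle<InvalidOperationException>()
            .Fallback("defaultUser");

        // Inner policy: retries the action up to 3 times, without waiting.
        var retry = Policy<string>
            .Handle<InvalidOperationException>()
            .Retry(3);

        // The first argument is the outermost policy, the last one the innermost.
        var wrapPolicy = Policy.Wrap<string>(fallback, retry);

        var result = wrapPolicy.Execute(() =>
        {
            attempts++;
            throw new InvalidOperationException("still failing");
        });
        return (result, attempts);
    }
}
```

The order of the arguments matters: with the fallback outermost, the action is attempted four times (the initial call plus three retries) before the fallback value is returned. Swapping the two would apply the fallback on the first failure, and the retry would never see an exception.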
You can find the most used combinations here.
Summary
In this article we’ve seen an introduction to Polly, a library that provides us with resilience patterns. Polly will be a great travel companion as your applications become more complex.