🙏🙏Emergency Prayer Circle

The Single Point of Failure: When Your Auth Service Becomes Your Achilles Heel

TL;DR: Auth middleware with no fallbacks creates a single point of failure - when the central service goes down, your entire app stops, even health checks.

#resilience #authentication #middleware #circuit-breaker #caching

The Code

csharp
public class AuthenticationMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IAuthService _authService;

    public AuthenticationMiddleware(RequestDelegate next, IAuthService authService)
    {
        _next = next;
        _authService = authService;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var token = context.Request.Headers["Authorization"].FirstOrDefault();

        // 🤞 Fingers crossed the auth service is up!
        var user = await _authService.ValidateTokenAsync(token);

        if (user == null)
        {
            context.Response.StatusCode = 401;
            return;
        }

        context.Items["User"] = user;
        await _next(context);
    }
}

// Applied globally in Program.cs
app.UseMiddleware<AuthenticationMiddleware>();

The Prayer 🙏🙏

We're putting all our authentication eggs in one basket and praying that basket never drops. Every single request—health checks, metrics, public endpoints, everything—goes through this middleware and calls out to a central auth service. What could possibly go wrong? It's not like central services ever have outages, right? Nervous laughter intensifies.

The Reality Check

You've seen the headlines: major cloud provider outage takes down services worldwide because of a centralized dependency. Services running in one region fail because they rely on a control plane in another region. Banks, government services, messaging apps—all down simultaneously.

Your centralized auth service is the same Achilles heel. When it goes down:

  • Everything stops: Health checks fail, metrics endpoints return 5xx errors, even your status page goes dark
  • No graceful degradation: Users who were already authenticated get kicked out
  • Cascading failures: Your load balancer marks all instances as unhealthy
  • Angry stakeholders: "But we have 99.9% uptime!" (The auth service has 99.9%; while it is down, your app has 0%)
  • 3 AM pages: "Why is everything down? The database is fine! The servers are fine!"

The problem isn't that auth services fail—it's that when they do, you have zero fallback options. No cache, no circuit breaker, no stale data to limp along with. Just a hard stop.
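
To put rough numbers on the "99.9%" bullet above: when every request needs both your app and the auth service to be up, their availabilities multiply. The sketch below assumes independent failures (a simplification) and uses a hypothetical `Availability` helper that is not part of the original code.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope availability math.
public static class Availability
{
    // Availability of a request path that needs every listed dependency up.
    // Assumes failures are independent, which real systems only approximate.
    public static double Serial(params double[] availabilities)
    {
        double result = 1.0;
        foreach (var a in availabilities)
            result *= a;
        return result;
    }

    // Expected downtime in hours over a 365-day year.
    public static double DowntimeHoursPerYear(double availability) =>
        (1.0 - availability) * 365 * 24;
}
```

With both your app and the auth service at 99.9%, `Availability.Serial(0.999, 0.999)` comes to about 0.998, which roughly doubles the expected downtime from about 8.8 hours to about 17.5 hours per year.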

The Fix

Build resilience with a three-layer fallback strategy:

csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;

public class ResilientAuthMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IDistributedCache _cache;
    private readonly IAuthService _authService;
    private readonly ILogger<ResilientAuthMiddleware> _logger;

    // The readonly fields are supplied by dependency injection
    public ResilientAuthMiddleware(
        RequestDelegate next,
        IDistributedCache cache,
        IAuthService authService,
        ILogger<ResilientAuthMiddleware> logger)
    {
        _next = next;
        _cache = cache;
        _authService = authService;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var token = context.Request.Headers["Authorization"].FirstOrDefault();
        if (string.IsNullOrEmpty(token))
        {
            context.Response.StatusCode = 401;
            return;
        }

        var user = await ValidateWithFallbackAsync(token);
        if (user == null)
        {
            context.Response.StatusCode = 401;
            return;
        }

        context.Items["User"] = user;
        await _next(context);
    }

    private async Task<UserInfo?> ValidateWithFallbackAsync(string token)
    {
        // Key on a hash so raw tokens are never stored as cache keys
        var cacheKey = $"auth:{ComputeHash(token)}";

        // Layer 1: Check cache first (fast, no call to the auth service)
        var cached = await _cache.GetStringAsync(cacheKey);
        if (cached != null)
            return JsonSerializer.Deserialize<UserInfo>(cached);

        // Layer 2: Call auth service with timeout
        try
        {
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2));
            var user = await _authService.ValidateTokenAsync(token, cts.Token);

            if (user != null)
            {
                var json = JsonSerializer.Serialize(user);

                // Cache for 5 minutes
                await _cache.SetStringAsync(cacheKey, json,
                    new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5) });

                // Store stale copy for 1 hour
                await _cache.SetStringAsync($"stale:{cacheKey}", json,
                    new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(1) });
            }
            return user;
        }
        // The 2-second timeout surfaces as OperationCanceledException (or its subclass TaskCanceledException)
        catch (Exception ex) when (ex is HttpRequestException or TimeoutException or OperationCanceledException)
        {
            _logger.LogWarning(ex, "Auth service unavailable");

            // Layer 3: Use stale cache as last resort
            var stale = await _cache.GetStringAsync($"stale:{cacheKey}");
            if (stale != null)
            {
                _logger.LogWarning("Using stale auth cache during outage");
                return JsonSerializer.Deserialize<UserInfo>(stale);
            }

            return null;
        }
    }

    private static string ComputeHash(string input) =>
        Convert.ToBase64String(SHA256.HashData(Encoding.UTF8.GetBytes(input)));
}

// In Program.cs: apply selectively - don't protect everything!
app.UseWhen(
    context => context.Request.Path.StartsWithSegments("/api/protected"),
    appBuilder => appBuilder.UseMiddleware<ResilientAuthMiddleware>()
);

// These work even if auth is down
app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));
app.MapGet("/metrics", () => Results.Ok());

Key improvements:

  1. Three-layer fallback: Fresh cache → Auth service → Stale cache
  2. Fast timeout: 2 seconds max, fail fast instead of hanging
  3. Dual caching strategy: Fresh cache (5 min) + stale cache (1 hour) for emergencies
  4. Selective application: Health checks and public endpoints don't need auth
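  5. Graceful degradation: Continue working with auth data up to an hour old rather than failing completely. The trade-off is that a token revoked during an outage is still accepted until its stale entry expires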
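
One gap worth noting in the fallback chain above: if the distributed cache itself is unreachable, its read and write calls can throw, and the cache becomes a new single point of failure. A minimal sketch of one way to guard against that, assuming it is acceptable to treat any cache error as a miss. The `SafeCacheExtensions` class and its method names are hypothetical and not part of the original middleware.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Logging;

// Hypothetical helpers: cache access that degrades instead of throwing.
public static class SafeCacheExtensions
{
    // Returns null (a cache miss) if the cache read fails for any reason.
    public static async Task<string?> TryGetStringAsync(
        this IDistributedCache cache, string key, ILogger logger)
    {
        try
        {
            return await cache.GetStringAsync(key);
        }
        catch (Exception ex)
        {
            logger.LogWarning(ex, "Cache read failed; treating as a miss");
            return null;
        }
    }

    // Swallows write failures so a broken cache never fails the request.
    public static async Task TrySetStringAsync(
        this IDistributedCache cache, string key, string value, TimeSpan ttl, ILogger logger)
    {
        try
        {
            await cache.SetStringAsync(key, value,
                new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = ttl });
        }
        catch (Exception ex)
        {
            logger.LogWarning(ex, "Cache write failed; continuing without caching");
        }
    }
}
```

In the middleware, the `_cache.GetStringAsync` and `_cache.SetStringAsync` calls could be swapped for these helpers. A cache outage then costs extra calls to the auth service rather than failed requests.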

Other Resilience Techniques

  • JWT local validation: Eliminate the network call entirely by validating token signatures locally using public keys
  • Circuit breaker pattern (Polly): Stop hammering a failing auth service and fail fast
  • Multiple auth providers: Have backup auth services in different regions/zones
  • Feature flags for graceful degradation: Allow read-only access when auth is degraded
  • Health checks: Monitor your auth service dependency and alert before it becomes critical
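
The circuit breaker bullet above can be sketched without any library. This is a hand-rolled illustration of the pattern, not Polly's API: the `SimpleCircuitBreaker` and `CircuitOpenException` names, the failure threshold, and the open duration are all assumptions chosen for the example.

```csharp
using System;
using System.Threading.Tasks;

// Thrown instead of calling the dependency while the circuit is open.
public sealed class CircuitOpenException : Exception
{
    public CircuitOpenException() : base("Circuit is open; call skipped.") { }
}

// Hypothetical minimal breaker: opens after N consecutive failures,
// then allows trial calls again once the open duration has elapsed.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _clock;
    private readonly object _gate = new();
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(
        int failureThreshold, TimeSpan openDuration, Func<DateTimeOffset>? clock = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow); // injectable for tests
    }

    // True while calls should be rejected without touching the dependency.
    public bool IsOpen
    {
        get
        {
            lock (_gate)
            {
                return _openedAt is { } opened && _clock() - opened < _openDuration;
            }
        }
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action)
    {
        if (IsOpen)
            throw new CircuitOpenException();

        try
        {
            var result = await action();
            lock (_gate)
            {
                // Any success closes the circuit and resets the count.
                _consecutiveFailures = 0;
                _openedAt = null;
            }
            return result;
        }
        catch (Exception)
        {
            lock (_gate)
            {
                _consecutiveFailures++;
                if (_consecutiveFailures >= _failureThreshold)
                    _openedAt = _clock(); // open, or re-open after a failed trial call
            }
            throw;
        }
    }
}
```

In the middleware, the auth call could be wrapped as `breaker.ExecuteAsync(() => _authService.ValidateTokenAsync(token, cts.Token))`, with `CircuitOpenException` added to the catch filter so an open circuit falls straight through to the stale cache. This sketch lets every caller through once the open duration elapses; a production breaker such as Polly's typically limits those trial calls.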

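For the JWT local validation bullet above, the core idea is that a signed token can be checked with only the issuer's public key and a clock, so no network call is needed. The sketch below uses only the .NET base class library and assumes RS256-signed tokens. The `LocalJwtCheck` class is hypothetical, and it checks only the signature and the `exp` claim. A real service would normally use a vetted JWT library and also validate issuer, audience, and key rotation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical sketch: signature and expiry check for an RS256 JWT.
public static class LocalJwtCheck
{
    // True only if the token is well formed, declares RS256, verifies
    // against the public key, and its "exp" claim is still in the future.
    public static bool IsValid(string jwt, RSA publicKey, DateTimeOffset now)
    {
        var parts = jwt.Split('.');
        if (parts.Length != 3) return false;

        try
        {
            // Pin the algorithm rather than trusting whatever the token asks for.
            using var header = JsonDocument.Parse(FromBase64Url(parts[0]));
            if (!header.RootElement.TryGetProperty("alg", out var alg)
                || alg.GetString() != "RS256")
                return false;

            // The signature covers the ASCII bytes of "header.payload".
            var signedBytes = Encoding.ASCII.GetBytes($"{parts[0]}.{parts[1]}");
            var signature = FromBase64Url(parts[2]);
            if (!publicKey.VerifyData(signedBytes, signature,
                    HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1))
                return false;

            using var payload = JsonDocument.Parse(FromBase64Url(parts[1]));
            return payload.RootElement.TryGetProperty("exp", out var exp)
                && exp.TryGetInt64(out var expSeconds)
                && DateTimeOffset.FromUnixTimeSeconds(expSeconds) > now;
        }
        catch (Exception ex) when (ex is FormatException or JsonException
            or InvalidOperationException or ArgumentException or CryptographicException)
        {
            return false; // malformed or unparseable token
        }
    }

    // JWTs use unpadded base64url; convert back to standard base64 first.
    private static byte[] FromBase64Url(string s)
    {
        var b64 = s.Replace('-', '+').Replace('_', '/');
        b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
        return Convert.FromBase64String(b64);
    }
}
```

The trade-off is revocation: a locally validated token stays valid until it expires, so this approach pairs best with short token lifetimes.
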
Lesson Learned

Don't create application-level single points of failure that mirror infrastructure-level ones. When you depend on a centralized service, always ask: "What happens when this goes down?" Build in layers of fallback—cache, timeouts, stale data, and selective application. Your 3 AM self will thank you when the auth service has a bad day and your app keeps running.