Work with resilience
Use resilience patterns when you need to protect your application from failures with circuit breakers, retries, timeouts, fallbacks, or health monitoring.
Prerequisites
- Access to the project source code
- Python environment with the attune package installed
- Understanding of fault tolerance concepts (circuit breakers, retries, fallbacks)
Add circuit breaker protection
- Import the circuit breaker decorator:

    from attune.resilience import circuit_breaker

- Wrap your function with the decorator:

    import requests

    # Open the circuit after 3 failures; allow a new attempt after 30 seconds.
    @circuit_breaker(name="api_call", failure_threshold=3, reset_timeout=30.0)
    def call_external_api():
        # Your external API call here
        response = requests.get("https://api.example.com/data")
        return response.json()

- Handle circuit open exceptions:

    from attune.resilience import CircuitOpenError

    try:
        result = call_external_api()
    except CircuitOpenError as e:
        print(f"Circuit breaker is open, trying again in {e.reset_time} seconds")
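
To make the `failure_threshold` and `reset_timeout` parameters concrete, here is a minimal, standard-library-only sketch of the state machine a circuit breaker typically follows. It is not attune's implementation: `SketchBreaker`, `SketchCircuitOpen`, and the injectable `clock` argument are hypothetical names chosen for illustration, and the sketch assumes the threshold counts consecutive failures.

```python
import time


class SketchCircuitOpen(Exception):
    """Raised by the sketch when calls are rejected (hypothetical name)."""


class SketchBreaker:
    """Illustrative circuit breaker; not attune's implementation."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock      # injectable so tests do not need to sleep
        self.failures = 0       # consecutive failures seen so far
        self.opened_at = None   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            elapsed = self.clock() - self.opened_at
            if elapsed < self.reset_timeout:
                # Still open: fail fast without touching the dependency.
                raise SketchCircuitOpen(f"retry in {self.reset_timeout - elapsed:.1f}s")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trip the circuit, or re-trip it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Passing a fake clock (for example `clock=lambda: now[0]`) lets you move time forward in a test instead of waiting for the real reset timeout.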
Implement retry logic
- Add the retry decorator to functions that might fail transiently:

    import requests
    from attune.resilience import retry

    @retry(max_attempts=3, initial_delay=1.0, backoff_factor=2.0)
    def unstable_operation():
        # Operation that might fail due to network issues
        return requests.get("https://unreliable-service.com/data")

- Configure which exceptions trigger retries:

    @retry(
        max_attempts=5,
        retryable_exceptions=(requests.ConnectionError, requests.Timeout),
    )
    def network_operation():
        return requests.get("https://service.com", timeout=10)
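
The sketch below shows how the retry parameters above could interact. It assumes plain exponential backoff without jitter (delay n = `initial_delay * backoff_factor**n`); the source does not state attune's exact formula. `backoff_delays` and `retry_call` are hypothetical helpers, not part of attune.

```python
import time


def backoff_delays(max_attempts, initial_delay, backoff_factor):
    """Delays between attempts, assuming delay_n = initial_delay * backoff_factor**n."""
    # max_attempts calls leave max_attempts - 1 gaps to wait in.
    return [initial_delay * backoff_factor ** n for n in range(max_attempts - 1)]


def retry_call(func, max_attempts=3, initial_delay=1.0, backoff_factor=2.0,
               retryable_exceptions=(Exception,), sleep=time.sleep):
    """Call func, retrying only on retryable_exceptions (illustrative sketch)."""
    delays = backoff_delays(max_attempts, initial_delay, backoff_factor)
    for attempt in range(max_attempts):
        try:
            return func()
        except retryable_exceptions:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
        # Any other exception type is not caught and propagates immediately.


print(backoff_delays(3, 1.0, 2.0))  # [1.0, 2.0]
```

With `max_attempts=3`, `initial_delay=1.0`, and `backoff_factor=2.0`, this model waits 1 second after the first failure and 2 seconds after the second, then gives up if the third attempt also fails.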
Set up fallback behavior
- Use the fallback decorator for graceful degradation:

    import requests
    from attune.resilience import fallback

    def get_from_cache():
        return {"data": "cached_value"}

    def get_default():
        return {"data": "default_value"}

    @fallback(get_from_cache, get_default, default={"data": "fallback"})
    def get_user_data():
        # Primary data source that might fail
        return requests.get("https://user-service.com/data").json()

- Chain multiple fallbacks programmatically:

    from attune.resilience import with_fallback

    # get_from_database and get_from_file are your own data-access functions.
    robust_function = with_fallback(
        primary=get_from_database,
        fallbacks=[get_from_cache, get_from_file],
        default={"empty": True},
    )
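
As a rough mental model of the chain semantics, the following sketch tries the primary source, then each fallback in order, and returns the default only if every callable raises. It assumes any exception moves the chain forward; attune may filter exceptions or log failures differently. `call_with_fallbacks`, `broken_primary`, and `cache_lookup` are hypothetical names.

```python
def call_with_fallbacks(primary, fallbacks=(), default=None):
    """Try primary, then each fallback in order; return default if all raise."""
    for candidate in (primary, *fallbacks):
        try:
            return candidate()
        except Exception:
            continue  # move on to the next source in the chain
    return default


def broken_primary():
    raise RuntimeError("primary source down")


def cache_lookup():
    return {"data": "cached_value"}


print(call_with_fallbacks(broken_primary, [cache_lookup], default={"empty": True}))
# {'data': 'cached_value'}
```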
Monitor system health
- Register health checks for your components:

    from attune.resilience import get_health_check

    health = get_health_check()

    @health.register("database", timeout=5.0, critical=True)
    async def check_database():
        # Your database connectivity check
        await db.execute("SELECT 1")
        return "Database is healthy"

- Run health checks to get system status:

    # Assumes HealthStatus is exported alongside get_health_check.
    from attune.resilience import HealthStatus

    # Call from inside a coroutine, since run_all() must be awaited.
    system_health = await health.run_all()
    if system_health.status == HealthStatus.HEALTHY:
        print("All systems operational")
    else:
        print(f"System issues detected: {system_health.checks}")
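
The `timeout` and `critical` arguments suggest how individual results roll up into one status. The sketch below is one plausible aggregation rule, stated as an assumption rather than attune's documented behavior: a failed critical check makes the system unhealthy, while a failed non-critical check only degrades it. `SketchStatus` and `run_checks` are hypothetical names.

```python
import asyncio
from enum import Enum


class SketchStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


async def run_checks(checks):
    """checks maps name -> (async check function, timeout in seconds, critical flag)."""
    overall = SketchStatus.HEALTHY
    results = {}
    for name, (check, timeout_s, critical) in checks.items():
        try:
            # wait_for cancels the check and raises if it exceeds its timeout.
            results[name] = await asyncio.wait_for(check(), timeout=timeout_s)
        except Exception as exc:  # includes the timeout raised by wait_for
            results[name] = f"failed: {type(exc).__name__}"
            if critical:
                overall = SketchStatus.UNHEALTHY
            elif overall is SketchStatus.HEALTHY:
                overall = SketchStatus.DEGRADED
    return overall, results
```

Running it with `asyncio.run(run_checks({...}))` returns the overall status together with a per-check result, which mirrors the kind of information you would inspect on the object returned by `health.run_all()`.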
Add timeouts to prevent hanging
- Wrap functions that might run too long:

    from attune.resilience import timeout

    @timeout(seconds=30.0, error_message="Processing took too long")
    def long_running_task():
        # Task that should complete within 30 seconds
        process_large_dataset()

- Use timeout with async operations:

    from attune.resilience import with_timeout

    async def fetch_data():
        result = await with_timeout(
            slow_async_operation(),
            seconds=10.0,
            fallback_value={"timeout": True},
        )
        return result
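
For comparison, the async case can be expressed with the standard library alone. This sketch builds on `asyncio.wait_for` and returns a fallback value on timeout when one is supplied; it is an approximation of the behavior described above, not attune's code. `await_with_timeout` and the `_NO_FALLBACK` sentinel are hypothetical names.

```python
import asyncio

_NO_FALLBACK = object()  # sentinel so None can be a legitimate fallback value


async def await_with_timeout(awaitable, seconds, fallback_value=_NO_FALLBACK):
    """Await with a deadline; return fallback_value on timeout if one was given."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        if fallback_value is _NO_FALLBACK:
            raise
        return fallback_value


async def slow_async_operation():
    await asyncio.sleep(0.5)
    return {"timeout": False}


print(asyncio.run(await_with_timeout(slow_async_operation(), 0.05, {"timeout": True})))
# {'timeout': True}
```

`asyncio.wait_for` cancels the awaited task when the deadline passes, so the slow operation does not keep running in the background. Synchronous functions are harder to bound, because Python cannot forcibly interrupt a running thread.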
Verify your implementation
- Test circuit breaker behavior:
  - Trigger failures to open the circuit
  - Verify the circuit opens after reaching the failure threshold
  - Confirm the circuit resets after the timeout period
- Validate retry patterns:
  - Simulate transient failures to test retry logic
  - Check that non-retryable exceptions fail immediately
  - Verify backoff delays increase between attempts
- Check fallback execution:
  - Force primary function failures to test fallback chain
  - Ensure each fallback runs in sequence when the previous fails
  - Confirm default values return when all fallbacks fail
- Monitor health check results:
  - Run health.run_all() and inspect the returned SystemHealth object
  - Verify critical checks affect overall system status
  - Check that health check timeouts work correctly
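
Several of the checks above need a function that fails a controlled number of times. One way to build such a test double with only the standard library is sketched here; `make_flaky` is a hypothetical helper, not part of attune.

```python
def make_flaky(failures, exc_factory=lambda: ConnectionError("simulated outage"), result="ok"):
    """Return (func, calls): func raises on its first `failures` calls, then returns result.

    calls is a list that records one entry per invocation.
    """
    calls = []

    def flaky():
        calls.append(len(calls) + 1)
        if len(calls) <= failures:
            raise exc_factory()
        return result

    return flaky, calls


flaky, calls = make_flaky(failures=2)
for _ in range(3):
    try:
        print(flaky())
    except ConnectionError as exc:
        print(f"attempt {len(calls)} failed: {exc}")
# attempt 1 failed: simulated outage
# attempt 2 failed: simulated outage
# ok
```

In a real test you would wrap `flaky` with the decorator under test and then assert on `len(calls)`: a retry configured for three attempts should leave three recorded calls, while a non-retryable exception type passed through `exc_factory` should leave exactly one.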
Your resilience patterns are working when functions gracefully handle failures, recover automatically when possible, and provide meaningful feedback about system health.