Models errors
Common error signatures
Failures in the models feature fall into four categories: authentication errors, routing errors, provider failures (including circuit breaker trips), and executor errors.
- Authentication errors: AuthStrategy.load() raises when the strategy file at AUTH_STRATEGY_FILE is missing, corrupt, or contains unrecognized field values. configure_auth_interactive() can fail if the file cannot be written.
- Routing errors: AdaptiveModelRouter.get_best_model() raises when no model in MODEL_REGISTRY meets the caller's max_cost, max_latency_ms, or min_success_rate constraints, or when the requested workflow/stage combination has no telemetry history.
- Provider failures: AllProvidersFailedError is raised by ResilientExecutor after every entry in a FallbackPolicy has been exhausted. Each individual attempt can fail with a provider-level exception before the circuit breaker in CircuitBreaker opens for that provider/tier pair.
- Executor errors: EmpathyLLMExecutor.run() propagates provider exceptions when the underlying EmpathyLLM call fails. Check LLMResponse.success (returns False when content is empty) before assuming a returned response is valid.
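The LLMResponse.success check described above can be sketched as follows. This is a minimal illustration, not the library's code: the LLMResponse class here is a hypothetical stand-in that models only the content field and the success property as described in this section, and require_content is an illustrative helper name that does not come from the source.

```python
from dataclasses import dataclass


@dataclass
class LLMResponse:
    """Hypothetical stand-in: only `content` and `success` are modeled."""

    content: str

    @property
    def success(self) -> bool:
        # Assumption taken from the text above: empty content means failure.
        return bool(self.content)


def require_content(response: LLMResponse) -> str:
    """Return the response text, or raise if the call failed silently."""
    if not response.success:
        raise RuntimeError("LLM call returned an empty response")
    return response.content
```

Wrapping the check in one helper keeps callers from treating an empty response as valid output just because no exception was raised.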
Where errors originate
Authentication CLI commands are the most common entry points for user-facing failures. Routing and executor errors typically originate deeper in the stack and are reported back to CLI callers.
- cmd_auth_setup(): interactive first-time setup; fails if the strategy file cannot be created or if AuthStrategy.save() encounters a permissions error.
- cmd_auth_status(): reads the current strategy; fails if AuthStrategy.load() cannot parse an existing file.
- cmd_auth_reset(): clears the strategy file; fails on filesystem permission errors.
- cmd_auth_recommend(): calls count_lines_of_code() and AuthStrategy.get_recommended_mode(); fails if the target file does not exist or is not a valid Python file.
- main(): top-level CLI entry point; always returns 1 on failure.
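Several of the failure conditions listed for these commands (a missing file, an unparseable file, a directory that cannot be written) can be checked before running them. The sketch below is an assumption-laden illustration: check_strategy_file is a hypothetical helper, the path is supplied by the caller because the real value of AUTH_STRATEGY_FILE is not shown here, and the file is assumed to be JSON. It does not check the required fields or any field values.

```python
import json
import os
from pathlib import Path


def check_strategy_file(path: Path) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not path.exists():
        problems.append(f"{path} does not exist")
    else:
        try:
            # Assumption: the strategy file is stored as UTF-8 JSON.
            json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # ValueError covers both JSON and text-decoding failures.
            problems.append(f"{path} cannot be read or parsed: {exc}")
    # Creating, saving, or resetting the file needs a writable parent directory.
    if not os.access(path.parent, os.W_OK):
        problems.append(f"{path.parent} is not writable")
    return problems
```

A non-empty result points at the command most likely to fail: a parse problem affects the status command, while a write problem affects setup and reset.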
How to diagnose
- Check the exception type first. AllProvidersFailedError means every fallback in the active FallbackPolicy was tried and failed; look at CircuitBreaker.get_status() to see which providers are open. A ValueError or KeyError from AdaptiveModelRouter.get_best_model() usually means the workflow/stage pair has no telemetry data yet.
- Inspect the circuit breaker state. Call CircuitBreaker.get_status() to see failure_count, is_open, and opened_at for each provider/tier pair. A provider whose circuit is open will not be attempted until recovery_timeout_seconds has elapsed. Use CircuitBreaker.reset() to force recovery during debugging.
- Validate the auth strategy file. If AuthStrategy.load() fails, confirm the file at AUTH_STRATEGY_FILE exists and contains valid JSON with all required fields (see AuthStrategy.from_dict()). Run attune auth-status to surface the parsed values, or attune auth-reset followed by attune auth-setup to rebuild the file from scratch.
- Check routing constraints against telemetry. If get_best_model() returns no candidate, the max_cost, max_latency_ms, or min_success_rate thresholds may be too strict for the available ModelPerformance data. Call AdaptiveModelRouter.get_routing_stats() for the relevant workflow and stage to see actual success rates and latencies before tightening constraints.
- Examine LLMResponse fields on apparent success. A response object can be returned without raising an exception even when the call failed: LLMResponse.success is False whenever content is empty. Always check this property before treating the response as valid output.
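The circuit breaker timing rule in the steps above (an open circuit is skipped until recovery_timeout_seconds has elapsed) can be sketched as below. BreakerState and seconds_until_retry are hypothetical names used only for illustration. The field names follow the ones reported by CircuitBreaker.get_status(), but storing opened_at as a float from a monotonic clock is an assumption; the real implementation may use a datetime or another representation.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class BreakerState:
    """Hypothetical stand-in for one provider/tier entry in get_status()."""

    failure_count: int = 0
    is_open: bool = False
    opened_at: Optional[float] = None  # assumed: seconds on a monotonic clock


def seconds_until_retry(
    state: BreakerState,
    recovery_timeout_seconds: float,
    now: Optional[float] = None,
) -> float:
    """Return the wait before an open circuit may be tried again (0.0 if none)."""
    if not state.is_open or state.opened_at is None:
        return 0.0  # closed circuit: the provider can be attempted now
    current = time.monotonic() if now is None else now
    remaining = state.opened_at + recovery_timeout_seconds - current
    return max(0.0, remaining)
```

Passing now explicitly makes the calculation easy to check by hand: a circuit opened at 100.0 with a 60-second timeout still has 30.0 seconds to wait at 130.0.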
Source files
src/attune/models/**
Tags: models, auth, llm