Troubleshoot models
Before you start
The models feature covers three interconnected concerns: LLM authentication strategy (AuthStrategy), adaptive model routing (AdaptiveModelRouter), and provider circuit-breaking (CircuitBreaker). A failure in any one of these can surface as a misrouted task, an unexpected provider error, or a silently incorrect result.
Symptom table
| If you observe | Check |
|---|---|
| Wrong model selected for a task | AdaptiveModelRouter.get_best_model() — inspect the workflow, stage, and max_cost/max_latency_ms constraints you are passing |
| All requests routed to a single provider | CircuitBreaker.get_status() — one or more providers may have an open circuit (is_open: true) |
| LLMResponse.success is False or content is empty | LLMResponse.model_id, provider, and latency_ms: the executor may have hit a timeout or received an empty payload |
| Cost or token estimates look wrong | AuthStrategy.estimate_cost() and estimate_tokens() — verify loc_to_tokens_multiplier (default 4.0) and that subscription_tier matches your actual plan |
| Auth setup appears complete but routing uses wrong mode | cmd_auth_status — confirm setup_completed: true and default_mode in the saved strategy file (AUTH_STRATEGY_FILE) |
| Intermittent failures on one provider | CircuitBreaker state — check failure_count and last_failure; the breaker opens after 5 failures and stays open for 60 seconds by default |
| get_best_model() raises or returns unexpected fallback | Telemetry store may lack sufficient samples: ModelPerformance.sample_size below threshold causes the router to fall back to defaults |
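
The default thresholds quoted in the table (the breaker opens after 5 failures and stays open for 60 seconds) explain why intermittent failures can take a provider out of rotation. The following is a minimal, self-contained sketch of that behavior, not attune's CircuitBreaker implementation. The class name SketchBreaker, its method names, the injectable clock, and the choice to close the circuit fully once the timeout elapses are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


class SketchBreaker:
    """Illustrative per-provider breaker; NOT attune's CircuitBreaker."""

    def __init__(
        self,
        failure_threshold: int = 5,      # failures before the circuit opens (default quoted in this guide)
        recovery_timeout: float = 60.0,  # seconds the circuit stays open (default quoted in this guide)
        clock: Callable[[], float] = time.monotonic,  # injectable so the timing can be tested
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures: Dict[str, int] = {}
        self._opened_at: Dict[str, float] = {}

    def record_failure(self, provider: str) -> None:
        """Count one failure and open the circuit once the threshold is reached."""
        count = self._failures.get(provider, 0) + 1
        self._failures[provider] = count
        if count >= self.failure_threshold and provider not in self._opened_at:
            self._opened_at[provider] = self._clock()

    def record_success(self, provider: str) -> None:
        """Assumption: a success clears the failure count and closes the circuit."""
        self._failures.pop(provider, None)
        self._opened_at.pop(provider, None)

    def is_open(self, provider: str) -> bool:
        """Return True while the provider is inside its open window."""
        opened_at = self._opened_at.get(provider)
        if opened_at is None:
            return False
        if self._clock() - opened_at >= self.recovery_timeout:
            # Timeout elapsed: allow traffic again and start counting afresh.
            self.record_success(provider)
            return False
        return True


# Usage with a fake clock so no real waiting is needed.
fake_now = [0.0]
breaker = SketchBreaker(clock=lambda: fake_now[0])
for _ in range(5):
    breaker.record_failure("anthropic")
print(breaker.is_open("anthropic"))  # True: the fifth failure opened the circuit
fake_now[0] = 60.0
print(breaker.is_open("anthropic"))  # False: the 60-second window has elapsed
```

Under these assumptions, four failures spread over any period leave the provider available, while a fifth removes it from rotation for a full minute. That pattern matches the "all requests routed to a single provider" symptom.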
Step-by-step diagnosis
- Reproduce the failure in isolation. Strip the call to its required arguments. For routing issues, call AdaptiveModelRouter.get_best_model(workflow, stage) directly. For auth issues, call get_auth_strategy() and print the result. Confirm the failure occurs outside the surrounding workflow before going deeper.
- Check circuit-breaker state. Call CircuitBreaker.get_status() and look for any provider where is_open is True. An open circuit silently redirects all traffic away from that provider:

      from attune.models import CircuitBreaker
      cb = CircuitBreaker()
      print(cb.get_status())

  To reset a tripped breaker manually:

      cb.reset(provider="anthropic")  # reset one provider
      cb.reset()  # reset all providers

- Inspect the auth strategy on disk. Run the CLI to see exactly what the saved strategy contains:

      attune auth status

  Or call cmd_auth_status directly. If setup_completed is False or the file is missing, re-run interactive setup:

      attune auth setup

- Check routing telemetry. Call AdaptiveModelRouter.get_routing_stats(workflow, stage, days=7) to see what the router knows about recent performance. If sample_size is 0 or very low, the router has no signal and falls back to tier defaults. This is expected behavior, not a bug.
- Enable DEBUG logging and re-run. Set the log level before executing the failing call:

      import logging
      logging.basicConfig(level=logging.DEBUG)

  The executor logs model selection, latency, and cost estimates at DEBUG level. This often reveals which routing constraint (max_cost, max_latency_ms, or min_success_rate) is eliminating candidates.
- Run the related tests.

      pytest -k "models" -v

  If a test covers the failing path, run it in isolation first. Passing tests confirm that the core logic is intact and the issue is likely in configuration or environment.
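
When several providers are configured, scanning the raw status output by eye is error-prone. The helper below is a small sketch for condensing it. It assumes that CircuitBreaker.get_status() returns a mapping from provider name to a dictionary carrying the is_open, failure_count, and last_failure fields mentioned in this guide. That shape, the function name summarize_breakers, and the sample snapshot are all hypothetical, so adjust them to whatever your installed version actually returns.

```python
from typing import Any, List, Mapping


def summarize_breakers(status: Mapping[str, Mapping[str, Any]]) -> List[str]:
    """Build one line per provider with an open circuit, sorted by provider name."""
    lines: List[str] = []
    for provider in sorted(status):
        state = status[provider]
        if state.get("is_open"):
            lines.append(
                f"{provider}: open, failure_count={state.get('failure_count')}, "
                f"last_failure={state.get('last_failure')}"
            )
    return lines


# Hypothetical snapshot; real values would come from CircuitBreaker.get_status().
sample_status = {
    "anthropic": {"is_open": True, "failure_count": 5, "last_failure": 1700000000.0},
    "provider-b": {"is_open": False, "failure_count": 1, "last_failure": None},
}

for line in summarize_breakers(sample_status):
    print(line)
```

An empty result means no circuit is open, in which case skewed routing is more likely caused by constraints or sparse telemetry than by the breaker.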
Common fixes
- Open circuit breaker blocking a provider. Call CircuitBreaker.reset(provider="<name>") to re-enable the provider immediately. If it trips again quickly, the underlying provider is genuinely unhealthy: check API status or rotate credentials.
- Auth strategy file missing or corrupt. Delete the file at AUTH_STRATEGY_FILE and re-run setup:

      attune auth reset
      attune auth setup

- Wrong subscription tier configured. If AuthStrategy.subscription_tier does not match your actual Claude plan, cost estimates and mode recommendations will be wrong. Update via attune auth setup, or edit the strategy file and reload with AuthStrategy.load().
- get_best_model() ignores a preferred model due to cost constraint. Raise or remove the max_cost argument to confirm the constraint is the cause. If the constraint is intentional, call AdaptiveModelRouter.recommend_tier_upgrade(workflow, stage) to see whether a tier upgrade would unblock routing.
- min_success_rate too high for available telemetry. The default is 0.8. If your telemetry store is new or sparse, no model may meet this threshold. Pass a lower value explicitly until enough samples accumulate:

      router.get_best_model(workflow, stage, min_success_rate=0.5)

- Dependency version mismatch. A pip upgrade can change provider client behavior. Confirm installed versions with:

      pip show anthropic

  This change is outside the models feature itself; pin the version in your requirements file if the mismatch caused a regression.
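
The max_cost and min_success_rate fixes above both come down to finding which constraint removes which candidate. The sketch below illustrates that check on hand-written data. It does not call the router. The Candidate dataclass, the explain_eliminations function, the model names, and the sample figures are all hypothetical. Only the parameter names max_cost, max_latency_ms, and min_success_rate and the 0.8 default come from this guide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """Hypothetical per-model telemetry summary (not attune's ModelPerformance)."""
    model_id: str
    avg_cost: float
    avg_latency_ms: float
    success_rate: float


def explain_eliminations(
    candidates: List[Candidate],
    max_cost: Optional[float] = None,
    max_latency_ms: Optional[float] = None,
    min_success_rate: float = 0.8,  # default quoted in this guide
) -> Dict[str, List[str]]:
    """Map each model_id to the constraints it violates (empty list means eligible)."""
    report: Dict[str, List[str]] = {}
    for c in candidates:
        reasons: List[str] = []
        if max_cost is not None and c.avg_cost > max_cost:
            reasons.append("max_cost")
        if max_latency_ms is not None and c.avg_latency_ms > max_latency_ms:
            reasons.append("max_latency_ms")
        if c.success_rate < min_success_rate:
            reasons.append("min_success_rate")
        report[c.model_id] = reasons
    return report


# Made-up sample data for illustration only.
candidates = [
    Candidate("model-a", avg_cost=0.02, avg_latency_ms=900.0, success_rate=0.95),
    Candidate("model-b", avg_cost=0.002, avg_latency_ms=400.0, success_rate=0.70),
]

# With max_cost=0.01 and the default min_success_rate, neither model is eligible.
print(explain_eliminations(candidates, max_cost=0.01))

# Relaxing min_success_rate to 0.5 makes model-b eligible again.
print(explain_eliminations(candidates, max_cost=0.01, min_success_rate=0.5))
```

Relaxing one constraint at a time, as in the two calls above, isolates the constraint responsible before you change anything in the real routing call.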
Source files
src/attune/models/**
Tags: models, auth, llm