Troubleshoot models

Before you start

The models feature covers three interconnected concerns: LLM authentication strategy (AuthStrategy), adaptive model routing (AdaptiveModelRouter), and provider circuit-breaking (CircuitBreaker). A failure in any one of these can surface as a misrouted task, an unexpected provider error, or a result that is silently wrong.

Symptom table

| If you observe | Check |
| --- | --- |
| Wrong model selected for a task | AdaptiveModelRouter.get_best_model(): inspect the workflow, stage, and max_cost/max_latency_ms constraints you are passing |
| All requests routed to a single provider | CircuitBreaker.get_status(): one or more providers may have an open circuit (is_open: true) |
| LLMResponse.success is False or content is empty | LLMResponse.model_id, provider, and latency_ms: the executor may have hit a timeout or received an empty payload |
| Cost or token estimates look wrong | AuthStrategy.estimate_cost() and estimate_tokens(): verify loc_to_tokens_multiplier (default 4.0) and that subscription_tier matches your actual plan |
| Auth setup appears complete but routing uses wrong mode | cmd_auth_status: confirm setup_completed: true and default_mode in the saved strategy file (AUTH_STRATEGY_FILE) |
| Intermittent failures on one provider | CircuitBreaker state: check failure_count and last_failure; the breaker opens after 5 failures and stays open for 60 seconds by default |
| get_best_model() raises or returns unexpected fallback | Telemetry store may lack sufficient samples: ModelPerformance.sample_size below threshold causes the router to fall back to defaults |
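
The breaker defaults in the table (open after 5 failures, stay open for 60 seconds) can be easier to reason about with a small model of the behavior. The sketch below is illustrative only and is not attune's CircuitBreaker implementation: the class name SketchBreaker, its methods, and the choice to count failures since the last success are all assumptions made for the example. It may help explain why a provider with intermittent errors disappears from routing for about a minute and then returns.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProviderState:
    failure_count: int = 0
    last_failure: Optional[float] = None  # time of the most recent failure
    opened_at: Optional[float] = None     # set when the circuit opens


class SketchBreaker:
    """Hypothetical breaker for illustration; not attune's CircuitBreaker."""

    def __init__(
        self,
        failure_threshold: int = 5,       # matches the documented default
        recovery_seconds: float = 60.0,   # matches the documented default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self._states: Dict[str, ProviderState] = {}

    def record_failure(self, provider: str) -> None:
        state = self._states.setdefault(provider, ProviderState())
        state.failure_count += 1
        state.last_failure = self._clock()
        # Open the circuit once the threshold is reached.
        if state.failure_count >= self.failure_threshold and state.opened_at is None:
            state.opened_at = state.last_failure

    def record_success(self, provider: str) -> None:
        # Assumption: a success clears the failure count.
        self._states[provider] = ProviderState()

    def is_open(self, provider: str) -> bool:
        state = self._states.get(provider)
        if state is None or state.opened_at is None:
            return False
        if self._clock() - state.opened_at >= self.recovery_seconds:
            # Cooldown elapsed: close the circuit and start counting afresh.
            self._states[provider] = ProviderState()
            return False
        return True


# Drive the sketch with a fake clock so the timeline is deterministic.
now = [0.0]
breaker = SketchBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure("anthropic")
print(breaker.is_open("anthropic"))  # circuit is open right after the 5th failure
now[0] = 61.0
print(breaker.is_open("anthropic"))  # closed again once 60 seconds have passed
```

If the real breaker behaves along these lines, a failure_count just below 5 together with a recent last_failure would suggest a provider that is close to being taken out of rotation.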

Step-by-step diagnosis

  1. Reproduce the failure in isolation. Strip the call to its required arguments. For routing issues, call AdaptiveModelRouter.get_best_model(workflow, stage) directly. For auth issues, call get_auth_strategy() and print the result. Confirm the failure occurs outside the surrounding workflow before going deeper.

  2. Check circuit-breaker state. Call CircuitBreaker.get_status() and look for any provider where is_open is True. An open circuit silently redirects all traffic away from that provider:

    from attune.models import CircuitBreaker

    # Print the breaker state and look for providers with is_open set to True
    cb = CircuitBreaker()
    print(cb.get_status())
    

    To reset a tripped breaker manually:

    cb.reset(provider="anthropic")   # reset one provider
    cb.reset()                        # reset all providers
    
  3. Inspect the auth strategy on disk. Run the CLI to see exactly what the saved strategy contains:

    attune auth status
    

    Or call cmd_auth_status directly. If setup_completed is False or the file is missing, re-run interactive setup:

    attune auth setup
    
  4. Check routing telemetry. Call AdaptiveModelRouter.get_routing_stats(workflow, stage, days=7) to see what the router knows about recent performance. If sample_size is 0 or very low, the router has no signal and falls back to tier defaults — this is expected behavior, not a bug.

  5. Enable DEBUG logging and re-run. Set the log level before executing the failing call:

    import logging
    logging.basicConfig(level=logging.DEBUG)
    

    The executor logs model selection, latency, and cost estimates at DEBUG level. This often reveals which routing constraint (max_cost, max_latency_ms, or min_success_rate) is eliminating candidates.

  6. Run the related tests.

    pytest -k "models" -v
    

    If a specific test covers the failing path, run that test on its own first. Passing tests suggest that the core logic is intact, which points to configuration or environment as the likely cause.
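
When the DEBUG output is hard to read, it can help to check by hand which constraint would rule out a given model. The following is a minimal sketch under stated assumptions: the Candidate dataclass, its field names, the violated_constraints helper, and the sample numbers are all hypothetical and are not part of attune. Only the constraint names (max_cost, max_latency_ms, min_success_rate) come from the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical summary of a model's recent performance."""
    model_id: str
    avg_cost: float
    avg_latency_ms: float
    success_rate: float


def violated_constraints(
    candidate: Candidate,
    max_cost: Optional[float] = None,
    max_latency_ms: Optional[float] = None,
    min_success_rate: Optional[float] = None,
) -> List[str]:
    """Return the names of the constraints this candidate fails, in a fixed order."""
    violated: List[str] = []
    if max_cost is not None and candidate.avg_cost > max_cost:
        violated.append("max_cost")
    if max_latency_ms is not None and candidate.avg_latency_ms > max_latency_ms:
        violated.append("max_latency_ms")
    if min_success_rate is not None and candidate.success_rate < min_success_rate:
        violated.append("min_success_rate")
    return violated


# Made-up candidates and limits, for illustration only.
candidates = [
    Candidate("model-a", avg_cost=0.020, avg_latency_ms=800.0, success_rate=0.99),
    Candidate("model-b", avg_cost=0.005, avg_latency_ms=2500.0, success_rate=0.97),
]
for c in candidates:
    reasons = violated_constraints(c, max_cost=0.01, max_latency_ms=1000.0)
    print(c.model_id, "eliminated by:", reasons or "nothing")
```

If every candidate fails at least one constraint, a fallback from the router would be consistent with the constraints being too strict rather than with a routing bug.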

Common fixes

Source files

Tags: models, auth, llm