Comparison: Models feature — authentication modes and routing strategies

Context

The models feature covers three distinct but related responsibilities: authenticating against LLM providers (Claude subscription vs. API key), routing tasks to the right model tier based on telemetry, and managing circuit breakers when providers fail. Understanding the tradeoffs within this feature helps you configure it correctly for your workload.

Authentication mode comparison

AuthStrategy supports two authentication modes, subscription and API key, selected via AuthMode. The system defaults to AuthMode.AUTO, which picks between the two based on module size.

| Factor | Subscription mode (prefer_subscription=True) | API key mode (prefer_subscription=False) |
| --- | --- | --- |
| Cost model | Flat subscription fee; no per-token charge | Pay-per-token; cost scales with usage |
| Best for | High-volume, continuous workflows | Sporadic or exploratory use |
| Token estimation | estimate_tokens() uses loc_to_tokens_multiplier=4.0 | Same estimation; cost calculated differently via estimate_cost() |
| Small modules (<500 LOC) | May be cost-inefficient | Generally cheaper |
| Large modules (>2000 LOC) | Cost advantage grows with volume | Costs accumulate quickly |
| Setup | cmd_auth_setup() interactive wizard | cmd_auth_setup() interactive wizard |
| Switching | cmd_auth_reset() then re-run setup | cmd_auth_reset() then re-run setup |

AuthStrategy.get_recommended_mode() encodes this logic: pass your module's line count and it returns the optimal AuthMode for that size category (get_module_size_category() returns 'small', 'medium', or 'large').
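The size-based selection can be sketched as follows. This is a minimal illustration, not the actual AuthStrategy code: the AuthMode member names other than AUTO, the handling of medium modules, and the free-function signatures are assumptions. Only the 500 and 2000 LOC boundaries and the 4.0 multiplier come from the comparison above.

```python
from enum import Enum


class AuthMode(Enum):
    # AUTO is named in the text; SUBSCRIPTION and API are assumed member names.
    AUTO = "auto"
    SUBSCRIPTION = "subscription"
    API = "api"


SMALL_MAX_LOC = 500              # "<500 LOC" boundary from the comparison
LARGE_MIN_LOC = 2000             # ">2000 LOC" boundary from the comparison
LOC_TO_TOKENS_MULTIPLIER = 4.0   # loc_to_tokens_multiplier from the comparison


def get_module_size_category(loc: int) -> str:
    """Classify a module by line count as 'small', 'medium', or 'large'."""
    if loc < SMALL_MAX_LOC:
        return "small"
    if loc > LARGE_MIN_LOC:
        return "large"
    return "medium"


def estimate_tokens(loc: int) -> int:
    """Rough token estimate: lines of code times the multiplier."""
    return int(loc * LOC_TO_TOKENS_MULTIPLIER)


def get_recommended_mode(loc: int, prefer_subscription: bool = True) -> AuthMode:
    """Pick a concrete mode for a module of the given size."""
    category = get_module_size_category(loc)
    if category == "small":
        return AuthMode.API            # small modules: API key is generally cheaper
    if category == "large":
        return AuthMode.SUBSCRIPTION   # large modules: subscription advantage grows
    # Medium modules are not specified in the text; this sketch assumes
    # the caller's preference decides.
    return AuthMode.SUBSCRIPTION if prefer_subscription else AuthMode.API
```

Under these assumptions, a 3000-line module is classified as large and resolves to the subscription mode, while a 300-line module resolves to the API key mode.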

Model tier comparison

The registry maps tasks to three tiers (cheap, capable, and premium) and separately flags a set of realtime-required tasks. TASK_TIER_MAP and AdaptiveModelRouter use these distinctions to select models at runtime.

| Tier | Task examples | Latency expectation | Cost expectation | When the router selects it |
| --- | --- | --- | --- | --- |
| Cheap (CHEAP_TASKS) | Background analysis, batch summarization | Higher acceptable | Lowest | max_cost constraint set, success_rate ≥ 0.8 on telemetry |
| Capable (CAPABLE_TASKS) | Code generation, test writing | Moderate | Moderate | Default for most workflow stages |
| Premium (PREMIUM_TASKS) | Complex reasoning, architecture review | Lowest acceptable | Highest | recommend_tier_upgrade() returns True based on historical failure rates |
| Realtime (REALTIME_REQUIRED_TASKS) | chat, live_coding, security_incident, emergency_response | Must be minimal | Varies | Task type is in the frozenset; cannot be overridden by cost constraints |

The router scores candidates using ModelPerformance.quality_score, which combines success_rate, avg_latency_ms, and avg_cost. You can constrain selection with max_cost, max_latency_ms, and min_success_rate parameters on get_best_model().
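To make the scoring and filtering concrete, here is a hedged sketch. The field names follow the text, but the weighting inside quality_score, the model_id field, and the standalone get_best_model() signature are assumptions chosen for illustration; the real formula may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ModelPerformance:
    model_id: str          # hypothetical identifier field
    success_rate: float    # fraction of successful calls, 0.0 to 1.0
    avg_latency_ms: float
    avg_cost: float        # average cost per call, in dollars (assumed unit)

    @property
    def quality_score(self) -> float:
        # Assumed weighting: success dominates; latency and cost are
        # squashed into (0, 1] so lower values score higher.
        latency_score = 1.0 / (1.0 + self.avg_latency_ms / 1000.0)
        cost_score = 1.0 / (1.0 + self.avg_cost * 100.0)
        return 0.6 * self.success_rate + 0.2 * latency_score + 0.2 * cost_score


def get_best_model(
    candidates: Iterable[ModelPerformance],
    max_cost: Optional[float] = None,
    max_latency_ms: Optional[float] = None,
    min_success_rate: Optional[float] = None,
) -> Optional[str]:
    """Filter candidates by the optional constraints, then pick the top score."""
    eligible = [
        c for c in candidates
        if (max_cost is None or c.avg_cost <= max_cost)
        and (max_latency_ms is None or c.avg_latency_ms <= max_latency_ms)
        and (min_success_rate is None or c.success_rate >= min_success_rate)
    ]
    if not eligible:
        return None  # no candidate satisfies every constraint
    return max(eligible, key=lambda c: c.quality_score).model_id


# Hypothetical telemetry for two models.
candidates = [
    ModelPerformance("fast_cheap", success_rate=0.85, avg_latency_ms=500, avg_cost=0.001),
    ModelPerformance("premium", success_rate=0.98, avg_latency_ms=2000, avg_cost=0.02),
]
```

With this made-up telemetry, the unconstrained call prefers fast_cheap, while passing min_success_rate=0.9 filters it out and leaves premium as the only eligible candidate. Constraints act as hard filters before scoring, so an overly tight constraint yields no model at all rather than a poor one.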

Routing strategy comparison

AdaptiveModelRouter and direct model selection via the registry represent two different approaches.

| Aspect | AdaptiveModelRouter | Direct registry lookup (get_model(), get_tier_for_task()) |
| --- | --- | --- |
| Selection basis | Live telemetry over a configurable window (default 7 days) | Static registry configuration |
| Adapts to failures | Yes; recent_failures and sample_size factor into quality_score | No; returns the registered model regardless of runtime behavior |
| Circuit breaker integration | Works alongside CircuitBreaker; failing providers are excluded | Not integrated; caller is responsible |
| Best for | Production workflows where model reliability varies | Testing, scripting, or when you need deterministic model selection |
| Upgrade recommendations | recommend_tier_upgrade() signals when a higher tier would improve outcomes | Not available |
| Observability | get_routing_stats(workflow, stage, days=7) returns structured performance data | None built in |

Circuit breaker behavior

CircuitBreaker sits between EmpathyLLMExecutor and the provider. It opens after failure_threshold failures (default: 5) and stays open for recovery_timeout_seconds (default: 60). Once that timeout elapses, the breaker enters the half-open state, in which up to half_open_calls probe calls (default: 1) are allowed before the provider is fully re-enabled.

If you call providers directly without going through EmpathyLLMExecutor, the circuit breaker does not apply — you are responsible for handling provider failures.
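The closed, open, and half-open cycle described in this section can be sketched as a small state machine. This is an illustrative stand-in, not the real CircuitBreaker: the class name, method names, and injectable clock are assumptions, and only the three parameters and their defaults come from the text.

```python
import time
from typing import Callable, Optional


class CircuitBreakerSketch:
    """Illustrative breaker; parameter names and defaults mirror the text."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout_seconds: float = 60.0,
        half_open_calls: int = 1,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.half_open_calls = half_open_calls
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._probes_used = 0

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout_seconds:
            return "half_open"
        return "open"

    def allow_request(self) -> bool:
        """Return True if a call may go to the provider right now."""
        current = self.state
        if current == "closed":
            return True
        if current == "half_open" and self._probes_used < self.half_open_calls:
            self._probes_used += 1  # consume one probe slot
            return True
        return False

    def record_success(self) -> None:
        # Any success (including a probe) fully closes the breaker.
        self._failures = 0
        self._opened_at = None
        self._probes_used = 0

    def record_failure(self) -> None:
        if self._opened_at is not None:
            # A failure while open or half-open restarts the recovery timer.
            self._opened_at = self._clock()
            self._probes_used = 0
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
            self._probes_used = 0
```

A caller would check allow_request() before each provider call and report the outcome with record_success() or record_failure(). Code that bypasses the executor would need a wrapper of roughly this shape to get equivalent protection.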

CLI entry points

| Command function | Purpose | Use when |
| --- | --- | --- |
| cmd_auth_setup() | Interactive first-time configuration | Setting up a new environment |
| cmd_auth_status() | Display current AuthStrategy fields | Debugging unexpected auth behavior |
| cmd_auth_reset() | Clear saved strategy from disk | Switching providers or subscription tiers |
| cmd_auth_recommend(args) | Score a specific file and return the recommended AuthMode | Deciding auth mode for a single large module |

When to use each option

Use AuthMode.AUTO with prefer_subscription=True when you run continuous workflows against large codebases (>2000 LOC per module). The subscription tier amortizes cost across high token volumes, and get_recommended_mode() will select it automatically.

Use API key mode when your usage is infrequent or you are running one-off scripts. Token costs stay low when volume is low, and you avoid paying for subscription capacity you do not use.

Use AdaptiveModelRouter in any production workflow. It is the better default: it reacts to real failure rates, avoids degraded providers automatically, and surfaces upgrade recommendations before failures compound.

Use direct registry lookup only in tests or throwaway scripts where you need a fixed, predictable model and do not want telemetry to influence selection.

Use the REALTIME_REQUIRED_TASKS path for anything user-facing or time-critical (chat, live_coding, security_incident, emergency_response). Cost constraints passed to get_best_model() are not applied to these tasks — latency takes priority unconditionally.
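One way to picture the realtime rule is as a step that drops the cost constraint before selection. The helper below is hypothetical (effective_constraints is not a documented function, and "batch_summarization" is a made-up task name); only the frozenset contents come from the text.

```python
from typing import Dict, Optional

# Task types listed in the text as realtime-required.
REALTIME_REQUIRED_TASKS = frozenset(
    {"chat", "live_coding", "security_incident", "emergency_response"}
)


def effective_constraints(
    task_type: str,
    max_cost: Optional[float] = None,
    max_latency_ms: Optional[float] = None,
) -> Dict[str, Optional[float]]:
    """Return the constraints that would actually apply to a task.

    Hypothetical helper: for realtime-required tasks the cost ceiling is
    discarded so that latency alone drives selection.
    """
    if task_type in REALTIME_REQUIRED_TASKS:
        return {"max_cost": None, "max_latency_ms": max_latency_ms}
    return {"max_cost": max_cost, "max_latency_ms": max_latency_ms}
```

In this sketch, a max_cost passed for "chat" is silently ignored, so budget limits for user-facing tasks have to be enforced somewhere other than the routing call.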

Do not use this feature directly if your workflow spans multiple providers and you need coordinated fallback logic. The ResilientExecutor and FallbackStrategy layer above models handles multi-provider orchestration; wiring AdaptiveModelRouter and CircuitBreaker together manually duplicates logic that already exists there.

Tags: models, auth, llm