[P1] RUNTIME-01: Enabled storage/pricing dependencies can silently degrade to no-op or in-memory mode #556

@majiayu000

Summary

Several runtime dependencies can fail during startup or initialization while the gateway continues in a reduced mode. Some of these reduced modes are useful for local development, but they are dangerous when the user explicitly enabled the dependency or expects production-grade behavior.

Evidence

  • src/storage/mod.rs:56-80: Redis initialization failure logs a warning and creates a no-op Redis pool.
  • src/storage/redis/pool.rs:36-47: Redis disabled in config also creates no-op mode.
  • src/storage/redis/pool.rs:131-135: Redis health check returns Ok(()) in no-op mode.
  • src/storage/mod.rs:89-100: vector DB initialization failure logs a warning and continues without vector DB.
  • src/storage/mod.rs:160-167: missing vector DB is treated as healthy when not configured.
  • src/server/http.rs:59-65: budget persistence load failure logs a warning and falls back to in-memory budgets only.
  • src/server/http.rs:71-75: pricing service initial load failure logs a warning and the server continues.

Why this matters

A gateway operator may enable Redis, vector DB, pricing source, or persistent budgets because they affect user-visible behavior: caching, rate limiting, budget enforcement, semantic cache, billing/cost reporting, and restart recovery. If an enabled dependency fails but the gateway still starts with a warning-only fallback, the operator can get incorrect or non-durable behavior without a hard signal.

Suggested fix

Introduce an explicit degradation policy and make the default fail-safe:

  • If a dependency is explicitly enabled, startup should fail when it cannot initialize unless the config opts into a degraded mode.
  • Keep no-op/in-memory modes for disabled dependencies and local development presets.
  • Add a typed runtime status for each dependency: configured, disabled, degraded, unavailable, healthy.
  • Reflect degraded dependency state in /health/detailed or readiness checks.
  • Make pricing initialization behavior explicit: if pricing.source is configured and cannot load, fail startup or return degraded readiness instead of silently falling back.
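The policy above could be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: `DependencyPolicy`, `StartupError`, `resolve_startup`, and the `allow_degraded` field are hypothetical names introduced here, and only the status variants come from the list above.

```rust
use std::fmt;

/// Typed runtime status for one dependency (variant names follow the list above).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DependencyStatus {
    Disabled,
    Configured,
    Healthy,
    Degraded { reason: String },
    Unavailable { reason: String },
}

/// Hypothetical per-dependency settings; `allow_degraded` is an assumed opt-in flag.
#[derive(Debug, Clone, Copy)]
pub struct DependencyPolicy {
    pub enabled: bool,
    pub allow_degraded: bool,
}

/// Hypothetical error returned when an enabled dependency cannot start.
#[derive(Debug, PartialEq, Eq)]
pub struct StartupError {
    pub dependency: &'static str,
    pub reason: String,
}

impl fmt::Display for StartupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} failed to initialize: {}", self.dependency, self.reason)
    }
}

impl std::error::Error for StartupError {}

/// Decide what startup should do for one dependency.
/// `init` is only called when the dependency is enabled.
pub fn resolve_startup(
    dependency: &'static str,
    policy: DependencyPolicy,
    init: impl FnOnce() -> Result<(), String>,
) -> Result<DependencyStatus, StartupError> {
    if !policy.enabled {
        // Disabled dependencies keep their no-op / in-memory behavior.
        return Ok(DependencyStatus::Disabled);
    }
    match init() {
        Ok(()) => Ok(DependencyStatus::Healthy),
        // Degraded mode must be opted into explicitly.
        Err(reason) if policy.allow_degraded => Ok(DependencyStatus::Degraded { reason }),
        // Fail-safe default: an enabled dependency that cannot start aborts startup.
        Err(reason) => Err(StartupError { dependency, reason }),
    }
}
```

With this shape, the warning-only fallback paths listed under Evidence would become either a returned `StartupError` or a recorded `Degraded` status, depending on the operator's explicit choice.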

Acceptance criteria

  • redis.enabled: true with an unreachable Redis URL fails startup by default, or requires an explicit allow_degraded: true style setting.
  • Vector DB configured but unavailable does not silently disappear from runtime capability.
  • Budget persistence failure is visible as degraded/unready unless explicitly configured as in-memory-only.
  • Pricing source load failure is surfaced as a startup/readiness failure when configured.
  • Tests cover disabled dependency, enabled healthy dependency, enabled failing dependency, and explicit degraded mode.
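One possible way to make the "visible as degraded/unready" criteria checkable is to fold per-dependency states into a single readiness value. The sketch below is an assumption about how that could look: `DependencyState`, `Readiness`, and `readiness` are hypothetical names, and the mapping (any unavailable dependency means not ready) is one choice among several.

```rust
use std::collections::BTreeMap;

/// Simplified per-dependency state, assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DependencyState {
    Disabled,
    Healthy,
    Degraded,
    Unavailable,
}

/// Overall readiness that a readiness or detailed health endpoint could report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Readiness {
    Ready,
    Degraded,
    NotReady,
}

/// Fold all dependency states into one readiness value.
/// Disabled dependencies do not count against readiness.
pub fn readiness(states: &BTreeMap<&str, DependencyState>) -> Readiness {
    let mut overall = Readiness::Ready;
    for state in states.values() {
        match state {
            // An enabled but unavailable dependency makes the gateway unready.
            DependencyState::Unavailable => return Readiness::NotReady,
            // A degraded dependency is surfaced instead of being reported as healthy.
            DependencyState::Degraded => overall = Readiness::Degraded,
            DependencyState::Disabled | DependencyState::Healthy => {}
        }
    }
    overall
}
```

The four cases named in the last criterion (disabled, enabled healthy, enabled failing, explicit degraded mode) map directly onto the four `DependencyState` variants, which keeps the test matrix small.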

Audit context

Found during a codebase audit on 2026-05-20. This issue is related to the repository's U-29 rule: no silent degradation for errors that cause user-visible missing data or wrong output.

Local checks from the audit:

cargo check --no-default-features --features lite
cargo check
cargo check --all-features

All three checks passed; the problem is runtime semantics and production safety, not compilation.

Labels: P1 (High priority), bug (Something isn't working), review (Periodic health review finding)
