Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

196 Indexer connectivity profile refresh rollups

  • Status: Accepted
  • Date: 2026-02-15
  • Context:
    • Motivation: complete the ERD connectivity rollup behavior for job_run_connectivity_profile_refresh_v1 so indexer_connectivity_profile is derived from outbound_request_log with the required thresholds and request-type scope.
    • Constraints: runtime logic must remain in stored procedures, no inline runtime SQL, no new dependencies, and tests must run through existing Rust data-layer harnesses.
  • Decision:
    • Added migration 0088_indexer_connectivity_profile_rollup_rules.sql to redefine job_run_connectivity_profile_refresh_v1.
    • Rollups now aggregate only request types (caps, search, tvsearch, moviesearch, rss, probe), exclude rate_limited from samples, and treat success as outcome='success' AND parse_ok=true.
    • Status scoring now follows ERD thresholds with explicit failing precedence for success_rate_1h < 0.90 and dominant failure classes in (auth_error, cf_challenge, tls, dns).
    • Added quarantine handling refinements: persistent failing + CF/auth/429 burst transitions to quarantined; post-cooldown healthy rollups recover to degraded while preserving prior error class context.
    • Added job-runner tests in crates/revaer-data/src/indexers/jobs.rs for no-sample defaults, low-success failure classification, persistent auth quarantine, and quarantine cooldown recovery.
    • Alternatives considered: keeping previous status logic (rejected: low-success cases could remain degraded, which conflicts with ERD failing rules) and handling quarantine transitions in application code (rejected: ERD assigns this behavior to stored-procedure rollups).
  • Consequences:
    • Positive outcomes: connectivity snapshots align with ERD sample definitions and threshold semantics; rollups update every active indexer row, including no-sample degraded state.
    • Risks or trade-offs: stricter failing/quarantine classifications can change operational status sooner than previous behavior; large outbound log windows still require efficient indexing.
  • Follow-up:
    • Implement the remaining Phase 9 rollup jobs (reputation_rollup_*, canonical_backfill_best_source, base_score_refresh_recent) and extend job tests for those procedures.
    • Revisit schema-level indexer_connectivity_profile constraint hardening if we want DB-level enforcement of non-null error_class for non-healthy statuses.
  • Design notes:
    • Status resolution is now two-stage (status_resolved then final) so non-healthy statuses can preserve prior error class context without relying on base-status assumptions.
    • Indexer scope is anchored to active indexer_instance rows (deleted_at IS NULL) so connectivity refresh is deterministic even without recent request samples.
  • Test coverage summary:
    • Added:
      • job_run_connectivity_profile_refresh_upserts_degraded_without_samples
      • job_run_connectivity_profile_refresh_marks_low_success_as_failing
      • job_run_connectivity_profile_refresh_quarantines_persistent_auth_failures
      • job_run_connectivity_profile_refresh_recovers_quarantine_to_degraded_after_cooldown
  • Observability updates:
    • None (stored-procedure behavior change only; no new telemetry surface).
  • Risk & rollback plan:
    • Roll back by reverting migration 0088_indexer_connectivity_profile_rollup_rules.sql and rerunning migration tooling in a rollback deployment.
  • Dependency rationale:
    • No new dependencies. Alternatives considered: not applicable.