196 Indexer connectivity profile refresh rollups

Status: Accepted
Date: 2026-02-15
Context:
- Motivation: complete the ERD connectivity rollup behavior for job_run_connectivity_profile_refresh_v1 so that indexer_connectivity_profile is derived from outbound_request_log using the required thresholds and request-type scope.
- Constraints: runtime logic must remain in stored procedures, no inline runtime SQL, no new dependencies, and tests must run through existing Rust data-layer harnesses.
Decision:
- Added migration 0088_indexer_connectivity_profile_rollup_rules.sql to redefine job_run_connectivity_profile_refresh_v1.
- Rollups now aggregate only the request types caps, search, tvsearch, moviesearch, rss, and probe; exclude rate_limited requests from the sample set; and count a request as successful only when outcome='success' AND parse_ok=true.
- Status scoring now follows the ERD thresholds, with failing taking explicit precedence in two cases: success_rate_1h < 0.90, and a dominant failure class in (auth_error, cf_challenge, tls, dns).
- Refined quarantine handling: a persistently failing indexer that also sees a burst of CF-challenge, auth, or 429 responses transitions to quarantined; once the cooldown has elapsed, healthy rollups recover it to degraded while preserving the prior error class context.
- Added job-runner tests in crates/revaer-data/src/indexers/jobs.rs covering no-sample defaults, low-success failure classification, persistent-auth quarantine, and quarantine cooldown recovery.
- Alternatives considered: keeping the previous status logic (rejected: low-success cases could remain degraded, conflicting with the ERD failing rules) and handling quarantine transitions in application code (rejected: the ERD assigns this behavior to the stored-procedure rollups).
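The sampling and failing rules above can be modeled outside the database to make them concrete. The following Rust sketch is illustrative only: the real logic lives in the stored procedure redefined by migration 0088, and RequestSample, Rollup, rollup, and is_failing are hypothetical names. It assumes the input slice is already limited to the one-hour window behind success_rate_1h, that rate-limited requests carry outcome == "rate_limited", and that the two failing conditions act as independent triggers. Healthy-versus-degraded thresholds are not shown because this record does not spell them out.

```rust
use std::collections::BTreeMap;

/// Request types counted by the rollup (list taken from this ADR).
const ROLLUP_REQUEST_TYPES: [&str; 6] = ["caps", "search", "tvsearch", "moviesearch", "rss", "probe"];
/// Dominant failure classes that force `failing` (list taken from this ADR).
const FAILING_CLASSES: [&str; 4] = ["auth_error", "cf_challenge", "tls", "dns"];
/// Failing threshold on the 1-hour success rate (from this ADR).
const FAILING_SUCCESS_RATE: f64 = 0.90;

/// Hypothetical in-memory stand-in for one outbound_request_log row.
struct RequestSample<'a> {
    request_type: &'a str,
    outcome: &'a str,
    parse_ok: bool,
    error_class: Option<&'a str>,
}

/// Aggregates computed from the in-scope samples of one indexer.
struct Rollup<'a> {
    sample_count: usize,
    success_rate: f64,
    dominant_failure_class: Option<&'a str>,
}

/// A sample counts as a success only when outcome='success' AND parse_ok=true.
fn is_success(s: &RequestSample) -> bool {
    s.outcome == "success" && s.parse_ok
}

/// Builds a rollup from samples assumed to be limited to the 1h window.
/// Returns None when no in-scope samples remain (the no-sample case).
fn rollup<'a>(window: &[RequestSample<'a>]) -> Option<Rollup<'a>> {
    let scoped: Vec<&RequestSample<'a>> = window
        .iter()
        .filter(|s| ROLLUP_REQUEST_TYPES.contains(&s.request_type))
        // Assumption: rate-limited requests carry outcome == "rate_limited".
        .filter(|s| s.outcome != "rate_limited")
        .collect();
    if scoped.is_empty() {
        return None;
    }
    let successes = scoped.iter().filter(|s| is_success(s)).count();
    // Count failure classes; BTreeMap iteration order makes ties deterministic
    // (assumption: ties resolve to the lexicographically last class).
    let mut counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for s in scoped.iter().filter(|s| !is_success(s)) {
        if let Some(class) = s.error_class {
            *counts.entry(class).or_insert(0) += 1;
        }
    }
    let dominant_failure_class = counts.into_iter().max_by_key(|&(_, n)| n).map(|(class, _)| class);
    Some(Rollup {
        sample_count: scoped.len(),
        success_rate: successes as f64 / scoped.len() as f64,
        dominant_failure_class,
    })
}

/// Reads the two failing conditions in this ADR as independent triggers.
fn is_failing(r: &Rollup) -> bool {
    r.success_rate < FAILING_SUCCESS_RATE
        || r.dominant_failure_class.map_or(false, |c| FAILING_CLASSES.contains(&c))
}
```

Filtering before counting keeps the denominator of the success rate identical to the sample definition, so excluded request types and rate-limited responses can neither inflate nor deflate success_rate_1h.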
Consequences:
- Positive outcomes: connectivity snapshots align with the ERD sample definitions and threshold semantics; every active indexer row is refreshed, and indexers without samples receive the degraded default.
- Risks or trade-offs: the stricter failing and quarantine classifications can change operational status sooner than the previous logic did; large outbound_request_log windows still depend on efficient indexing.
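The per-indexer refresh scope can be sketched as follows. This is a hypothetical Rust model, not the stored procedure: IndexerInstance, refresh_targets, and the u32/u64 field types are illustrative assumptions. Iterating over active indexers first and looking up scored results second would roughly correspond, in SQL, to driving the rollup from indexer_instance and left-joining the aggregated samples, so that every active indexer yields a row.

```rust
use std::collections::HashMap;

/// Statuses a refreshed profile can carry in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Healthy,
    Degraded,
    Failing,
}

/// Hypothetical stand-in for an indexer_instance row.
struct IndexerInstance {
    id: u32,
    deleted_at: Option<u64>, // None = active (deleted_at IS NULL)
}

/// Returns one (indexer id, status) pair per active indexer.
/// `scored` holds statuses only for indexers that had samples in the window;
/// active indexers without samples fall back to Degraded.
fn refresh_targets(indexers: &[IndexerInstance], scored: &HashMap<u32, Status>) -> Vec<(u32, Status)> {
    indexers
        .iter()
        .filter(|i| i.deleted_at.is_none())
        .map(|i| (i.id, scored.get(&i.id).copied().unwrap_or(Status::Degraded)))
        .collect()
}
```

Deleted indexers are skipped even if they still have samples, which keeps the output set determined by indexer_instance alone rather than by whatever happens to appear in the log window.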
Follow-up:
- Implement the remaining Phase 9 rollup jobs (reputation_rollup_*, canonical_backfill_best_source, base_score_refresh_recent) and extend job tests for those procedures.
- Revisit hardening the indexer_connectivity_profile schema constraints if we want database-level enforcement of a non-null error_class for non-healthy statuses.
Design notes:
- Status resolution is now two-stage (status_resolved, then the final status), so non-healthy statuses can preserve prior error class context without relying on assumptions about the base status.
- Indexer scope is anchored to active indexer_instance rows (deleted_at IS NULL) so connectivity refresh is deterministic even without recent request samples.
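The two-stage resolution and the quarantine transitions can be illustrated with a small state model. This Rust sketch is an interpretation under stated assumptions, not the procedure's actual logic: Prior, Current, status_resolved, and resolve_final are hypothetical; "persistent failing" is read as failing in both the previous and the current rollup; burst detection and cooldown timing are reduced to booleans computed elsewhere; and healthy rows are assumed to clear error_class.

```rust
/// Connectivity statuses named in this ADR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Healthy,
    Degraded,
    Failing,
    Quarantined,
}

/// Hypothetical snapshot of the previous indexer_connectivity_profile row.
struct Prior<'a> {
    status: Status,
    error_class: Option<&'a str>,
    cooldown_elapsed: bool, // assumption: cooldown timing is computed elsewhere
}

/// Hypothetical output of the current rollup window.
struct Current<'a> {
    base: Option<Status>, // None = no samples in the window
    error_class: Option<&'a str>,
    cf_auth_429_burst: bool, // assumption: burst detection is computed elsewhere
}

/// Stage 1 (status_resolved): apply quarantine transitions on top of the base status.
fn status_resolved(prior: Option<&Prior>, cur: &Current) -> Status {
    match (prior, cur.base) {
        // Assumption: quarantine holds until the cooldown has elapsed.
        (Some(p), _) if p.status == Status::Quarantined && !p.cooldown_elapsed => Status::Quarantined,
        // No samples in the window: default to degraded.
        (_, None) => Status::Degraded,
        // Post-cooldown healthy rollup: recover to degraded, not straight to healthy.
        (Some(p), Some(Status::Healthy)) if p.status == Status::Quarantined => Status::Degraded,
        // Persistent failing (failing before and now) plus a CF/auth/429 burst.
        (Some(p), Some(Status::Failing)) if p.status == Status::Failing && cur.cf_auth_429_burst => {
            Status::Quarantined
        }
        (_, Some(base)) => base,
    }
}

/// Stage 2 (final): pick error_class from the resolved status, not the base status.
fn resolve_final<'a>(prior: Option<&Prior<'a>>, cur: &Current<'a>) -> (Status, Option<&'a str>) {
    let status = status_resolved(prior, cur);
    let error_class = if status == Status::Healthy {
        None // assumption: healthy rows clear error_class
    } else {
        // Non-healthy rows keep the current class, else fall back to the prior one.
        cur.error_class.or_else(|| prior.and_then(|p| p.error_class))
    };
    (status, error_class)
}
```

Because the error_class choice keys off the resolved status rather than the base status, an indexer recovering from quarantine with a healthy base rollup comes back as degraded and still carries its previous error class (for example, auth_error).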
Test coverage summary:
- Added:
  - job_run_connectivity_profile_refresh_upserts_degraded_without_samples
  - job_run_connectivity_profile_refresh_marks_low_success_as_failing
  - job_run_connectivity_profile_refresh_quarantines_persistent_auth_failures
  - job_run_connectivity_profile_refresh_recovers_quarantine_to_degraded_after_cooldown
Observability updates:
- None (stored-procedure behavior change only; no new telemetry surface).
Risk & rollback plan:
- Roll back by reverting migration 0088_indexer_connectivity_profile_rollup_rules.sql and rerunning the migration tooling in a rollback deployment.
Dependency rationale:
- No new dependencies. Alternatives considered: not applicable.
