198 Canonical refresh durable source cadence

Status: Accepted
Date: 2026-02-15
Context:
- Motivation: complete the ERD-derived cadence behavior for job_run_canonical_backfill_best_source_v1 and job_run_base_score_refresh_recent_v1.
- Constraints: keep runtime DB behavior inside stored procedures, use no new dependencies, and verify behavior through revaer-data job tests.
Decision:
- Added migration 0090_indexer_base_score_refresh_durable_sources.sql to redefine both job procedures.
- job_run_base_score_refresh_recent_v1 now derives candidate canonical/source pairs directly from durable canonical_torrent_source rows with last_seen_at >= now() - 7 days, instead of from canonical_torrent_source_context_score.
- job_run_base_score_refresh_recent_v1 now recomputes global winners for canonicals with durable-source activity in the last 7 days.
- job_run_canonical_backfill_best_source_v1 now treats “recent canonicals” as canonicals with at least one durable source seen in the last 7 days, while retaining no-winner and low-confidence fallback backfill paths.
- Added revaer-data job tests covering base score refresh driven only by durable-source activity, and backfill recomputation for canonicals with recent durable sources.
- Alternatives considered: keeping context-scoped candidate selection (rejected: it conflicts with the ERD durable-source cadence requirement) and relying only on ingest-time recompute (rejected: the ERD explicitly assigns the hourly refresh to this job).
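The candidate-selection and backfill-eligibility rules above can be modeled in memory as a rough sketch. This is not the stored-procedure implementation, which is SQL in migration 0090. The Rust types DurableSource and CanonicalState, their fields, the Unix-seconds timestamps, and the modeling of "no winner" and "low confidence" as plain booleans are all hypothetical simplifications.

```rust
// Hypothetical in-memory model of the 7-day durable-source cadence rules.
// The real logic lives in SQL stored procedures; all names here are illustrative.

/// Seven-day recency window, in seconds.
const RECENT_WINDOW_SECS: i64 = 7 * 24 * 60 * 60;

/// Stand-in for a canonical_torrent_source row (fields are assumptions).
#[derive(Debug, Clone)]
struct DurableSource {
    canonical_id: u64,
    source_id: u64,
    last_seen_at: i64, // Unix seconds
}

/// Stand-in for a canonical's current global-winner state (assumed booleans).
#[derive(Debug, Clone)]
struct CanonicalState {
    canonical_id: u64,
    has_winner: bool,
    low_confidence: bool,
}

/// Base score refresh candidates: (canonical, source) pairs whose durable
/// source was seen at or after now - 7 days.
fn refresh_candidates(sources: &[DurableSource], now: i64) -> Vec<(u64, u64)> {
    let cutoff = now - RECENT_WINDOW_SECS;
    sources
        .iter()
        .filter(|s| s.last_seen_at >= cutoff)
        .map(|s| (s.canonical_id, s.source_id))
        .collect()
}

/// A canonical is "recent" if at least one of its durable sources is in the window.
fn is_recent(canonical_id: u64, sources: &[DurableSource], now: i64) -> bool {
    let cutoff = now - RECENT_WINDOW_SECS;
    sources
        .iter()
        .any(|s| s.canonical_id == canonical_id && s.last_seen_at >= cutoff)
}

/// Backfill eligibility: recent durable activity, plus the retained
/// no-winner and low-confidence fallback paths.
fn needs_backfill(state: &CanonicalState, sources: &[DurableSource], now: i64) -> bool {
    is_recent(state.canonical_id, sources, now) || !state.has_winner || state.low_confidence
}

fn main() {
    let now: i64 = 1_000_000_000;
    let sources = vec![
        DurableSource { canonical_id: 1, source_id: 10, last_seen_at: now - 3_600 },
        DurableSource { canonical_id: 2, source_id: 20, last_seen_at: now - 8 * 24 * 3_600 },
    ];
    println!("candidates: {:?}", refresh_candidates(&sources, now));

    // Canonical 2 is stale, but it still qualifies through the no-winner fallback.
    let stale_unwon = CanonicalState { canonical_id: 2, has_winner: false, low_confidence: false };
    println!("backfill canonical 2: {}", needs_backfill(&stale_unwon, &sources, now));
}
```

The `>=` comparison on the cutoff keeps a source seen exactly 7 days ago inside the window. This mirrors the last_seen_at >= now() - 7 days predicate.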
Consequences:
- Positive outcomes: base score refresh and global best-source backfill now align with the ERD's “durable source last_seen_at” semantics.
- Risks or trade-offs: scanning the broader durable-source candidate set may increase the hourly jobs' workload on large datasets.
Follow-up:
- Implement the canonical_prune_low_confidence checklist item and add focused tests for prune-eligibility edge cases.
- Validate production indexes for durable-source cadence queries as data volume increases.
Design notes:
- The refresh pipeline remains deterministic: compute base scores first, then recompute global winners for the same durable-source candidate set.
- Backfill keeps the low-confidence safety behavior while adding durable-source recency as the primary cadence signal.
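The ordering described in the design notes can be illustrated with a small in-memory sketch. Scores are computed for every candidate pair before any winner is chosen. Winners are then picked only for canonicals in that same candidate set. The function name refresh_and_pick_winners, the integer scores, the caller-supplied scoring closure, and the lowest-source_id tie-break are assumptions for illustration. The real scoring and winner rules live in the stored procedures.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of the refresh ordering: score every candidate pair
// first, then pick per-canonical winners from that same candidate set.

/// Returns canonical_id -> winning source_id for canonicals in `candidates`.
/// `base_score` is a caller-supplied stand-in for the real scoring rules.
fn refresh_and_pick_winners<F>(candidates: &[(u64, u64)], base_score: F) -> BTreeMap<u64, u64>
where
    F: Fn(u64, u64) -> i64,
{
    // Pass 1: compute base scores for every candidate pair.
    let mut scores: BTreeMap<(u64, u64), i64> = BTreeMap::new();
    for &(canonical, source) in candidates {
        scores.insert((canonical, source), base_score(canonical, source));
    }

    // Pass 2: recompute the winner per canonical over the same pairs.
    // BTreeMap iterates in ascending (canonical, source) order, so requiring a
    // strictly higher score keeps the lowest source_id on ties (assumed rule).
    let mut winners: BTreeMap<u64, (u64, i64)> = BTreeMap::new();
    for (&(canonical, source), &score) in &scores {
        let replace = match winners.get(&canonical) {
            Some(&(_, best)) => score > best,
            None => true,
        };
        if replace {
            winners.insert(canonical, (source, score));
        }
    }

    winners
        .into_iter()
        .map(|(canonical, (source, _))| (canonical, source))
        .collect()
}

fn main() {
    let candidates: [(u64, u64); 3] = [(1, 10), (1, 11), (2, 20)];
    // Hypothetical scores: source 11 outranks source 10 for canonical 1.
    let winners = refresh_and_pick_winners(&candidates, |_, source| if source == 11 { 5 } else { 1 });
    println!("{:?}", winners);
}
```

Ordered maps (BTreeMap) keep both passes deterministic. The same candidate set and scores always yield the same winners, regardless of the order in which candidates arrive.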
Test coverage summary:
- Added:
  - job_run_base_score_refresh_recent_uses_durable_source_activity
  - job_run_canonical_backfill_best_source_recomputes_recent_durable_sources
Observability updates:
- None (stored-procedure behavior change only).
Risk & rollback plan:
- Roll back by reverting migration 0090_indexer_base_score_refresh_durable_sources.sql.
Dependency rationale:
- No new dependencies. Alternatives considered: not applicable.