Motivation: complete ERD-derived cadence behavior for job_run_canonical_backfill_best_source_v1 and job_run_base_score_refresh_recent_v1.
Constraints: keep runtime DB behavior inside stored procedures, use no new dependencies, and verify behavior through revaer-data job tests.
Decision:
Added migration 0090_indexer_base_score_refresh_durable_sources.sql to redefine both job procedures.
job_run_base_score_refresh_recent_v1 now derives candidate canonical/source pairs directly from durable canonical_torrent_source rows with last_seen_at >= now() - 7 days, instead of from canonical_torrent_source_context_score.
job_run_base_score_refresh_recent_v1 now recomputes global winners for canonicals with durable-source activity in the last 7 days.
job_run_canonical_backfill_best_source_v1 now treats “recent canonicals” as canonicals with at least one durable source seen in the last 7 days, while retaining no-winner and low-confidence fallback backfill paths.
Added revaer-data job tests for durable-source-only base score refresh and recent durable-source backfill recomputation behavior.
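The candidate selection described above can be modeled outside the database roughly as follows. This is a minimal Python sketch, not the stored-procedure code from the migration: the DurableSource record, the no_winner_ids and low_confidence_ids inputs, and both function names are hypothetical stand-ins. Only the 7-day last_seen_at window and the three backfill paths (recent, no-winner, low-confidence) come from the decision text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# From the decision text: last_seen_at >= now() - 7 days.
RECENT_WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class DurableSource:
    # Hypothetical stand-in for a canonical_torrent_source row.
    canonical_id: int
    source_id: int
    last_seen_at: datetime


def refresh_candidates(sources, now):
    """Canonical/source pairs whose durable source was seen within the window."""
    cutoff = now - RECENT_WINDOW
    return sorted(
        {(s.canonical_id, s.source_id) for s in sources if s.last_seen_at >= cutoff}
    )


def backfill_candidates(sources, now, no_winner_ids, low_confidence_ids):
    """Recent canonicals plus the retained no-winner and low-confidence fallbacks."""
    recent = {canonical_id for canonical_id, _ in refresh_candidates(sources, now)}
    return sorted(recent | set(no_winner_ids) | set(low_confidence_ids))


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
sources = [
    DurableSource(1, 10, now - timedelta(days=1)),   # recent
    DurableSource(1, 11, now - timedelta(days=30)),  # stale source on a recent canonical
    DurableSource(2, 20, now - timedelta(days=8)),   # stale
    DurableSource(3, 30, now - timedelta(days=7)),   # exactly on the cutoff, kept by >=
]
print(refresh_candidates(sources, now))
print(backfill_candidates(sources, now, no_winner_ids=[2], low_confidence_ids=[4]))
```

In this model, canonical 2 has no recent durable source and only enters the backfill set through the no-winner fallback, which mirrors the intent of keeping the fallback paths alongside the recency signal.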
Alternatives considered: keep context-scoped candidate selection (rejected: it conflicts with the ERD durable-source cadence requirement); rely on ingest-time recompute only (rejected: the ERD explicitly assigns the hourly refresh to the job).
Consequences:
Positive outcomes: base score refresh and global best-source backfill now align with ERD “durable source last_seen_at” semantics.
Risks or trade-offs: the broader durable-source candidate scans may increase the work each hourly job run performs on large datasets.
Follow-up:
Implement the canonical_prune_low_confidence checklist item and add focused tests for prune eligibility edge cases.
Validate production indexes for durable-source cadence queries as data volume increases.
Design notes:
The refresh pipeline remains deterministic: compute base scores first, then recompute global winners for the same durable-source candidate set.
Backfill keeps low-confidence safety behavior while adding durable-source recency as the primary cadence signal.
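The two-phase ordering in the design notes can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the refresh function, the base_score callback, and the tie-break rule (higher score wins, lower source_id on ties) are hypothetical and do not describe the actual procedure bodies or scoring formula. The sketch also considers only the candidate pairs' scores when picking a winner, which the real procedures may not do. What it shows is the ordering property: scores are computed first, and winners are then derived for the same candidate set, so the result does not depend on input order.

```python
def refresh(candidates, base_score):
    """Phase 1 computes base scores; phase 2 picks one winner per canonical."""
    # Phase 1: a base score for every candidate (canonical_id, source_id) pair.
    scores = {pair: base_score(pair) for pair in candidates}

    # Phase 2: winners for the canonicals in that same candidate set.
    winners = {}
    # Keys are unique tuples, so sorting orders by (canonical_id, source_id)
    # and makes the iteration order independent of the input order.
    for (canonical_id, source_id), score in sorted(scores.items()):
        best = winners.get(canonical_id)
        # Hypothetical tie-break: strictly higher score replaces the current
        # winner, so on ties the lower source_id (visited first) is kept.
        if best is None or score > best[1]:
            winners[canonical_id] = (source_id, score)
    return scores, winners


# Toy scores only; the real scoring formula is not given in this record.
toy_scores = {(1, 10): 0.4, (1, 11): 0.9, (3, 30): 0.5}
candidates = list(toy_scores)

_, winners = refresh(candidates, toy_scores.__getitem__)
_, winners_reversed = refresh(list(reversed(candidates)), toy_scores.__getitem__)
print(winners)
print(winners == winners_reversed)
```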