198 Canonical refresh durable source cadence

  • Status: Accepted
  • Date: 2026-02-15
  • Context:
    • Motivation: complete ERD-derived cadence behavior for job_run_canonical_backfill_best_source_v1 and job_run_base_score_refresh_recent_v1.
    • Constraints: keep runtime DB behavior inside stored procedures, use no new dependencies, and verify behavior through revaer-data job tests.
  • Decision:
    • Added migration 0090_indexer_base_score_refresh_durable_sources.sql to redefine both job procedures.
    • job_run_base_score_refresh_recent_v1 now derives candidate canonical/source pairs directly from durable canonical_torrent_source rows seen in the last 7 days (last_seen_at >= now() - 7 days) instead of from canonical_torrent_source_context_score.
    • job_run_base_score_refresh_recent_v1 now recomputes global winners for canonicals with durable-source activity in the last 7 days.
    • job_run_canonical_backfill_best_source_v1 now treats “recent canonicals” as canonicals with at least one durable source seen in the last 7 days, while retaining no-winner and low-confidence fallback backfill paths.
    • Added revaer-data job tests for durable-source-only base score refresh and recent durable-source backfill recomputation behavior.
    • Alternatives considered:
      • Keep context-scoped candidate selection (rejected: conflicts with the ERD durable-source cadence requirement).
      • Rely on ingest-time recompute only (rejected: the ERD explicitly assigns hourly refresh to the job).
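The candidate selection and backfill eligibility described in the decision above can be illustrated with a small in-memory sketch. This is not the stored-procedure code from the migration: the `DurableSource` struct, the function names, and the representation of confidence as an `f64` compared against a caller-supplied threshold are all assumptions made for illustration. Only the 7-day window on `last_seen_at` and the three backfill paths (recent, no-winner, low-confidence) come from this record.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, SystemTime};

/// Seven-day recency window taken from the decision above.
const RECENCY_WINDOW: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Hypothetical in-memory stand-in for a `canonical_torrent_source` row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DurableSource {
    pub canonical_id: i64,
    pub source_id: i64,
    pub last_seen_at: SystemTime,
}

/// Returns the (canonical_id, source_id) pairs whose durable source was seen
/// at or after `now - 7 days`, mirroring `last_seen_at >= now() - 7 days`.
pub fn recent_pairs(rows: &[DurableSource], now: SystemTime) -> BTreeSet<(i64, i64)> {
    // Clamp the cutoff to the Unix epoch if the subtraction is not representable.
    let cutoff = now
        .checked_sub(RECENCY_WINDOW)
        .unwrap_or(SystemTime::UNIX_EPOCH);
    rows.iter()
        .filter(|r| r.last_seen_at >= cutoff)
        .map(|r| (r.canonical_id, r.source_id))
        .collect()
}

/// Canonicals with at least one durable source seen inside the window.
pub fn recent_canonicals(rows: &[DurableSource], now: SystemTime) -> BTreeSet<i64> {
    recent_pairs(rows, now).into_iter().map(|(c, _)| c).collect()
}

/// Backfill eligibility: recent durable-source activity, plus the retained
/// no-winner and low-confidence fallbacks.
/// `winner_confidence` is `None` when the canonical has no global winner.
/// The threshold is an assumed parameter; this record does not state its value.
pub fn needs_backfill(
    recent: bool,
    winner_confidence: Option<f64>,
    low_confidence_threshold: f64,
) -> bool {
    match winner_confidence {
        None => true,
        Some(confidence) => recent || confidence < low_confidence_threshold,
    }
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the sketch deterministic under test. A source seen exactly 7 days ago is included because the comparison is inclusive.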
  • Consequences:
    • Positive outcomes: base score refresh and global best-source backfill now align with ERD “durable source last_seen_at” semantics.
    • Risks or trade-offs: broader durable-source candidate scans may increase hourly job work on large datasets.
  • Follow-up:
    • Implement the canonical_prune_low_confidence checklist item and add focused tests for prune eligibility edge cases.
    • Validate production indexes for durable-source cadence queries as data volume increases.
  • Design notes:
    • The refresh pipeline remains deterministic: compute base scores first, then recompute global winners for the same durable-source candidate set.
    • Backfill keeps low-confidence safety behavior while adding durable-source recency as the primary cadence signal.
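The two-phase ordering in the design notes above (base scores first, then global winners over the same candidate set) can be sketched as follows. The function name, the integer score type, the caller-supplied scoring closure, and the tie-break on lowest source id are assumptions made for illustration. The real scoring and winner rules live in the stored procedures and are not described in this record.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Runs both phases over one candidate set of (canonical_id, source_id) pairs
/// and returns the winning source id per canonical.
/// `base_score` is a hypothetical stand-in for the real base score computation.
pub fn refresh_recent(
    candidates: &BTreeSet<(i64, i64)>,
    base_score: impl Fn(i64, i64) -> i64,
) -> BTreeMap<i64, i64> {
    // Phase 1: compute a base score for every candidate pair.
    let scores: BTreeMap<(i64, i64), i64> = candidates
        .iter()
        .map(|&(canonical, source)| ((canonical, source), base_score(canonical, source)))
        .collect();

    // Phase 2: recompute the winner per canonical from the same pairs.
    // Pairs are visited in sorted order, so on equal scores the lowest
    // source id is kept (an assumed tie-break that keeps the result stable).
    let mut best: BTreeMap<i64, (i64, i64)> = BTreeMap::new();
    for (&(canonical, source), &score) in &scores {
        match best.get(&canonical) {
            Some(&(current, _)) if current >= score => {}
            _ => {
                best.insert(canonical, (score, source));
            }
        }
    }

    best.into_iter()
        .map(|(canonical, (_, source))| (canonical, source))
        .collect()
}
```

Because both phases read the same `candidates` set and iterate it in sorted order, repeated runs over identical input produce identical winners, which is the determinism property the design notes call out.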
  • Test coverage summary:
    • Added:
      • job_run_base_score_refresh_recent_uses_durable_source_activity
      • job_run_canonical_backfill_best_source_recomputes_recent_durable_sources
  • Observability updates:
    • None (stored-procedure behavior change only).
  • Risk & rollback plan:
    • Roll back by reverting migration 0090_indexer_base_score_refresh_durable_sources.sql.
  • Dependency rationale:
    • No new dependencies. Alternatives considered: not applicable.