Revaer

Centralized torrent orchestration with hot-reloadable configuration, consistent CLI/API surfaces, and observability-first defaults.

Revaer is a Rust workspace that coordinates torrent ingestion, filesystem operations, and operational guardrails from a PostgreSQL-backed control plane. The revaer-app binary composes focused crates covering the API, CLI, filesystem pipeline, telemetry, and libtorrent adapter.

What You’ll Find Here

Roadmap & Specs – Track the current Phase One scope and remaining delivery deltas.
Platform Interfaces – Configuration schema, HTTP API endpoints, and CLI command reference that match the current codebase.
Operational Guides – Runbook, release checklist, and setup flows for operators.
Architecture Decisions – ADRs documenting trade-offs across configuration, security, and engine integration.
API Reference – Generated OpenAPI description and usage guidance for the control plane surface.

Use the sidebar navigation (or [ and ] shortcuts) to explore individual topics. Most pages include headings that double as tags for machine-readable manifests generated by the docs indexer.

Contributing Updates

Documentation lives next to the code. Add or edit Markdown files under docs/, then run:

just docs

This builds the mdBook site and refreshes the LLM manifests that power the documentation search experience.

Phase One Roadmap

Last updated: 2025-10-16

This document captures the current delta between the Phase One objective and the existing codebase. It should be kept in sync as work progresses across the eight workstreams.

Snapshot

Workstream	Current State	Key Gaps	Immediate Actions
Control Plane & Setup	Postgres schema, ConfigService watcher, setup CLI/API, immutable-key guard, history logging; loopback enforcement + RFC7807 pointers live	Engine hot-reload not yet exercising throttles; setup token lifecycle/error telemetry still thin	Add watcher-driven throttle tests, expand setup diagnostics and rate-limit guardrails
Torrent Domain & Adapter	Event bus, orchestrator scaffold, enriched torrent DTOs, stub session worker now persists resume metadata/fastresume, reconciles selection/sequential flags, enforces throttle guard rails, and surfaces degraded health	Real libtorrent FFI binding and alert pump still pending; need to exercise live fast-resume blobs and real libtorrent rate/health controls	Replace stub session with libtorrent bindings, translate real alerts, and validate against native libtorrent in the feature-gated suite
File Selection & FsOps	FsOpsService emits lifecycle events and validates library root	No extraction/flatten/move-perms/cleanup pipeline, no `.revaer.meta`, no allow-list enforcement	Model FsOps plan, implement idempotent steps with allow-list + metadata tracking, add fixtures/tests
Public HTTP API & SSE	Admin setup/settings/torrent CRUD, SSE stream, metrics stub, OpenAPI generator	`/v1/torrents/*` family absent, no cursor pagination/filters, SSE replay lacks Last-Event-ID tests, health endpoints minimal	Define domain DTOs, implement public routes + filtering, extend SSE replay handling/tests, flesh out health
CLI Parity	Supports setup start/complete, settings patch, admin torrent add/remove (magnet-aware), status	Missing `select`, `action`, `ls`, `status` detail view, `tail` SSE client, richer validation	Extend CLI command surface to mirror API, add reconnecting SSE tailer, flesh out filtering and exit-code contract
Security & Observability	API key storage hashed, tracing initialised, metrics registry struct	No per-key rate limits, no X-RateLimit headers, magnet/body bounds missing, tracing not propagated to engine/fsops, metrics unused	Introduce token-bucket middleware, enforce payload bounds, propagate spans through orchestrator/fsops, export Prometheus counters
CI & Packaging	Workspace compiles, justfile for fmt/lint/test	No CI workflows, cargo-deny/audit missing, no env access guard, no Docker packaging or healthcheck	Author GitHub Actions (lint, security, tests, build), enforce env guard lint, build minimal non-root container with HEALTHCHECK
Operational End-to-End	Bootstrap skeleton and event bus exist	Torrent download, fs pipeline, restart resume, throttling, degraded health scenarios unimplemented	Sequence implementation/testing to satisfy runbook once engine/fsops/API parity are in place

Remaining Scope Specification

1. Torrent Engine Integration

Swap the stubbed LibtSession for the real libtorrent binding so the existing worker drives a native session while continuing to process commands for add/pause/resume/remove, sequential toggles, rate limits, selection updates, reannounce, and force-recheck.
Validate persisted fast-resume payloads, priorities, target directories, and sequential flags against the live session on startup; continue emitting reconciliation events when divergence is detected.
Translate libtorrent alerts into EventBus messages (FilesDiscovered, Progress, StateChanged, Completed, Failure) while respecting the ≤10 Hz per-torrent coalescing rule; recover from alert polling failures by degrading health and attempting bounded restarts.
Ensure global and per-torrent rate caps driven by engine_profile updates are enforced by libtorrent within two seconds, with audit logs surfaced when caps change.
Extend the feature-gated integration suite to execute against the native libtorrent build (resume restore, rate-limit enforcement, alert mapping) in addition to the in-process stub.
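The ≤10 Hz per-torrent coalescing rule can be illustrated with a small rate gate. This is a minimal sketch, not the project's implementation: `ProgressCoalescer`, its methods, the `u64` torrent key, and the "keep only the newest sample" policy are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimum spacing between forwarded samples per torrent (10 Hz cap).
const MIN_INTERVAL: Duration = Duration::from_millis(100);

/// Hypothetical coalescer: remembers when each torrent last emitted and
/// keeps only the newest suppressed sample in between.
#[derive(Default)]
pub struct ProgressCoalescer {
    last_emit: HashMap<u64, Instant>,
    pending: HashMap<u64, u64>,
}

impl ProgressCoalescer {
    /// Offer a progress sample. Returns Some(bytes) when it may be forwarded
    /// immediately, or None when it was coalesced into the pending slot.
    pub fn offer(&mut self, torrent: u64, bytes_done: u64, now: Instant) -> Option<u64> {
        let due = match self.last_emit.get(&torrent) {
            Some(last) => now.duration_since(*last) >= MIN_INTERVAL,
            None => true,
        };
        if due {
            self.last_emit.insert(torrent, now);
            self.pending.remove(&torrent);
            Some(bytes_done)
        } else {
            self.pending.insert(torrent, bytes_done);
            None
        }
    }

    /// Forward the newest coalesced sample once the interval has elapsed.
    pub fn flush(&mut self, torrent: u64, now: Instant) -> Option<u64> {
        let last = *self.last_emit.get(&torrent)?;
        if now.duration_since(last) < MIN_INTERVAL {
            return None;
        }
        let bytes = self.pending.remove(&torrent)?;
        self.last_emit.insert(torrent, now);
        Some(bytes)
    }
}
```

Passing `now` explicitly keeps the gate deterministic under test; a worker would call `offer` from the alert pump and `flush` from a periodic tick.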

2. File Selection & FsOps Pipeline

Implement include/exclude glob logic with skip-fluff presets backed by the allow-list; synchronize selection changes to libtorrent file priorities and issue corresponding EventBus notifications.
Build the FsOps pipeline stages: extraction (7z), optional flattening, move/hardlink/copy into library roots, chmod/chown/umask adjustments, metadata capture, cleanup, and optional checksum calculation; each stage must record outcomes in .revaer.meta for idempotency.
Enforce DB-driven allow-lists, refusing to access paths outside permitted roots and emitting structured errors when policies block execution.
Degrade pipeline health when dependencies are missing (e.g., extractor binaries), ensuring both EventBus and /health/full reflect the condition; resume normal health once remediation succeeds.
Back the pipeline with unit coverage for rule parsing and integration coverage for an end-to-end torrent completion to library handoff, including restart scenarios that reuse .revaer.meta.
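The allow-list requirement (refusing paths outside permitted roots) could look roughly like the sketch below. It is an illustration under stated assumptions: `normalise` and `is_allowed` are hypothetical names, the check is purely lexical (symlinks are not resolved), and paths are assumed to be Unix-style.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalise a path: drop `.` and resolve `..` without touching disk.
fn normalise(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for part in path.components() {
        match part {
            Component::CurDir => {}
            Component::ParentDir => {
                // Popping at the root is a no-op, so `/..` stays `/`.
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Returns true only when `candidate` is absolute and sits under one of the
/// allowed roots after normalisation. Comparison is component-wise, so
/// `/data/library2` does not match the root `/data/library`.
pub fn is_allowed(candidate: &Path, allow_roots: &[PathBuf]) -> bool {
    if !candidate.is_absolute() {
        return false;
    }
    let clean = normalise(candidate);
    allow_roots
        .iter()
        .any(|root| clean.starts_with(normalise(root)))
}
```

A production check would likely also canonicalise paths on disk so that symlinks cannot escape a permitted root; that step is omitted here to keep the sketch free of filesystem access.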

3. Public HTTP API & SSE

Ship /v1/torrents endpoints: POST (magnet or multipart .torrent), GET collection with cursor pagination and filters (name, state, tracker, extension), GET detail, POST /select, and POST /action (pause/resume/remove with optional data deletion, reannounce, recheck, sequential toggle); enforce validation aligned with domain rules.
Adopt Problem+JSON responses that include JSON Pointer references for every validation failure; extend shared error helpers so CLI can mirror the structure.
Enhance SSE with Last-Event-ID replay, duplicate suppression, resumable connections, and explicit event type exposure for new workflow outputs.
Expand health reporting to /health/full, surfacing engine, FsOps, and database status with latency measurements, dependency readiness, and revision metadata.
Update OpenAPI specs and golden request/response samples to cover the new surfaces; add integration tests that exercise pagination, filters, and SSE replay.

4. CLI Parity

Add commands revaer ls, status, select, action, and tail, mirroring API filters, selection arguments (include/exclude/skip-fluff), sequential toggles, and data deletion flags.
Implement an SSE tailer that reconnects on failure, honors Last-Event-ID, and avoids duplicate terminal output.
Standardize exit codes (0 success, 2 validation, >2 runtime failures) and surface RFC7807 payloads, including pointer metadata, in human-readable CLI output.
Provide CLI integration tests that run against the API fixture stack, covering filter combinations, sequential toggles, and tail reconnection behaviour.
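The exit-code contract above (0 success, 2 validation, >2 runtime failures) can be sketched as a single mapping function. The `Outcome` type, the choice of 400 and 422 as "validation" statuses, and the specific codes 3 and 4 are assumptions for illustration; only the 0 / 2 / >2 split comes from the specification.

```rust
/// Hypothetical classification of a CLI invocation outcome.
pub enum Outcome {
    /// The API call succeeded.
    Success,
    /// The API answered with a Problem+JSON document carrying this HTTP status.
    Problem { status: u16 },
    /// No HTTP response was obtained (connection refused, timeout, ...).
    Transport,
}

/// Map an outcome to a process exit code.
pub fn exit_code(outcome: &Outcome) -> i32 {
    match outcome {
        Outcome::Success => 0,
        // Assumption: these statuses are treated as validation failures.
        Outcome::Problem { status: 400 | 422 } => 2,
        // Every other Problem response (for example 429 or 500) is a runtime failure.
        Outcome::Problem { .. } => 3,
        // Assumption: transport failures get their own code above 2.
        Outcome::Transport => 4,
    }
}
```

A binary would typically pass the result to `std::process::exit` after printing the Problem+JSON summary.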

5. Security & Observability

Introduce API key lifecycle endpoints (issue, rotate, revoke) with hashed-at-rest storage, returning secrets only once; enforce per-key token-bucket rate limiting and include X-RateLimit-* headers.
Harden inputs by bounding magnet length, multipart size, filter glob counts, and header values; return Problem+JSON validation errors without panics for malformed requests.
Propagate tracing spans (request IDs) through the API, engine, and FsOps layers; ensure metrics cover HTTP status, event flow, queue depth, libtorrent transfer, and FsOps step durations, exposed via /metrics.
Reflect degraded health when tools are missing, engine sessions fault, or queue depth exceeds thresholds; emit corresponding SettingsChanged and HealthChanged events.
Document operational expectations for rate limiting, key rotation, and observability dashboards.

6. CI & Packaging

Create GitHub Actions (or equivalent) workflows for formatting (cargo fmt), linting (cargo clippy -- -D warnings), security scans (cargo deny, cargo audit), tests (unit/integration with Postgres and libtorrent behind an opt-in guard), and cross-compilation artifacts for Linux x86_64 and aarch64.
Enforce an environment-access lint that fails CI if std::env reads occur outside the composition root (excluding DATABASE_URL).
Produce a non-root Docker image with read-only root filesystem, declared volumes, and a healthcheck hitting /health; ensure runtime documentation validates within the image.
Publish build artifacts and container digests with provenance metadata; wire CI status into the roadmap release checklist.

7. Operational Runbook Automation

Author a script to execute the full phase objective on both x86_64 and aarch64: bootstrap using DATABASE_URL, complete setup token flow, add a magnet, monitor FilesDiscovered/Progress/Completed, run FsOps, simulate crash/restart with fast-resume recovery, adjust throttles, and validate degraded health when extractors are absent.
Capture assertions and logs for each phase, producing artifacts suitable for runbook review and CI retention; ensure failures mark the engine or pipeline health accordingly.
Include cleanup routines to return environments to a reusable state while retaining diagnostic logs.

8. Documentation & Final Polish

Update docs/phase-one-roadmap.md continuously and add ADRs covering engine architecture, FsOps design, API/CLI contracts, and security posture.
Regenerate docs/api/openapi.json alongside illustrative request/response examples for new endpoints.
Extend user-facing guides for CLI usage, health/metrics references, and operational setup covering API keys, rate limits, and degraded-mode recovery.
Provide a final Phase One release checklist that ties documentation, runbook, and CI artifacts together.

Next Steps Tracking

Land setup/network hardening and control-plane polish.
Replace the stub worker with a real libtorrent session, resume store, and alert-driven event bridge.
Implement FsOps pipeline with allow-listed execution and metadata.
Expose /v1/* APIs + CLI parity and reinforce security/observability.
Stand up CI, packaging, and full runbook validation.

Phase One Remaining Engineering Specification

Objectives

Deliver a production-ready public interface (HTTP API, SSE, CLI) for torrent orchestration.
Ship FsOps-backed artefacts through API, CLI, telemetry, and documentation with demonstrable reliability.
Produce release artefacts (containers, binaries, documentation) that satisfy existing security, observability, and quality gates.

Scope Overview

Public HTTP API & SSE Enhancements
- /v1/torrents CRUD-style endpoints with cursor pagination, filtering, torrent actions, file selection updates, rate adjustments, and Problem+JSON responses.
- SSE stream upgrades: Last-Event-ID replay, subscription filters, duplicate suppression, jitter-tolerant reconnect logic.
- /health/full exposing engine/FsOps/config readiness, dependency metrics, and revision metadata.
- Regenerated OpenAPI (JSON + examples) reflecting the full public surface.
CLI Parity
- Commands covering list/status/select/action/tail flows with shared filtering + pagination options.
- SSE-backed tail command with Last-Event-ID resume, dedupe, and retry semantics aligned with the API.
- Problem+JSON error output, structured exit codes (0 success, 2 validation, >2 runtime failures).
Packaging & Documentation
- Release-ready Docker image (non-root, readonly FS, volumes, healthcheck) bundling API server + docs.
- Provenance-signed binaries for supported architectures, plus GitHub Actions workflows for build, docker, msrv, and coverage gates.
- Updated ADRs, runbook, user guides, OpenAPI artefacts, and release checklist referencing the telemetry and security posture.
- Documentation of new metrics/traces/guardrails (config watcher latency, FsOps events, API counters).

Security & Observability Requirements (Cross-Cutting)

All new API routes enforce API-key authentication with per-key rate limiting and guard-rail metrics.
Problem+JSON responses are mandatory; eliminate unwrap/panic paths and include invalid_params pointers on validation failure.
Trace propagation from API → engine → FsOps; CLI should emit/propagate TraceId when available.
Metrics: extend existing Prometheus registry with route labels, FsOps step counters, config watcher latency/failure gauges, and rate-limiter guardrails.
Health degradation events (Event::HealthChanged) must accompany any new guard-rail/latency breach or pipeline failure.
CLI commands should mask secrets in logs and optionally emit telemetry when configured (REVAER_TELEMETRY_ENDPOINT).

Detailed Work Breakdown

1. Public API & SSE

Design Considerations

Introduce DTO module (api::models) for request/response structs to share with the CLI.
Cursor pagination: encode UUID/timestamp as opaque cursor in next token; align Last-Event-ID semantics with event stream IDs.
Filtering: support state, tracker, extension, tags, and name substring; guard invalid combinations with Problem+JSON.
SSE filtering: accept query parameters that restrict the stream to a subset of torrents and that scope replays by event type or state.

Implementation Tasks

Routes:
- POST /v1/torrents – magnet or .torrent upload (streamed, payload size guard).
- GET /v1/torrents – cursor pagination + filters.
- GET /v1/torrents/{id} – detail view with FsOps metadata.
- POST /v1/torrents/{id}/select – file selection update with validation.
- POST /v1/torrents/{id}/action – pause/resume/remove (with data), reannounce, recheck, sequential toggle, rate limits.
SSE:
- Accept Last-Event-ID header, deduplicate by event ID, filter streams by torrent ID/state.
- Simulate jitter/disconnects in tests (tokio::time::pause, transport::Stream).
Health endpoint:
- Aggregate config watcher metrics (latency, failures), FsOps status, engine guardrails, revision hash.
Problem+JSON mapping for all new errors with invalid_params pointer data.
OpenAPI:
- Regenerate spec covering new endpoints, Problem responses, SSE details, and sample payloads.
Testing:
- Unit tests for filter parsing, DTO validation, Problem+JSON outputs.
- Integration tests using tower::Service harness for each route.
- SSE reconnection tests with simulated delays and Last-Event-ID resume.
- /health/full integration test verifying new fields and degraded scenarios.

2. CLI Parity

Design Considerations

Reuse DTOs from API models; consider shared crate/module for request structs and Problem+JSON parsing.
Introduce output formatting with optional JSON/pretty table modes.
Provide configuration via env vars and CLI flags; align defaults with API (e.g., REVAER_API_URL, REVAER_API_KEY).

Implementation Tasks

Commands:
- revaer ls – list torrents, support pagination (--cursor, --limit), filters (state/tracker/extension/tags).
- revaer status <id> – torrent detail view, optional follow mode.
- revaer select <id> – send selection rules from file/JSON (validate before submit).
- revaer action <id> – actions (pause, resume, remove, remove-data, reannounce, recheck, sequential, rate).
- revaer tail – SSE tail with Last-Event-ID persist (local file) and dedupe.
Problem+JSON handling:
- Standardised pretty printer summarising title, detail, invalid_params; respect exit codes.
Telemetry:
- Optional metrics emission (success/failure counters) when telemetry endpoint configured.
Testing:
- Integration tests using httpmock to assert HTTP interactions and exit codes.
- SSE tail tests with mocked stream delivering duplicates/disconnects.
- Snapshot tests for JSON outputs (ensuring deterministic fields).

3. Packaging & Documentation

Design Considerations

Multi-stage Docker build: compile with Rust image, run on minimal base (distroless/alpine/ubi) with non-root user.
Healthcheck script hitting /health/full with timeout.
Release workflows should run on GitHub Actions with provenance metadata (supply-chain compliance).

Implementation Tasks

Dockerfile + Makefile/just target:
- Build release binary, copy docs/api/openapi.json, set /app as workdir.
- Define volumes for data/config, create user revaer, configure entrypoint.
GitHub Actions (update .github/workflows):
- build-release: run just build-rel, just api-export, attach binaries/docs.
- docker: build image, run docker scan (trivy/grype), and push on release tags.
- msrv: run just fmt lint test with pinned toolchain (documented in workflow).
- cov: ensure just cov gate passes (≥80% lines/functions).
Documentation:
- ADRs: update 003-libtorrent-session-runner, add FsOps design ADR, API/CLI contract ADR, security posture update (API keys, rate limits).
- Runbook: scripted scenario covering bootstrap → torrent add → FsOps pipeline → restart resume → rate throttle adjustments → degraded health simulation → recovery.
- User guides: CLI usage, metrics/telemetry reference, operational setup (keys, rate limits, config watcher health).
- OpenAPI: regenerate JSON, include sample Problem+JSON payloads and SSE description.
- Release checklist: steps to run just ci, verify coverage, run docker scan, execute runbook, and tag release.
Testing:
- Validate Docker container runtime (healthcheck, volume mounts, non-root permissions).
- Perform coverage review ensuring new tests bring line/function coverage ≥80%.
- Execute runbook; capture logs/metrics and link in docs.

Cross-Cutting Deliverables

API key lifecycle (issue/rotate/revoke) extended with per-key rate limiting, recorded in telemetry and docs.
Config watcher telemetry integrated into /health/full and metrics registry.
CLI and API emit guard-rail telemetry on violations (loopback enforcement, FsOps errors, rate-limit breaches).
All new code paths covered by unit/integration tests; follow-up to update just cov gating.
Documentation kept up-to-date with implementation details and tested flows.

Sequencing (Suggested)

Build API models and endpoints (foundation for CLI).
Implement SSE enhancements while adding API integration tests.
Extend CLI commands leveraging shared DTOs.
Embed telemetry (metrics/traces) throughout API/CLI/FsOps changes.
Stand up Docker build + CI workflows.
Update ADRs, runbook, user guides, OpenAPI, and release checklist.
Execute full QA cycle (coverage, docker scan, runbook, manual verification) and prepare for release tagging.

Acceptance Criteria

just lint, just test, just cov, and the full just ci pipeline pass locally and in CI.
Coverage (lines + functions) ≥ 80% across workspace.
Docker image passes security scan with zero unwaived high severity findings.
Runbook executed end-to-end; results referenced in documentation.
OpenAPI specification and CLI docs match implemented behaviour.
Release checklist completed with artefacts attached (binaries, Docker image, OpenAPI, docs).

Phase One Runbook

This runbook exercises the end-to-end control plane, validating FsOps, telemetry, and guard rails.

Prerequisites

Docker image revaer:ci (built via just docker-build) or a local revaer-app binary (just build-rel).
PostgreSQL instance accessible to the application.
API key with a conservative rate limit (e.g., burst 5, period 60s).
CLI configured with REVAER_API_URL, REVAER_API_KEY, and optional REVAER_TELEMETRY_ENDPOINT.

Scenario

Bootstrap
- Issue a setup token: revaer setup start --issued-by runbook.
- Complete configuration with CLI secrets and directories: revaer setup complete --instance runbook --bind 127.0.0.1 --resume-dir /data/resume --download-root /data/downloads --library-root /data/library --api-key-label runbook --passphrase <pass>.
- Confirm /health/full returns status=ok and guardrail_violations_total=0.
Add Torrent & Observe FsOps
- Add a torrent: revaer torrent add <magnet> --name runbook.
- Tail events: revaer tail --event torrent_added,progress,state_changed --resume-file /tmp/revaer.tail.
- Verify FsOps emits fsops_started and fsops_completed, and that the Prometheus counter fsops_steps_total increases.
Restart & Resume
- Stop the application, restart it, and ensure the torrent catalog repopulates.
- Confirm that SelectionReconciled is emitted if metadata diverges and that HealthChanged clears once resume succeeds.
Rate Limit Guard-Rail
- Apply a tight API key limit (burst 1 / per_seconds 60) via config apply.
- Execute three rapid CLI calls (e.g., revaer status <id>). The third should exit with code 3, displaying a 429 Problem+JSON response.
- Inspect /metrics to verify api_rate_limit_throttled_total incremented and /health/full reflects degraded=["api_rate_limit_guard"].
Recovery
- Restore the API key limit to an acceptable value.
- Re-run revaer status <id> to confirm that it succeeds, that guardrail_violations_total stops increasing, and that degraded returns to [].
FsOps Failure Simulation
- Temporarily revoke write permissions on the library directory and re-run a completion.
- Observe fsops_failed events, HealthChanged with ["fsops"], and guard-rail telemetry.
- Restore permissions and confirm recovery events.

Verification Artifacts

Archive CLI telemetry emitted to REVAER_TELEMETRY_ENDPOINT.
Capture Prometheus scrapes (/metrics) before and after the run.
Record /health/full JSON snapshots for each phase.

Successful completion of this runbook satisfies the operational validation gate defined in AGENT.md.

Phase One Release Checklist

Branch Hygiene
- Ensure main is green (CI pipeline complete).
- Review outstanding ADRs and docs for freshness.
Build & Test
- just ci
- just build-rel
- just api-export
Artefact Verification
- Binary: target/release/revaer-app
- Checksum: sha256sum target/release/revaer-app
- OpenAPI: docs/api/openapi.json
- Docker image: just docker-build && just docker-scan
Runbook Execution
- Follow docs/runbook.md
- Archive CLI telemetry, /metrics, /health/full snapshots.
Documentation Refresh
- Verify ADRs 005–007 reflect current design.
- Update user guides (docs/api/guides/*.md) with any behavioural changes.
Tag & Publish
- Create annotated tag: git tag -a vX.Y.Z -m "Phase One release"
- Push tag: git push origin vX.Y.Z
- Attach artefacts generated by the build-release workflow.
Post-Release Monitoring
- Watch rate-limit and guard-rail metrics.
- Confirm HealthChanged events report an empty degraded set.
- Validate automation telemetry for CLI success rates.

Configuration Surface

Canonical reference for the PostgreSQL-backed settings documents that drive Revaer’s runtime behaviour.

Revaer persists all operator-facing configuration inside the settings_* tables. The API (ConfigService) exposes strongly-typed snapshots that are consumed by the API server, torrent engine, filesystem pipeline, and CLI. Every change flows through a SettingsChangeset, ensuring a single validation path whether commands originate from the setup flow or the admin API.

Snapshot Components

The /.well-known/revaer.json endpoint and the revaer setup complete CLI command both return the same structure:

{
  "revision": 42,
  "app_profile": { /* see below */ },
  "engine_profile": { /*…*/ },
  "fs_policy": { /*…*/ },
  "api_keys": [
    { "key_id": "admin", "label": "bootstrap", "enabled": true, "rate_limit": null }
  ]
}

App Profile (`settings_app_profile`)

Field	Type	Description
`id`	UUID	Singleton identifier for the current document.
`instance_name`	string	Human readable label surfaced in the CLI after setup.
`mode`	`"setup"` or `"active"`	Gatekeeper for the authentication middleware. Setup requests are rejected once the system enters `active`.
`version`	integer	Optimistic locking counter maintained by `ConfigService`.
`http_port`	integer	Published TCP port for the API server.
`bind_addr`	string (IPv4/IPv6)	Listen address for the API server.
`telemetry`	object	Free-form map for logging + metrics toggles (e.g. `log_level`, `prometheus`).
`features`	object	Feature switches such as `fs_extract`, `par2`, `sse_backpressure`.
`immutable_keys`	array	List of fields that cannot be mutated via patches (`ConfigError::ImmutableField`).

Engine Profile (`settings_engine_profile`)

Field	Type	Description
`implementation`	string	Currently `libtorrent`. Used to select the torrent workflow implementation.
`listen_port`	integer?	Optional external listen port override for the engine.
`dht`	bool	Enables/disables the DHT module.
`encryption`	string	Encryption requirement (`require`, `prefer`, etc.).
`max_active`	integer?	Cap on concurrently-active torrents; `null` means unlimited.
`max_download_bps` / `max_upload_bps`	integer?	Global rate limits applied by the engine.
`sequential_default`	bool	Default sequential downloading behaviour for new torrents.
`resume_dir`	string	Filesystem location where fast-resume artefacts are stored.
`download_root`	string	Directory used for in-progress torrent payloads.
`tracker`	object	Tracker configuration (user-agent, announce overrides).

Filesystem Policy (`settings_fs_policy`)

Field	Type	Description
`library_root`	string	Destination directory for completed artefacts.
`extract`	bool	Whether completed payloads are extracted.
`par2`	string	`off`, `verify`, or `repair` depending on PAR2 behaviour.
`flatten`	bool	Collapses single-file directories when moving into the library.
`move_mode`	string	`copy`, `move`, or `hardlink` semantics for the FsOps pipeline.
`cleanup_keep` / `cleanup_drop`	array	Glob patterns retaining or removing files during cleanup.
`chmod_file` / `chmod_dir`	string?	Optional octal permissions applied to outputs.
`owner` / `group`	string?	Optional ownership override for the library root.
`umask`	string?	Umask applied during FsOps.
`allow_paths`	array	Allowed staging/library paths the pipeline accepts.

API Keys & Secrets

Patches can create, update, or revoke keys and named secrets. The request format mirrors SettingsChangeset:

{
  "api_keys": [
    {
      "op": "upsert",
      "key_id": "admin",
      "label": "primary",
      "enabled": true,
      "secret": "optional-override",
      "rate_limit": { "burst": 10, "per_seconds": 1 }
    }
  ],
  "secrets": [
    { "op": "set", "name": "libtorrent.passphrase", "value": "..." }
  ]
}

The API server enforces token-bucket rate limits when rate_limit is supplied (burst requests per per_seconds seconds). Invalid field names or mutations against immutable_keys yield RFC9457 ProblemDetails responses with an invalid_params array matching the JSON pointer returned by ConfigError.
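To make the burst / per_seconds semantics concrete, here is a minimal token-bucket sketch. It assumes the bucket starts full and refills continuously at burst tokens per per_seconds; `TokenBucket` and its methods are hypothetical names, and the real middleware may refill differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-key bucket mirroring the `rate_limit` document fields.
pub struct TokenBucket {
    burst: f64,
    per: Duration,
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    /// Create a full bucket. A zero period is treated as one second
    /// to avoid dividing by zero (assumption).
    pub fn new(burst: u32, per_seconds: u64, now: Instant) -> Self {
        Self {
            burst: burst as f64,
            per: Duration::from_secs(per_seconds.max(1)),
            tokens: burst as f64,
            last: now,
        }
    }

    /// Refill according to elapsed time, then try to spend one token.
    /// Returns false when the caller should be throttled (HTTP 429).
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.duration_since(self.last).as_secs_f64();
        let rate = self.burst / self.per.as_secs_f64();
        self.tokens = (self.tokens + elapsed * rate).min(self.burst);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With `{ "burst": 10, "per_seconds": 1 }` this admits ten back-to-back requests and then roughly ten per second thereafter.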

Change Workflows

Setup – POST /admin/setup/start issues a one-time token. POST /admin/setup/complete consumes that token, applies the provided SettingsChangeset, forces app_profile.mode to active, and returns the hydrated snapshot along with the generated API key (also echoed in the CLI output).
Ongoing updates – PATCH /admin/settings (CLI: revaer settings patch --file changes.json) requires an API key and supports partial documents. Any field omitted from the payload remains untouched.
Snapshot access – GET /.well-known/revaer.json and GET /health/full both return the current revision without authentication, so automation and dashboards can poll them to detect configuration drift.

Revaer publishes SettingsChanged events on every successful mutation, ensuring subscribers refresh in-memory caches without polling.

HTTP API

REST + SSE surface exposed by revaer-api. The OpenAPI document at /docs/openapi.json is generated by just api-export.

Authentication

Setup flow – Requests to /admin/setup/start are open. /admin/setup/complete requires the x-revaer-setup-token header with the one-time token returned by /admin/setup/start. The server refuses setup calls once the app profile switches to active.
Operator actions – All /admin/* (after setup) and /v1/* endpoints require x-revaer-api-key: {key_id}:{secret}. The middleware validates the key via ConfigService, enforces per-key rate limiting, and rejects calls while the instance remains in setup mode.
Request correlation – An optional x-request-id header is echoed into tracing spans and surfaced on SSE traffic. The CLI auto-populates this header per invocation.

Error responses follow RFC9457 (ProblemDetails), populated with invalid_params entries whenever validation pinpoints a JSON pointer within the payload.
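The x-revaer-api-key header carries {key_id}:{secret} in a single value. A minimal sketch of splitting it is shown below; `parse_api_key` is a hypothetical helper, and splitting at the first colon assumes that key IDs never contain one.

```rust
/// Split an `x-revaer-api-key` header value into (key_id, secret).
/// Returns None when the separator is missing or either half is empty.
pub fn parse_api_key(header: &str) -> Option<(&str, &str)> {
    // Split at the first colon so that secrets may themselves contain colons.
    let (key_id, secret) = header.trim().split_once(':')?;
    if key_id.is_empty() || secret.is_empty() {
        return None;
    }
    Some((key_id, secret))
}
```

Verifying the secret against the hashed value held by ConfigService is a separate step and is not shown here.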

Endpoint Inventory

Method	Path	Auth	Description
`GET`	`/health`	none	Lightweight readiness probe returning mode + database status.
`GET`	`/health/full`	none	Extended health snapshot with build SHA, metrics counters, and torrent queue depth.
`GET`	`/.well-known/revaer.json`	none	Full configuration snapshot (`ConfigSnapshot`) including current revision.
`POST`	`/admin/setup/start`	none	Issues a setup token; optionally accepts `issued_by` + `ttl_seconds`.
`POST`	`/admin/setup/complete`	setup token	Applies a `SettingsChangeset`, promotes the instance to `active`, consumes the token, and returns the hydrated snapshot.
`PATCH`	`/admin/settings`	API key	Applies partial configuration updates (`SettingsChangeset`) and broadcasts `SettingsChanged`.
`GET`	`/admin/torrents`	API key	Same as `GET /v1/torrents`; retained for admin tooling.
`POST`	`/admin/torrents`	API key	Alias for `POST /v1/torrents`.
`GET`	`/admin/torrents/:id`	API key	Alias for `GET /v1/torrents/:id`.
`DELETE`	`/admin/torrents/:id`	API key	Alias for invoking the `remove` action.
`GET`	`/v1/torrents`	API key	Cursor-paginated torrent summaries with filtering (state, tracker, extension, tags, name).
`POST`	`/v1/torrents`	API key	Submits a magnet URI or base64-encoded `.torrent`, optional tags/trackers, file rules, and per-torrent rate limits.
`GET`	`/v1/torrents/:id`	API key	Detailed torrent view including file metadata when available.
`POST`	`/v1/torrents/:id/select`	API key	Adjusts inclusion/exclusion globs, fluff skipping, and per-file priorities.
`POST`	`/v1/torrents/:id/action`	API key	Lifecycle management (`pause`, `resume`, `remove`, `reannounce`, `recheck`, `sequential`, `rate`).
`GET`	`/v1/events`	API key	Server-sent events stream (alias: `/v1/torrents/events`). Supports filtering by torrent ID, state, and event kind.
`GET`	`/metrics`	none	Prometheus-formatted metrics from `revaer-telemetry`.
`GET`	`/docs/openapi.json`	none	Static OpenAPI document used by the docs site and clients.

All torrent-managing endpoints require the torrent workflow to be wired in. If the engine is unavailable, the API returns 503 Service Unavailable.

Torrent Submission (`POST /v1/torrents`)

Required headers: x-revaer-api-key. Provide either magnet or metainfo; the server rejects payloads missing both. Optional fields:

download_dir – Overrides the engine profile’s staging directory.
sequential – Enables sequential downloading for this torrent only.
tags / trackers – Stored alongside the torrent for filtering and bookkeeping.
include / exclude / skip_fluff – File selection bootstrap applied before metadata fetch completes.
max_download_bps / max_upload_bps – Per-torrent rate limits (bps) passed to the workflow.

On success the server returns 202 Accepted after dispatching TorrentWorkflow::add_torrent. The torrent ID in the payload becomes the canonical identifier.
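The "either magnet or metainfo" rule can be expressed as a small validation step that produces an invalid_params-style entry. This is a sketch only: `AddTorrent`, `InvalidParam`, `validate_source`, the "/magnet" pointer, and treating blank strings as missing are assumptions, and the behaviour when both fields are supplied is not specified above, so it is accepted here.

```rust
/// Hypothetical request shape: only the two source fields are modelled.
pub struct AddTorrent<'a> {
    pub magnet: Option<&'a str>,
    pub metainfo: Option<&'a str>,
}

/// One invalid_params entry: a JSON Pointer plus a human-readable reason.
#[derive(Debug, PartialEq)]
pub struct InvalidParam {
    pub pointer: &'static str,
    pub message: &'static str,
}

/// Treat a field as present only when it holds non-blank text (assumption).
fn present(value: Option<&str>) -> bool {
    value.map_or(false, |s| !s.trim().is_empty())
}

/// Reject payloads that carry neither a magnet URI nor metainfo.
pub fn validate_source(req: &AddTorrent) -> Result<(), InvalidParam> {
    if present(req.magnet) || present(req.metainfo) {
        Ok(())
    } else {
        Err(InvalidParam {
            pointer: "/magnet",
            message: "provide either magnet or metainfo",
        })
    }
}
```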

Listing & Filtering (`GET /v1/torrents`)

Query parameters:

limit (default 50, max 200)
cursor – Base64 token returned in next.
state, tracker, extension, tags, name – Comma-separated filters (case-insensitive).

The response body is TorrentListResponse with an optional next cursor when additional pages exist.
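
As a rough sketch of how the query parameters above might be normalised, the helpers below apply the documented default and ceiling for limit and split a comma-separated filter into case-insensitive terms. The function names are hypothetical, and clamping (rather than rejecting) an oversized limit is an assumption.

```rust
pub const DEFAULT_LIMIT: usize = 50;
pub const MAX_LIMIT: usize = 200;

/// Applies the documented default (50) and ceiling (200).
/// Assumption: oversized values are clamped rather than rejected.
pub fn effective_limit(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_LIMIT).min(MAX_LIMIT)
}

/// Splits a comma-separated filter into trimmed, lowercased terms so that
/// later comparisons are case-insensitive; empty segments are dropped.
pub fn parse_filter(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(|term| term.trim().to_lowercase())
        .filter(|term| !term.is_empty())
        .collect()
}
```

Lowercasing both the filter terms and the compared field once keeps the per-torrent match a plain string comparison.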

Torrent Actions (`POST /v1/torrents/:id/action`)

type determines the shape of the body:

{ "type": "remove", "delete_data": true }
{ "type": "sequential", "enable": false }
{ "type": "rate", "download_bps": 1048576, "upload_bps": null }

Failures propagate engine errors as 500 Internal Server Error with a descriptive message in detail.
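
The type-tagged bodies can be modelled as a Rust enum. The sketch below renders the JSON by hand to stay dependency-free; a real client would more likely derive this with a serialisation library. `TorrentAction` and `to_json` are hypothetical names, and the assumption that pause, resume, reannounce, and recheck carry only the type field is not confirmed by the text above.

```rust
/// The seven action kinds listed for `POST /v1/torrents/:id/action`.
#[derive(Debug, Clone, PartialEq)]
pub enum TorrentAction {
    Pause,
    Resume,
    Remove { delete_data: bool },
    Reannounce,
    Recheck,
    Sequential { enable: bool },
    Rate { download_bps: Option<u64>, upload_bps: Option<u64> },
}

impl TorrentAction {
    /// Renders the request body for this action.
    pub fn to_json(&self) -> String {
        // Assumption: actions without parameters send only the `type` tag.
        fn bare(kind: &str) -> String {
            format!(r#"{{ "type": "{}" }}"#, kind)
        }
        // A missing limit is sent as JSON null, as in the example above.
        fn opt(value: Option<u64>) -> String {
            value.map_or_else(|| "null".to_string(), |n| n.to_string())
        }
        match self {
            Self::Pause => bare("pause"),
            Self::Resume => bare("resume"),
            Self::Reannounce => bare("reannounce"),
            Self::Recheck => bare("recheck"),
            Self::Remove { delete_data } => {
                format!(r#"{{ "type": "remove", "delete_data": {} }}"#, delete_data)
            }
            Self::Sequential { enable } => {
                format!(r#"{{ "type": "sequential", "enable": {} }}"#, enable)
            }
            Self::Rate { download_bps, upload_bps } => format!(
                r#"{{ "type": "rate", "download_bps": {}, "upload_bps": {} }}"#,
                opt(*download_bps),
                opt(*upload_bps)
            ),
        }
    }
}
```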

SSE Stream (`GET /v1/events`)

Headers:

x-revaer-api-key
Optional Last-Event-ID – resumes the stream from a previously stored ID (the CLI stores this via --resume-file).

Query parameters:

torrent – Comma-separated UUIDs.
event – Comma-separated event kinds. Valid values: torrent_added, files_discovered, progress, state_changed, completed, fsops_started, fsops_progress, fsops_completed, fsops_failed, settings_changed, health_changed, selection_reconciled.
state – Comma-separated torrent states (downloading, completed, etc.).

The server sends a keep-alive ping every 20 seconds and applies filters before events hit the wire.
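
A client can check the event parameter against the documented kinds before opening the stream. The helper below is an illustrative sketch: `parse_event_filter` is a hypothetical name, and case-sensitive matching is an assumption.

```rust
/// Event kinds accepted by the `event` query parameter, as listed above.
pub const EVENT_KINDS: [&str; 12] = [
    "torrent_added",
    "files_discovered",
    "progress",
    "state_changed",
    "completed",
    "fsops_started",
    "fsops_progress",
    "fsops_completed",
    "fsops_failed",
    "settings_changed",
    "health_changed",
    "selection_reconciled",
];

/// Parses a comma-separated `event` value. Returns the recognised kinds, or
/// the first unknown term as the error.
/// Assumption: kinds are matched case-sensitively.
pub fn parse_event_filter(raw: &str) -> Result<Vec<&'static str>, String> {
    let mut kinds = Vec::new();
    for term in raw.split(',').map(str::trim).filter(|t| !t.is_empty()) {
        match EVENT_KINDS.iter().find(|kind| **kind == term) {
            Some(kind) => kinds.push(*kind),
            None => return Err(term.to_string()),
        }
    }
    Ok(kinds)
}
```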

Health & Metrics

GET /health – Primary readiness probe used by orchestration systems. Adds database to the degraded list if PostgreSQL is unreachable.
GET /health/full – Returns the deployment revision, build SHA (build_sha()), metrics snapshot (config_watch_latency_ms, guardrail_violations_total, rate_limit_throttled_total, etc.), and torrent queue depth.
GET /metrics – Exposes the same counters for Prometheus scraping.

For the complete schema definitions, consult the generated OpenAPI (just api-export).

CLI Reference

revaer-cli provides parity with the API for setup, configuration management, torrent lifecycle, and observability.

Global Flags & Environment

Flag	Environment	Default	Description
`--api-url <URL>`	`REVAER_API_URL`	`http://127.0.0.1:7070`	Base URL for API requests.
`--api-key <key_id:secret>`	`REVAER_API_KEY`	none	Required for all post-setup commands that mutate or read torrents.
`--timeout <secs>`	`REVAER_HTTP_TIMEOUT_SECS`	`10`	Per-request HTTP timeout.

Each invocation sends a unique x-request-id with its API requests; the CLI also emits optional telemetry events when REVAER_TELEMETRY_ENDPOINT is set.
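
The flag/environment/default columns suggest a simple resolution order. The sketch below assumes an explicit flag beats the environment variable, which in turn beats the default; that precedence is an assumption, and `resolve` and `parse_api_key` are hypothetical helper names.

```rust
pub const DEFAULT_API_URL: &str = "http://127.0.0.1:7070";
pub const DEFAULT_TIMEOUT_SECS: &str = "10";

/// Assumed resolution order: explicit flag, then environment, then default.
pub fn resolve(flag: Option<&str>, env: Option<&str>, default: &str) -> String {
    flag.or(env).unwrap_or(default).to_string()
}

/// Splits `key_id:secret` at the first colon, rejecting empty halves.
pub fn parse_api_key(raw: &str) -> Option<(&str, &str)> {
    let (key_id, secret) = raw.split_once(':')?;
    if key_id.is_empty() || secret.is_empty() {
        None
    } else {
        Some((key_id, secret))
    }
}
```

In a real binary the `env` argument would come from something like `std::env::var("REVAER_API_URL").ok()`, passed in as `as_deref()`.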

Setup Flow

`revaer setup start [--issued-by <label>] [--ttl-seconds <secs>]`

Calls POST /admin/setup/start.
Prints the plaintext token followed by its ISO8601 expiry.
Use --issued-by to tag the token source (defaults to api).

`revaer setup complete --instance <name> --bind <addr> --port <port> --resume-dir <path> --download-root <path> --library-root <path> --api-key-label <label> [--api-key-id <id>] [--passphrase <value>] [--token <token>]`

Loads the setup token either from --token or REVAER_SETUP_TOKEN.
Builds a SettingsChangeset containing the app profile, engine profile, filesystem policy, API key, and optional secret.
Forces app_profile.mode = "active".
Echoes the generated API key (key_id:secret) on success; store it securely before continuing.

Configuration Maintenance

`revaer settings patch --file <path>`

Reads a JSON file containing a partial SettingsChangeset.
Requires an API key.
Returns a formatted ProblemDetails message if validation fails (immutable fields, unknown keys, etc.).

Torrent Lifecycle

`revaer torrent add <magnet|.torrent> [--name <label>] [--id <uuid>]`

Accepts a magnet URI or a filesystem path to a .torrent.
Automatically base64-encodes torrent files for the API.
Optional overrides: --name sets the human-friendly label; --id lets you supply a deterministic UUID instead of the auto-generated value.
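
To illustrate the magnet-or-file handling, the sketch below distinguishes a magnet URI from a path and base64-encodes raw torrent bytes. It assumes the API expects standard base64 with padding (RFC 4648), which the text above does not specify; the encoder is written by hand only to keep the example free of external crates.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Returns true when the argument looks like a magnet URI rather than a path.
pub fn is_magnet(arg: &str) -> bool {
    arg.trim_start().to_ascii_lowercase().starts_with("magnet:")
}

/// Standard base64 with `=` padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Missing bytes in the final chunk are treated as zero.
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Pad the positions that had no input byte behind them.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}
```

For a file argument, the bytes from `std::fs::read(path)` would be passed to `base64_encode` and sent as the metainfo field.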

`revaer torrent remove <uuid>`

Issues POST /v1/torrents/{id}/action with { "type": "remove" }.
Use the more general action command for delete_data semantics.

`revaer ls [--limit <n>] [--cursor <token>] [--state <state>] [--tracker <url>] [--extension <ext>] [--tags <tag1,tag2>] [--name <fragment>] [--format table|json]`

Lists torrents with the same filters supported by the REST API.
Default output is a table summarising id, name, state, and progress.
JSON output matches TorrentListResponse.

`revaer status <uuid> [--format table|json]`

Returns a detailed view of a single torrent.
JSON output is the full TorrentDetail (including file metadata when available).

`revaer select <uuid> [--include <glob,glob>] [--exclude <glob,glob>] [--skip-fluff] [--priority index=priority,…]`

Updates file-selection rules via POST /v1/torrents/{id}/select.
--priority accepts repeated index=priority pairs (skip|low|normal|high) mapped onto the engine’s FilePriority.
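
A minimal sketch of parsing one index=priority pair follows. The local `FilePriority` enum is a stand-in for the engine's type, and the error strings and case-insensitive level matching are assumptions.

```rust
/// Stand-in for the engine's `FilePriority`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilePriority {
    Skip,
    Low,
    Normal,
    High,
}

/// Parses a single `index=priority` pair such as `0=high`.
pub fn parse_priority(pair: &str) -> Result<(usize, FilePriority), String> {
    let (raw_index, raw_level) = pair
        .split_once('=')
        .ok_or_else(|| format!("expected index=priority, got `{}`", pair))?;
    let index: usize = raw_index
        .trim()
        .parse()
        .map_err(|_| format!("invalid file index `{}`", raw_index))?;
    // Assumption: level names are matched case-insensitively.
    let level = match raw_level.trim().to_ascii_lowercase().as_str() {
        "skip" => FilePriority::Skip,
        "low" => FilePriority::Low,
        "normal" => FilePriority::Normal,
        "high" => FilePriority::High,
        other => return Err(format!("unknown priority `{}`", other)),
    };
    Ok((index, level))
}
```

Repeated --priority values would each pass through `parse_priority`, with the first error aborting the command before any request is sent.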

`revaer action <uuid> <pause|resume|remove|reannounce|recheck|sequential|rate> [--delete-data] [--enable <bool>] [--download <bps>] [--upload <bps>]`

One-stop entry point for all torrent actions.
sequential toggles sequential downloads via --enable true|false.
rate updates per-torrent bandwidth caps (bps). Provide --download and/or --upload.
remove honours --delete-data.

Event Streaming

`revaer tail [--torrent <id,id>] [--event <kind,kind>] [--state <state,state>] [--resume-file <path>] [--retry-secs <n>]`

Connects to /v1/events using SSE.
Filters mirror the API query parameters; the CLI validates UUIDs and event kinds before making the request.
When --resume-file is supplied, the CLI persists the last event ID across reconnects so the stream can resume after transient failures.
--retry-secs controls the backoff between reconnect attempts (default: 5 seconds).
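
The resume behaviour can be pictured as two small file helpers. This sketch assumes the resume file holds nothing but the last event ID as plain text; the actual on-disk format is not described here, and both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Loads the last event ID, treating a missing or empty file as "start fresh".
pub fn load_last_event_id(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(text) => {
            let id = text.trim();
            Ok(if id.is_empty() { None } else { Some(id.to_string()) })
        }
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}

/// Overwrites the resume file with the most recent event ID.
pub fn store_last_event_id(path: &Path, id: &str) -> io::Result<()> {
    fs::write(path, id)
}
```

On reconnect, a stored ID would be sent as the Last-Event-ID header; after each delivered event the CLI would call `store_last_event_id` with that event's ID.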

All torrent commands require an API key. The CLI surfaces API problems exactly as the server returns them, including RFC9457 validation errors and rate-limit responses (429 Too Many Requests with retry metadata in the body).

API Documentation

This directory hosts HTTP API specifications, generated OpenAPI documents, and usage guides for the Revaer control plane.

schema/ – Published OpenAPI payloads and supporting artefacts.
guides/ – Scenario-based walkthroughs (bootstrap, hot reload validation, torrent lifecycle).
examples/ – HTTP request/response samples captured from real workflows.

Current Coverage

Setup & configuration – /admin/setup/* and /admin/settings flows with CLI parity.
Orchestration – /admin/torrents (POST/DELETE/GET) for submitting or removing torrents, plus /admin/torrents/{id} for status inspection.
Observability – /v1/events SSE stream (tested for replay/keep-alive) and /metrics Prometheus surface with torrent gauges.

See guides/bootstrap.md for an end-to-end description of the bootstrap lifecycle, background workers, and error handling expectations.

Next Steps

Capture worked examples for torrent status reconciliation (list + selective GET).
Provide troubleshooting recipes for common workflow failures (engine unavailable, filesystem policy rejection).
Expand SSE consumer documentation with incremental backfill strategies.

OpenAPI Reference

Canonical machine-readable description of the Revaer control plane surface.

The generated OpenAPI specification lives alongside the documentation at docs/api/openapi.json. Regenerate it with:

just api-export

Once refreshed, rebuild the documentation (just docs) to publish the updated schema to the static site and LLM manifests. API consumers can download the JSON directly from the deployed documentation site or via the repository.

Architecture Decision Records

ADR documents capture the rationale behind significant technical decisions.

Suggested Workflow

Create a new ADR using the template in docs/adr/template.md.
Give it a sequential identifier (e.g., 001, 002) and a concise title.
Capture context, decision, consequences, and follow-up actions.
Reference ADRs from code comments or docs where the decision applies.

001 – Global Configuration Revisioning

Status: Proposed
Date: 2025-02-23

Context

All runtime configuration must be hot-reloadable across multiple crates.
Consumers need a consistent ordering guarantee for applying changes received via LISTEN/NOTIFY, with a fallback to polling.
We require a DB-native mechanism that can be incremented from triggers without race conditions and that carries across deployments.

Decision

Introduce a singleton settings_revision table with an ever-incrementing revision counter.
Wrap updates to configuration tables (app_profile, engine_profile, fs_policy, auth_api_keys, query_presets) in triggers that:
1. Update settings_revision.revision = revision + 1.
2. Emit NOTIFY revaer_settings_changed, '<table>:<revision>:<op>'.
ConfigService exposes ConfigSnapshot to materialize a consistent view (revision + documents) for the application bootstrap path.
The revision remains monotonic even if polling is used (consumers record the last seen revision and request deltas if they miss notifications).
Mutation APIs validate payloads server-side, applying field-level type checks and respecting app_profile.immutable_keys. Violations surface as structured errors with section/field metadata, preventing silent drift.
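
On the consumer side, the '<table>:<revision>:<op>' payload and the last-seen revision could be handled roughly as follows. This is a sketch under stated assumptions: the type and function names are hypothetical, the revision is modelled as i64, and the op values are kept as opaque strings because they are not enumerated in this record.

```rust
/// Parsed form of a `revaer_settings_changed` payload.
#[derive(Debug, PartialEq, Eq)]
pub struct SettingsNotification {
    pub table: String,
    pub revision: i64,
    pub op: String,
}

/// Parses `<table>:<revision>:<op>`; returns None for malformed payloads.
pub fn parse_notification(payload: &str) -> Option<SettingsNotification> {
    let mut parts = payload.splitn(3, ':');
    let table = parts.next()?;
    let revision: i64 = parts.next()?.parse().ok()?;
    let op = parts.next()?;
    if table.is_empty() || op.is_empty() {
        return None;
    }
    Some(SettingsNotification {
        table: table.to_string(),
        revision,
        op: op.to_string(),
    })
}

/// What a consumer should do with an observed revision.
#[derive(Debug, PartialEq, Eq)]
pub enum Observed {
    /// Revision was already applied; nothing to do.
    AlreadySeen,
    /// Exactly the next revision.
    Next,
    /// One or more revisions were missed, starting at `missed_from`.
    Gap { missed_from: i64 },
}

/// Tracks the last revision a consumer has applied.
#[derive(Debug, Default)]
pub struct RevisionTracker {
    last_seen: i64,
}

impl RevisionTracker {
    pub fn observe(&mut self, revision: i64) -> Observed {
        if revision <= self.last_seen {
            return Observed::AlreadySeen;
        }
        let outcome = if revision == self.last_seen + 1 {
            Observed::Next
        } else {
            Observed::Gap { missed_from: self.last_seen + 1 }
        };
        self.last_seen = revision;
        outcome
    }
}
```

A `Gap` result is the cue to request deltas (or reload the snapshot) instead of trusting the single notification, which covers both dropped LISTEN connections and the polling fallback.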

Consequences

Multi-table updates executed inside a transaction surface as a single revision bump, preserving ordering for consumers.
LISTEN subscribers that drop their connection can reconcile by reloading settings_revision and querying deltas > last_seen_revision.
Trigger-level logic slightly increases write cost but keeps business code free of manual revision management.

Follow-up

Implement apply_changeset to write history rows with the associated revision.
Add integration tests that exercise transactionally updating multiple tables and verifying a single revision increment.

002 – Setup Token Lifecycle & Secrets Bootstrap

Status: Proposed
Date: 2025-02-23

Context

Initial deployments must boot in a locked-down "Setup Mode" where only a one-time token grants access to the setup API.
Tokens should be observable/auditable, expire automatically, and support regeneration without requiring an application restart.
A follow-on requirement is to collect an encryption passphrase or server-side key for pgcrypto-backed secrets before exiting Setup Mode.

Decision

Store tokens in the setup_tokens table with token_hash, issued_at, expires_at, consumed_at, and issued_by.
Enforce at most one active token via a partial unique index on rows where consumed_at IS NULL.
ConfigService will:
- Generate tokens using cryptographically secure randomness.
- Persist only a hashed representation (argon2id) along with metadata.
- Emit history entries and NOTIFY events on token creation/consumption.
The CLI/API surfaces token issuance and completion flows; the process prints the token to stdout only at generation time.
During completion, the caller must supply the encryption materials (passphrase or reference to pgcrypto role). The handler verifies secrets are persisted before flipping app_profile.mode to active.
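
The token lifecycle implied by issued_at, expires_at, and consumed_at can be sketched as a small state check. The struct below mirrors the column names but omits token_hash, since argon2id hashing needs an external crate; `SetupToken`, `TokenState`, and the method names are hypothetical.

```rust
use std::time::{Duration, SystemTime};

/// In-memory mirror of a `setup_tokens` row (hash column omitted).
#[derive(Debug, Clone)]
pub struct SetupToken {
    pub issued_at: SystemTime,
    pub expires_at: SystemTime,
    pub consumed_at: Option<SystemTime>,
    pub issued_by: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TokenState {
    Active,
    Expired,
    Consumed,
}

impl SetupToken {
    pub fn issue(now: SystemTime, ttl: Duration, issued_by: &str) -> Self {
        Self {
            issued_at: now,
            expires_at: now + ttl,
            consumed_at: None,
            issued_by: issued_by.to_string(),
        }
    }

    /// Consumption takes precedence over expiry when reporting state.
    pub fn state(&self, now: SystemTime) -> TokenState {
        if self.consumed_at.is_some() {
            TokenState::Consumed
        } else if now >= self.expires_at {
            TokenState::Expired
        } else {
            TokenState::Active
        }
    }

    /// Marks the token consumed, or reports why it cannot be used.
    pub fn consume(&mut self, now: SystemTime) -> Result<(), TokenState> {
        match self.state(now) {
            TokenState::Active => {
                self.consumed_at = Some(now);
                Ok(())
            }
            other => Err(other),
        }
    }
}
```

The `Expired` and `Consumed` error cases line up with the problem-detail responses listed in the follow-up items.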

Consequences

Operators can recover by issuing a new token if the previous one expires without restarting the service.
Tokens are auditable; failed attempts can be recorded against the hashed token id (future enhancement).
The bootstrap path ensures secrets exist before runtime modules that require them start, preventing a partially configured system.

Follow-up

Implement argon2id hashing helpers and audit logging in revaer-config.
Define the CLI workflow (revaer-cli setup) that wraps token issuance and completion for headless environments.
Add problem detail responses for expired/consumed tokens in the API.

003 – Libtorrent Session Runner Architecture

Status: Accepted
Date: 2025-10-16

Context

The current revaer-torrent-libt crate is a stub that simulates torrent actions without touching libtorrent, preventing real downloads, fast-resume, or alert handling.
Phase One requires a production-grade engine: a single async task must own the libtorrent session, persist fast-resume data/selection state, debounce high-volume alerts, and surface health to the event bus.
The engine must enforce rate limits and selections within libtorrent, react within two seconds of configuration changes, and survive restarts by restoring torrents from resume_dir.

Decision

Introduce a dedicated SessionWorker spawned by LibtorrentEngine::new. It owns the libtorrent Session, receives EngineCommand messages, and emits EngineEvents via an internal channel that feeds the shared EventBus.
Wrap the libtorrent FFI in a thin adapter trait (LibtSession) to encapsulate blocking calls (add_torrent, pause, set_sequential, apply_rate_limits, file_priorities, alert polling). The real implementation uses tokio::task::spawn_blocking to call into C++ safely.
Add a FastResumeStore service that reads/writes .fastresume blobs plus JSON metadata (selection, priorities, download directory, sequential flag) inside resume_dir. On startup the worker loads the store, attempts to match existing handles, and emits reconciliation events if the stored state diverges.
Run an AlertPump loop that waits on libtorrent alerts_waitnotify, drains all alerts, and funnels them through an AlertTranslator that converts them into domain EngineEvents (FilesDiscovered, Progress, StateChanged, Completed, Error). A ProgressCoalescer throttles updates to 10 Hz per torrent.
Integrate health tracking: fatal session errors transition the engine into a degraded state and emit both HealthChanged and per-torrent Error events. The worker attempts limited restarts with exponential back-off before marking the engine unhealthy.
Rate limit updates from EngineCommand::UpdateLimits and configuration watcher updates call into libtorrent immediately; a watchdog verifies application within two seconds and logs warnings if the session reports stale caps.
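
The 10 Hz per-torrent throttle can be approximated with a last-emit timestamp per torrent. This is a simplified sketch, not the ProgressCoalescer itself: it drops intermediate updates, whereas a fuller coalescer would likely keep the latest sample and flush it at the next tick. The constructor and method names are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Lets at most `hz` progress updates per second through for each torrent.
pub struct ProgressCoalescer {
    min_interval: Duration,
    last_emit: HashMap<String, Instant>,
}

impl ProgressCoalescer {
    pub fn with_rate_hz(hz: u32) -> Self {
        Self {
            // 10 Hz becomes a 100 ms minimum gap; zero is treated as 1 Hz.
            min_interval: Duration::from_secs(1) / hz.max(1),
            last_emit: HashMap::new(),
        }
    }

    /// Returns true when an update for `torrent_id` may be forwarded at `now`.
    pub fn should_emit(&mut self, torrent_id: &str, now: Instant) -> bool {
        let throttled = self
            .last_emit
            .get(torrent_id)
            .map_or(false, |last| now.saturating_duration_since(*last) < self.min_interval);
        if throttled {
            return false;
        }
        self.last_emit.insert(torrent_id.to_string(), now);
        true
    }
}
```

Passing `now` in explicitly keeps the throttle deterministic under test, and tracking timestamps per torrent means one busy torrent cannot starve updates for another.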

Consequences

The engine crate gains clear separation between command handling, libtorrent FFI, alert translation, and persistence, making it easier to test components in isolation using mock LibtSession implementations.
Persisted state in resume_dir enables crash-restart flows to resume downloads, leveraging libtorrent fastresume and our own selection metadata.
Debouncing progress events reduces SSE pressure while preserving responsiveness; coalescing happens before events hit the shared bus.
Health reporting integrates with the existing telemetry crate, providing operators visibility into session failures or missing dependencies (e.g., absent resume directory).
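
The testing benefit of the LibtSession trait can be illustrated with a recording mock. The method names come from the decision above, but the signatures, the `MockSession` type, and the `pause_then_report` helper are assumptions made for this sketch.

```rust
/// Hypothetical subset of the adapter trait; signatures are assumptions.
pub trait LibtSession {
    fn add_torrent(&mut self, id: &str, source: &str) -> Result<(), String>;
    fn pause(&mut self, id: &str) -> Result<(), String>;
    fn set_sequential(&mut self, id: &str, enable: bool) -> Result<(), String>;
}

/// Test double that records calls and can be told to fail.
#[derive(Debug, Default)]
pub struct MockSession {
    pub calls: Vec<String>,
    pub fail: bool,
}

impl MockSession {
    fn record(&mut self, call: String) -> Result<(), String> {
        self.calls.push(call);
        if self.fail {
            Err("session unavailable".to_string())
        } else {
            Ok(())
        }
    }
}

impl LibtSession for MockSession {
    fn add_torrent(&mut self, id: &str, source: &str) -> Result<(), String> {
        self.record(format!("add_torrent({}, {})", id, source))
    }
    fn pause(&mut self, id: &str) -> Result<(), String> {
        self.record(format!("pause({})", id))
    }
    fn set_sequential(&mut self, id: &str, enable: bool) -> Result<(), String> {
        self.record(format!("set_sequential({}, {})", id, enable))
    }
}

/// Command handling written against the trait, so it runs without libtorrent.
pub fn pause_then_report<S: LibtSession>(session: &mut S, id: &str) -> &'static str {
    match session.pause(id) {
        Ok(()) => "paused",
        Err(_) => "degraded",
    }
}
```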

Follow-up

Maintain regression coverage for the libtorrent feature path, ensuring fast-resume reconciliation and guard-rail health events remain stable.
Track upstream libtorrent upgrades and refresh the operator documentation whenever the resume layout or dependency expectations shift.

004 – Phase One Delivery Track

Status: Accepted
Date: 2025-10-17

Motivation

Phase One bundles the remaining work required to transition Revaer from the current stubs into a production-ready torrent orchestration platform. This record captures the implementation notes, decisions, and verification evidence for each workstream item enumerated in docs/phase-one-roadmap.md.

Design Notes

Follow the library-first structure outlined in AGENT.md with crate-specific modules for configuration, engine integration, filesystem operations, public API, CLI, security, and packaging.
Apply tight configuration validation and hot-reload behaviour to guarantee that throttle and policy updates propagate within two seconds.
Emit guard-rail telemetry whenever global throttles are disabled, driven to zero, or configured above the 5 Gbps warning threshold so operators can react quickly.
Replace the stub libtorrent adapter with a session worker that owns state, persists fast-resume metadata, and surfaces alert-driven events with bounded fan-out.
Persist resume metadata and fastresume payloads via FastResumeStore, reconcile on startup, and emit SelectionReconciled events plus health degradations when store contents diverge or writes fail.
Build deterministic include/exclude rule evaluation and an idempotent FsOps pipeline anchored by .revaer.meta.
Expose a consistent Problem+JSON contract across HTTP and CLI surfaces, including pagination and SSE replay support.
Enforce observability invariants: structured tracing with context propagation, bounded rate limits, Prometheus metrics, and degraded health signalling when dependencies fail.
Ensure every workflow is reproducible via just targets and validated in CI, with container packaging aligned to the non-root, read-only expectations.
Follow the canonical just recipe surface (fmt, lint, test, ci, etc.). Recipes that would otherwise carry a colon in their name use hyphens instead (fmt-fix, build-rel, api-export) because just 1.43.0 rejects colons in recipe identifiers without unstable modules; the semantics remain identical.
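
The throttle guard rail in the notes above (disabled, driven to zero, or above the 5 Gbps warning threshold) amounts to a small classification. The sketch assumes the limit is expressed in bits per second and that a disabled throttle is modelled as None; both are assumptions, as are the names.

```rust
/// 5 Gbps warning threshold, assuming limits are expressed in bits per second.
pub const WARN_THRESHOLD_BPS: u64 = 5_000_000_000;

#[derive(Debug, PartialEq, Eq)]
pub enum ThrottleGuard {
    WithinBounds,
    /// No global throttle configured (modelled here as `None`).
    Disabled,
    /// Throttle configured as zero.
    Zero,
    AboveWarnThreshold,
}

/// Classifies a global throttle value for guard-rail telemetry.
pub fn check_throttle(limit_bps: Option<u64>) -> ThrottleGuard {
    match limit_bps {
        None => ThrottleGuard::Disabled,
        Some(0) => ThrottleGuard::Zero,
        Some(value) if value > WARN_THRESHOLD_BPS => ThrottleGuard::AboveWarnThreshold,
        Some(_) => ThrottleGuard::WithinBounds,
    }
}
```

Every result other than `WithinBounds` would translate into a guard-rail telemetry event.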

Test Coverage Summary

just ci serves as the baseline verification target. Each workstream delivers focused unit tests, integration coverage, and feature-flagged live tests (for libtorrent, Postgres, FsOps).
Coverage gates are enforced via cargo llvm-cov with --fail-under 80 across library crates.
Integration suites will rely on testcontainers (Postgres, libtorrent) and workspace-specific fixtures for FsOps pipelines and API/CLI flows, including the configuration watcher hot-reload test and new libtorrent-feature tests for resume restoration and fastresume persistence.

Outcome

All public surfaces now enforce API-key authentication with token-bucket rate limiting, 429 Problem+JSON responses, and telemetry counters exported via Prometheus and /health/full.
SSE endpoints honour the same auth and Last-Event-ID semantics, with CLI resume support persisting state between reconnects.
The CLI propagates x-request-id, standardises exit codes (0 success, 2 validation, 3 runtime), and emits optional telemetry events to REVAER_TELEMETRY_ENDPOINT.
A release-ready Docker image (Dockerfile) packages the API binary and documentation on a non-root, read-only-friendly runtime with health checks and volume mounts for config/data.
CI now publishes release artefacts (revaer-app, OpenAPI) and runs MSRV and container security jobs via just targets; binaries are checksummed alongside provenance metadata.
Documentation additions cover FsOps design, API/CLI contracts, security posture, operator runbook, telemetry reference, and the phase-one release checklist.

Observability Updates

Telemetry enhancements include structured logs for setup token issuance/consumption, loopback enforcement failures, configuration watcher updates, rate-limit guard-rail decisions, and resume store degradation/recovery.
Metrics will expand to track HTTP request outcomes, SSE fan-out, event queue depth, torrent throughput, FsOps step durations, and health degradation counts.
/health/full will report engine, FsOps, and database readiness with latency measurements and revision hashes, mirrored by CLI status commands.

Risk & Rollback Plan

Maintain incremental commits gated by just ci to isolate regressions. Any new dependency requires explicit justification, with fallbacks documented here.
Where feature flags guard libtorrent integration, provide mockable interfaces so tests can fall back to stub implementations if the environment lacks native bindings.
Persist fast-resume metadata and .revaer.meta files so failed deployments can roll back without corrupting state; ensure migrations remain additive.

Dependency Rationale

No new dependencies have been added yet. Future additions (e.g., libtorrent bindings, glob evaluators, archive tools) must include:

Why the crate/tool is necessary.
Alternatives considered (including bespoke implementations) and why they were rejected.
Security and maintenance assessment (license compatibility, release cadence).

005 – FsOps Pipeline Hardening

Status: Accepted
Date: 2025-10-17

Context

Phase One promotes filesystem post-processing from a best-effort helper to a first-class workflow with explicit health semantics.
The orchestrator must ensure every completed torrent flows through a deterministic FsOps state machine, emitting structured telemetry and reconciling mismatches with persisted metadata.
Operators require visibility into FsOps latency, failures, and guard-rail breaches (e.g., missing extraction tools, permission errors) via /health/full, Prometheus, and the shared EventBus.

Decision

FsOps responsibilities live inside revaer-fsops, invoked by the orchestrator (TorrentOrchestrator::apply_fsops) whenever a Completed event surfaces.
Each pipeline step (extract, par2, move, cleanup) records start/completion/failure events and increments Prometheus counters via Metrics::inc_fsops_step.
Metadata is persisted alongside .revaer.meta to reconcile selection overrides and resume directories across restarts; mismatches trigger SelectionReconciled events plus guard-rail telemetry.
Health degradation is published when FsOps detects latency guard rails, missing tools, or unrecoverable IO errors; recovery clears the fsops component from the degrade set.
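
The step-by-step event recording can be sketched as a loop over the four pipeline steps. The step order follows the list above; stopping at the first failure and the `run_pipeline` signature are assumptions, and the Prometheus counter increment is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Extract,
    Par2,
    Move,
    Cleanup,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FsEvent {
    Started(Step),
    Completed(Step),
    Failed(Step, String),
}

pub const PIPELINE: [Step; 4] = [Step::Extract, Step::Par2, Step::Move, Step::Cleanup];

/// Runs each step in order, recording start/completion/failure events.
/// Assumption: the pipeline stops at the first failing step.
pub fn run_pipeline<F>(mut run_step: F) -> Vec<FsEvent>
where
    F: FnMut(Step) -> Result<(), String>,
{
    let mut events = Vec::new();
    for step in PIPELINE {
        events.push(FsEvent::Started(step));
        match run_step(step) {
            Ok(()) => events.push(FsEvent::Completed(step)),
            Err(reason) => {
                events.push(FsEvent::Failed(step, reason));
                break;
            }
        }
    }
    events
}
```

Taking the step executor as a closure lets tests inject failures such as a missing extraction tool without touching the filesystem.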

Consequences

FsOps execution becomes observable and retry-friendly, enabling operator runbooks to diagnose stuck jobs with concrete metrics and events.
Pipeline regressions now fail CI thanks to targeted unit/integration tests under revaer-fsops and orchestrator-level tests driving the shared event bus.
The orchestration layer remains single-owner of FsOps invocation, simplifying future extensions (e.g., checksum verification, media tagging) without leaking concerns into the API.

Verification

just test exercises FsOps unit cases, while orchestrator integration tests validate event emission, degradation flows, and metadata reconciliation.
/health/full and Prometheus snapshots display FsOps metrics during the runbook, confirming latency guard rails and failure counters behave as expected.

006 – Unified API & CLI Contract

Status: Accepted
Date: 2025-10-17

Context

Phase One requires parity between the public HTTP interface and the administrative CLI so operators can automate without reverse engineering payloads.
Prior iterations lacked shared DTOs, consistent Problem+JSON responses, and stable pagination/SSE semantics across API and CLI.
New rate limiting and telemetry features must surface identically on both surfaces to satisfy observability and security requirements.

Decision

Shared request/response models live in revaer-api::models and are re-exported to the CLI, ensuring identical JSON encoding/decoding paths.
All routes return RFC9457 Problem+JSON payloads on validation/runtime errors, including invalid_params pointers for user-correctable mistakes; the CLI pretty-prints these problems and maps validation to exit code 2.
Cursor pagination, filter semantics, and SSE replay (Last-Event-ID) are implemented once in the API and exercised by dedicated CLI commands (ls, status, tail).
The CLI propagates x-request-id headers, emits structured telemetry events to REVAER_TELEMETRY_ENDPOINT, and redacts secrets in logs; runtime failures exit with code 3 to distinguish from validation issues.
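
The exit-code contract (0 success, 2 validation, 3 runtime) might be derived from an HTTP status along these lines. Which statuses count as validation is an assumption in this sketch (400 and 422 here); the record only fixes the three codes.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Success,
    Validation,
    Runtime,
}

impl Outcome {
    /// Exit codes as stated in the decision: 0, 2, and 3.
    pub fn exit_code(&self) -> i32 {
        match self {
            Outcome::Success => 0,
            Outcome::Validation => 2,
            Outcome::Runtime => 3,
        }
    }
}

/// Assumption: 2xx is success, 400/422 are validation problems, and
/// everything else (including 429 and 5xx) is a runtime failure.
pub fn classify_status(status: u16) -> Outcome {
    match status {
        200..=299 => Outcome::Success,
        400 | 422 => Outcome::Validation,
        _ => Outcome::Runtime,
    }
}
```

A CLI entry point would end with something like `std::process::exit(classify_status(status).exit_code())`.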

Consequences

Changes to the API contract require updates in a single module (revaer-api::models), reducing the risk of CLI drift.
Downstream tooling can rely on deterministic exit codes and Problem+JSON payloads, simplifying automation.
Telemetry pipelines receive consistent trace identifiers regardless of whether requests originate from the CLI or other clients.

Verification

Integration tests cover pagination, filter validation, SSE replay, and CLI HTTP interactions via httpmock, ensuring behaviour remains in lockstep.
just api-export regenerates docs/api/openapi.json, and CI asserts the CLI uses the shared DTOs by compiling with the workspace feature set.

007 – API Key Security & Rate Limiting

Status: Accepted
Date: 2025-10-17

Context

API keys were previously verified but not throttled, allowing abusive clients to starve the control plane and masking guard-rail violations.
Operators need guard-rail metrics, health events, and documentation describing key lifecycle, rate limits, and rotation workflows.
CLI tooling must respect the same security posture, including masking secrets and surfacing authentication failures with actionable errors.

Decision

Each API key stores a JSON rate limit (burst, per_seconds) validated by ConfigService; token-bucket state is maintained per key inside the API layer.
Requests exceeding the configured budget return 429 Too Many Requests Problem+JSON responses, increment Prometheus counters (api_rate_limit_throttled_total), and emit HealthChanged events when guard rails (e.g., unlimited keys) are breached.
CLI authentication mandates key_id:secret, redacts secrets in logs, and propagates x-request-id so operators can correlate requests with server-side traces.
CI enforces MSRV and Docker security gates to ensure build artefacts respect the security baseline.
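
The per-key token bucket from the decision above could look roughly like this. The sketch reads (burst, per_seconds) as "burst requests, refilled continuously over per_seconds seconds", which is an assumption about the field semantics; `RateLimit`, `TokenBucket`, and `try_acquire` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Rate limit as stored on an API key.
#[derive(Debug, Clone, Copy)]
pub struct RateLimit {
    pub burst: u32,
    pub per_seconds: u64,
}

/// Continuous-refill token bucket; one instance per API key.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(limit: RateLimit, now: Instant) -> Self {
        // Zero values are bumped to 1 so the refill rate stays finite.
        let capacity = f64::from(limit.burst.max(1));
        let window = limit.per_seconds.max(1) as f64;
        Self {
            capacity,
            tokens: capacity,
            refill_per_sec: capacity / window,
            last_refill: now,
        }
    }

    /// Takes one token, or returns how long to wait for the next one
    /// (suitable for retry metadata in a 429 response).
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), Duration> {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            let wait_secs = (1.0 - self.tokens) / self.refill_per_sec;
            Err(Duration::from_secs_f64(wait_secs))
        }
    }
}
```

Keeping one bucket per key (for example in a map keyed by key_id) is what confines a runaway key to its own budget.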

Consequences

Compromised or runaway keys are contained, preventing control-plane denial-of-service and providing clear telemetry for incident response.
Documentation now includes API key rotation steps, rate-limit expectations, and remediation guidance for guard-rail events.
The API and CLI remain aligned by sharing auth context types and telemetry primitives.

Verification

Unit tests cover rate-limit parsing and token-bucket behaviour; integration tests assert 429 responses and CLI exit codes.
/health/full exposes rate-limit metrics, and the Docker image runs as a non-root user with health checks hitting the authenticated endpoints.

008 – Phase One Remaining Delivery (Task Record)

Status: In Progress
Date: 2025-10-17

Motivation

Implement the outstanding Phase One scope: per-key rate limiting, CLI parity (telemetry, exit codes), packaging, documentation, and CI gates required by docs/phase-one-remaining-spec.md and AGENT.md.

Design Notes

Introduced ConfigService::authenticate_api_key returning rate-limit metadata, validated JSON payloads, and persisted canonical token-bucket configuration.
Added ApiState::enforce_rate_limit with per-key token buckets, guard-rail health publication, Prometheus counters, and Problem+JSON 429 responses.
CLI now builds reqwest clients with default x-request-id, standardises exit codes (0/2/3), and emits optional telemetry events when REVAER_TELEMETRY_ENDPOINT is set.
Created a multi-stage Dockerfile (non-root runtime, healthcheck, docs bundling) with just recipes for building and scanning.
Expanded CI with release artefact, Docker, and MSRV jobs that call the new just targets.

Test Coverage Summary

Added unit tests for rate-limit parsing and token-bucket behaviour (revaer-config, revaer-api).
Existing integration suites exercise Problem+JSON responses, SSE replay, and CLI HTTP interactions.
Runbook (docs/runbook.md) supports manual verification of FsOps, rate limits, and guard rails.

Observability Updates

Prometheus now exposes api_rate_limit_throttled_total; /health/full includes the counter and degrades when guard rails fire.
CLI telemetry emits JSON events (command, outcome, trace id, exit code) to configurable endpoints.
Documentation adds telemetry reference, operations guide, and release checklist for operators.

Risk & Rollback

Rate-limit enforcement is isolated to require_api_key; roll back by removing the enforce_rate_limit call if unexpected throttling occurs.
Docker image/builder changes are gated via just docker-build and just docker-scan; revert by removing the Docker packaging again.
CI additions run after core jobs and can be disabled via workflow changes if they fail unexpectedly.

Dependency Rationale

No new Rust crates were introduced. Docker scanning uses trivy via CI and a manual recipe; it is optional for local development.

Status: {Proposed|Accepted|Superseded}
Date: {YYYY-MM-DD}
Context:
- What problem are we solving?
- What constraints or forces shape the decision?
Decision:
- Summary of the choice made.
- Alternatives considered.
Consequences:
- Positive outcomes.
- Risks or trade-offs.
Follow-up:
- Implementation tasks.
- Review checkpoints.
