feat(api)!: memory single chain — async MemoryService, strict eval closure

Route all memory ingest/retrieve/enrichment/compaction through async MemoryService. Remove legacy sync memory implementations (ingest/retrieve/compaction); Celery and memoir Phase2 call asyncio.run into MemoryService-backed helpers. Memoir Phase1 batch ingest uses MemoryService.ingest_transcripts_batch; drop the chapters.evidence_bundle_json mirror (Alembic 0015). Evaluation uses snapshot/link-only bundles; raise EvidenceClosureMissing instead of partial/fallback lineage tiers. Split memoir state into NarrativeCoverageState and InterviewControlState; delete the _interview_meta_store adapter layer. Remove rolling-query and recent-fact fallback settings from config and evidence assembly. Update judges, docs, tests, and PlaygroundPage alignment.

Made-with: Cursor
2026-04-30 14:11:46 +08:00
parent ac436b87a2
commit 71fbd39e32
53 changed files with 953 additions and 2448 deletions
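The commit message says Celery and memoir Phase2 call `asyncio.run` into MemoryService-backed helpers. A minimal sketch of that sync-to-async bridge might look like the following. The `MemoryService` class here is a hypothetical stand-in, and the helper name `ingest_batch_sync` and the method signature are assumptions for illustration, not the code in this commit.

```python
import asyncio


class MemoryService:
    """Hypothetical stand-in; the real class and signatures are not shown here."""

    async def ingest_transcripts_batch(self, transcripts: list[str]) -> int:
        # Yield to the event loop once to mimic async I/O, then report the count.
        await asyncio.sleep(0)
        return len(transcripts)


def ingest_batch_sync(transcripts: list[str]) -> int:
    """Sync entry point (for example a task body) that delegates to the async service."""
    # asyncio.run creates a fresh event loop, runs the coroutine to completion,
    # and closes the loop. It raises RuntimeError if a loop is already running
    # in the current thread, so it only suits purely synchronous callers.
    return asyncio.run(MemoryService().ingest_transcripts_batch(transcripts))
```

Keeping the bridge in one small helper leaves a single async implementation, with sync callers paying only the cost of a short-lived event loop per call.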
--- a/api/docs/traceable-memoir-lineage.md
+++ b/api/docs/traceable-memoir-lineage.md
@@ -1,41 +1,20 @@
-# Traceable Memoir Evidence (Product and Internal-Evaluation Definitions)
+# Traceable Memoir Evidence

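-This document is shared by PM, annotation, and engineering: **legacy data does not need to be backfilled specifically for evaluation**; new writes go through the unified closure and snapshot tables. Unclear definitions cause repeated alignment costs, so when changing the tier rules, update `EvalTraceService._chapter_closure_tier` / the equivalent Story-side logic and this document together.
+Library artifact evaluation accepts only strict evidence closure.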

-## lineage_tier: strict / partial / fallback
+## Strict closure

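-| Tier | Meaning (chapter) | Meaning (story, summary) |
-|------|----------------|-------------------|
-| **strict** | Has both resolvable interview **segments** (bindable to a transcript) and **structured memory** obtained from the closure of the corresponding conversations (any of chunk / fact / timeline / summary is non-empty). | On the Story, resolving through `StoryEvidenceLink` etc. as the primary chain yields both segments and structured memory. |
-| **partial** | Has a resolvable **segment / transcript chain** but the structured memory is empty, or has only structured memory with **no** bindable segment (**aligned with PM/annotation: structured memory only, no transcript = partial**). | One side of the evidence can be derived from the chapter's `source_segments` etc., but the closure is incomplete. |
-| **fallback** | Sufficient closure cannot be built from the artifact (e.g. no segments and no in-library link path), so evaluation **explicitly** degrades to coarse-grained transcripts such as "the most recent few conversations"; this must be visible in the result's `notes` / `evidence_trace` so that it is not silently treated as strict. |
+- Chapter: must have a `chapter_evidence_snapshots` row referenced by `chapters.current_evidence_snapshot_id`, and that row must contain both transcript-bindable `segment_ids` and at least one kind of memory evidence id.
+- Story: must have a `StoryEvidenceLink` that can be bound to transcript segments.
+- When the closure is missing, `EvalTraceService` raises `EvidenceClosureMissing`; it no longer stitches in the full text of recent conversations or backfills from the legacy JSON mirror.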

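-**Note**: `partial` does not mean "low quality"; it means "lineage is incomplete but still reviewable". `fallback` is "a conservative degradation when the chain is broken". Review prompts and gate interpretation must treat the two differently.
+## Phase C tables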

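-## Automated evaluation: synthetic memoir vs library artifact (separate-table mental model)
+- `chapter_evidence_snapshots`: one row per materialized closure; `version_no` increments.
+- `chapter_evidence_links`: chapter-side structured evidence ids, replaced as a whole batch for the current snapshot.
+- `chapters.current_evidence_snapshot_id` is the sole entry point for chapter evaluation.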

-A single `eval_run` may contain two kinds of "memoir" scores with different semantics; **do not conflate them**:
+## Synthetic vs Library

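- **Synthetic (short text synthesized from replay)**: a short markdown text assembled on the fly from the case's replay dialogue. Its evidence closure is **the replayed transcript only**; it is not bound to memory chunks / facts / timelines / summaries in the user's library. `judge_meta.synthetic_memoir_lineage_tier` and similar fields carry markers such as `replay_transcript_only`.
- **Library (in-library chapter / story)**: a real `Chapter` / `Story` artifact, using the evidence bundle assembled by `EvalTraceService` (including `lineage_tier` and `evidence_trace`).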
-
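-For the aggregation rule, see `judge_bundle_json.judge_meta.memoir_aggregate_rule` (e.g. how scores are weighted when both synthetic and library scores exist). When reporting to PM, show the items separately rather than reporting a single "memoir score".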
-
-## Internal-evaluation JSON: is `evidence_trace` sufficient?
-
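-Currently `evidence_trace` is the **full serialization** of `ChapterEvidenceBundle` / `StoryEvidenceBundle`: it includes `segment_ids`, `conversation_ids`, the various memory ids, `lineage_tier`, `notes`, `augmented_with_chapter_context`, and so on. **This is generally sufficient for internal audit**: ids can be looked up in the DB or in logs.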
-
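-Expanding the ids **into clickable deep links / batch export per artifact type** is a UX enhancement and can be scheduled separately (eval-web already supports a chapter-level collapsible id list).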
-
-## Phase C: `chapter_evidence_snapshots` and `chapter_evidence_links`
-
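- **Snapshot table**: one row per materialized closure (`version_no` increments); `chapters.current_evidence_snapshot_id` points to the currently effective row.
- **Link table**: symmetric with `StoryEvidenceLink`; when a snapshot is refreshed, the chapter-side structured memory ids are **replaced as a whole batch**, which eases auditing and extension.
- **Evaluation consumption order**: `current_evidence_snapshot` (table) → `evidence_bundle_json` (JSON mirror, for compatibility) → computed on the fly from `source_segments` (`notes` flags any mismatch with live data).
- **Legacy data**: migration is optional; writes from the new pipeline update both the table and the JSON mirror.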
-
-## Technical debt (backlog, does not block release)
-
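-1. **Unify closure computation**: production snapshots and `EvalTraceService` already share `build_chapter_evidence_closure_payload_sync`; if Story or other paths still duplicate the derivation, they should converge on the same entry point to avoid drift between two implementations.
-2. **Extend memory trace**: beyond the current interview retrieval, if other entry points feed memory to the model, evaluate whether they should also write `memory_retrieval_trace_json` (or an equivalent trace), so that partial / strict determination stays consistent with after-the-fact audits.
-3. **Conflicts between canonical and the `source_segments` union**: if conflict cases increase in production, re-evaluate a "version-level link" outside the standalone snapshot table, or stronger constraints; the current Phase C already reduces the risk of relying on a single JSON column.
+- Synthetic replay evaluates only the replayed transcript and is not bound to user-library memory.
+- Library chapter/story uses strict closure; an artifact without closure cannot be evaluated.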
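
The strict chapter closure rule in the updated doc can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ChapterEvidenceSnapshot` dataclass, its `memory_evidence_ids` field, and the `require_chapter_closure` helper are hypothetical names, and the real `EvalTraceService` logic is not shown in this diff. Only `segment_ids` and `EvidenceClosureMissing` come from the doc.

```python
from dataclasses import dataclass, field
from typing import Optional


class EvidenceClosureMissing(Exception):
    """Raised when an artifact lacks the strict evidence closure."""


@dataclass
class ChapterEvidenceSnapshot:
    # Hypothetical in-memory view of one chapter_evidence_snapshots row.
    segment_ids: list[str] = field(default_factory=list)
    # Hypothetical shape: memory kind (chunk, fact, ...) -> evidence ids.
    memory_evidence_ids: dict[str, list[str]] = field(default_factory=dict)


def require_chapter_closure(
    snapshot: Optional[ChapterEvidenceSnapshot],
) -> ChapterEvidenceSnapshot:
    """Return the snapshot if it satisfies strict closure, else raise."""
    if snapshot is None:
        # No row referenced by chapters.current_evidence_snapshot_id.
        raise EvidenceClosureMissing("chapter has no current evidence snapshot")
    if not snapshot.segment_ids:
        raise EvidenceClosureMissing("snapshot has no transcript-bindable segment_ids")
    if not any(snapshot.memory_evidence_ids.values()):
        # Every memory kind is empty, so there is no memory evidence id at all.
        raise EvidenceClosureMissing("snapshot has no memory evidence ids")
    return snapshot
```

Raising instead of returning a degraded bundle mirrors the doc's stance that an artifact without closure cannot be evaluated, so callers cannot silently score against partial evidence.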