diff --git a/.cursor/plans/story_route_append_context.plan.md b/.cursor/plans/story_route_append_context.plan.md new file mode 100644 index 0000000..e126058 --- /dev/null +++ b/.cursor/plans/story_route_append_context.plan.md @@ -0,0 +1,128 @@ +--- +name: Story route append context +overview: Enrich the candidate payload (summary first + configurable budget) + category-aware prompt correction + a minimal test set; do not expand the routing system, change the schema, or wire in offline merging. +todos: + - id: config-budget + content: Add Settings - story_route_candidate_body_max_chars, total_max, head/tail, summary_min_len (optional) + status: completed + - id: payload-builder + content: build_route_candidate_rows - fixed sort, summary-first body rules, total budget downgrade to index rows + status: completed + - id: prompts-merge-bias + content: get_story_route_prompt + get_story_batch_plan - two-layer criteria, category blocks, remove default-to-new_story + status: completed + - id: wire-agent + content: StoryRouteAgent decide/plan_batch use new builder + prompts; validate_story_batch_plan unchanged; no pipeline signature change + status: completed + - id: tests + content: Builder tests + prompt contains + beliefs append smoke + career new_story smoke + test_story_route_oral_invariant + status: completed +isProject: false +--- + +# Story Route: richer candidate context + category-aware append correction + +## Ticket / PR in one sentence + +Use a richer candidate JSON (**summary first**, with body text added only as needed) and prompts with **per-category correction** to fix `StoryRouteAgent` over-selecting `new_story` in strongly thematic categories; budgets are **exposed through Settings** with **conservative** defaults; the memoir pipeline signature and DB schema are **unchanged**. + +## Root cause (facts from the code) + +- `[api/app/agents/memoir/story_route_agent.py](api/app/agents/memoir/story_route_agent.py)`: `preview_chars=220`, so multiple short reflections in the same category are almost indistinguishable. +- `[api/app/agents/memoir/prompts.py](api/app/agents/memoir/prompts.py)`: the rule **「若无法自信匹配某一候选,选 new_story」** ("if no candidate can be matched with confidence, choose new_story") contradicts the product expectation that a thematic container should thicken over time. +- The `Story.summary` column exists but is **not used by routing**; only a truncated `canonical_markdown` is sent. + +## Scope for this round (in) + + +| Deliverable | Description | +| -------------------------------------------------------------------- | 
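---------------------------------------------------------------------------------------------------------------------------- | +| `[story_route_agent.py](api/app/agents/memoir/story_route_agent.py)` | Add a candidate **payload builder**; `_build_candidate_json` now goes through the builder | +| `[prompts.py](api/app/agents/memoir/prompts.py)` | Correct the bias in `get_story_route_prompt` and `get_story_batch_plan_prompt` + **two-layer decision criteria** + **three category rules** | +| `[config.py](api/app/core/config.py)` | **3–4** budget-related fields (see below) | +| Tests | builder unit tests, prompt fragment assertions, **2 category behavior cases**, `[test_story_route_oral_invariant.py](api/tests/test_story_route_oral_invariant.py)` regression | + + +## Scope for this round (out) + +- Offline / admin merging of historical junk stories +- Schema changes, a second-pass merge worker, expanded fine-grained sub-policies for family, time-span thresholds +- Expanding the routing system (multi-model cascades, new port signatures, etc.) + +## 1. Candidate row structure (summary takes the lead) + +Every candidate dict **must include**: `id`, `title`, `char_count`, `version_count`, `updated_at` (ISO), `linked_chapters` (keeping the existing concatenation logic). + +**Content priority, in strict order:** + +1. If `Story.summary` is **non-empty and meets the minimum `summary` length threshold** (configurable, e.g. ≥30 characters, or reuse an existing convention), include `summary` and **omit** `body_for_route` **in this round** (so a long body does not dilute the summary). +2. If the summary is **missing or too short**, build `body_for_route` instead: include a **short body** in full where possible; for an overlong body use **head + tail** (with a note about the omitted middle), bounded by `story_route_long_body_head_chars` / `tail` and the per-story cap. +3. When the **total budget** is exceeded, **downgrade that entry to an index row** (only `id`, `title`, `char_count`, and a very short `preview`; the prompt notes that entries carrying `summary`/`body` should be preferred over index entries when matching). + +**Initial defaults (conservative; can be raised in production):** + +- `story_route_candidate_body_max_chars`: **1200–2000** (ship a single value such as **1600**) +- `story_route_candidate_total_max_chars`: **12000–18000** (e.g. **16000**) +- head/tail: pick a reasonable pair of defaults following the existing "only split long texts" approach (e.g. **600–800** each) + +## 2. Sort order (hard-coded, with tie-breaks) + +Sort `candidate_stories` **before** it enters the builder (the list for the same stage): + +`has_summary(desc) → updated_at(desc) → version_count(desc) → char_count(desc) → id(asc)` + +where `has_summary` means the stripped summary length is ≥ the configured minimum `summary` length. + +## 3. 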
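Prompts + +### 3.1 Remove the conservative bias + +Delete the "when unsure, choose `new_story`" rule and replace it with: **first check whether the input can be merged with a candidate at the theme or event level, then decide**. + +### 3.2 Two-layer decision criteria (stated explicitly in the prompt) + +- **Theme-continuity signals**: values, relationship patterns, long-term summaries, the same dimension of reflection. +- **Event-switch signals**: a new combination of people, a new place, a new time period, a new causal chain of events. + +Guidance: `beliefs` / `summary` **weigh theme continuity more heavily**; `career_*` / `childhood` / `education` **weigh the event chain more heavily**. + +### 3.3 Category rules (only three in the first version) + +- **`beliefs`, `summary`**: **strong container** → multiple short reflections, the same opening phrasing, the same value dimension → **strongly prefer `append_story`** (pointing at the best-matching candidate id). +- **`career_*`, `childhood`, `education`**: **strongly episodic** → clearly different event chains may be `new_story`; follow-up questions on the same experience may append. +- **`family`**: **neutral** → a single sentence: reflections on principles or relationships lean toward append; a **clearly new event chain** may be new; do **not** expand into a long list of exceptions. + +`get_story_batch_plan_prompt` is **aligned** with `get_story_route_prompt` on the rules above. + +## 4. Wiring + +- `[story_pipeline_sync.py](api/app/features/memoir/story_pipeline_sync.py)` does **not change** the `StoryRouteAgent` call signature. +- `validate_story_batch_plan` and the Pydantic models are **unchanged**. + +## 5. Tests (minimal set + 2 behavior cases) + + +| Test | Purpose | +| -------- | ----------------------------------------------------------------------------------------------------------------------------- | +| Builder | summary takes priority and omits body; a short summary falls back to body; total-budget downgrade; stable **sort order** | +| Prompt | **containment assertions** for the category blocks, the two-layer criteria, the `beliefs`/`family` neutral sentence, and so on | +| **Behavior A** | mock LLM: `beliefs` + two short spoken reflections + 1 existing story → expect **append** (or assert that the payload passed to the mock contains enough summary/body and that the prompt stresses the strong container; pick one at implementation time and hard-code the assertion) | +| **Behavior B** | mock LLM: `career_achievement` (or `childhood`) + two clearly different events → allow/expect **new_story** | +| Regression | `[test_story_route_oral_invariant.py](api/tests/test_story_route_oral_invariant.py)`: routing input still **contains no evidence** | + + +If binding the behavior tests to a real LLM is impractical, use a **mocked `invoke_json_object` / StoryRouteAgent** with fixed returns, or **assert on the prompt and the candidate JSON shape**, consistent with the style of the existing `[test_story_route_oral_invariant.py](api/tests/test_story_route_oral_invariant.py)`. + +## 6. Risks and acceptance + +- **Tokens**: defaults are conservative; watch `route_decision` / `is_append` in the staging logs before tuning `Settings`. +- **Over-merging**: mitigated by the episodic categories and the "event-switch signals" paragraph. + +## Implementation order + +1. `config.py` fields +2. Builder + builder unit tests + sort unit tests +3. Prompt rework + prompt assertions +4. Wire up `StoryRouteAgent` +5. 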
Behavior A/B + oral invariant + related memoir tests + diff --git a/api/app/agents/memoir/prompts.py b/api/app/agents/memoir/prompts.py index fabd165..474d26a 100644 --- a/api/app/agents/memoir/prompts.py +++ b/api/app/agents/memoir/prompts.py @@ -436,6 +436,40 @@ def get_narrative_merge_json_prompt( """ + + +def story_route_merge_hint_for_category(chapter_category: str) -> str: + """Per-chapter-category append/new bias (shared by the StoryRouteAgent routing prompts).""" + cc = (chapter_category or "").strip() + if cc in ("beliefs", "summary"): + return ( + "### 本章类别路由倾向(强主题容器)\n" + "- 多条短感悟、同一价值维度、同一总结脉络的补充 → **优先 append_story**," + "选最匹配的一条候选 id。\n" + "- 仅在用户明确讲述**与所有候选主题明显不相关**、且可独立成篇的长经历时,才用 new_story。" + ) + if cc == "family": + return ( + "### 本章类别路由倾向(家庭)\n" + "- 原则性反思、关系模式、相处之道的补充 → **倾向 append_story**。\n" + "- **明确的新事件链**(新场景、新时间线、不同人物组合的新经历)→ 可 new_story。" + ) + if cc in ( + "childhood", + "education", + "career_early", + "career_achievement", + "career_challenge", + ): + return ( + "### 本章类别路由倾向(经历叙事)\n" + "- 以具体事件链为主:**不同事件 / 时期 / 地点** → 可 new_story。\n" + "- 明显是**同一段经历的续叙、补充细节** → append_story。" + ) + return ( + "### 本章类别路由倾向(一般)\n" + "- 同时参考「主题连续性」与「事件切换」两类信号做判断。" + ) + + def get_story_route_prompt( *, chapter_category: str, @@ -448,15 +482,24 @@ def get_story_route_prompt( 「故事」= 可独立讲述的一段人生经历;进入本步的批次已归入具体 chapter category (含模型返回 none 或零散档案启发式时映射的 summary)。 """ - return f"""你是回忆录编辑助手。根据本批用户口述与候选故事列表,决定: -- append_story:内容明显延续、补充某一已有故事的主题与时间线,且能对应到具体 candidate id -- new_story:新话题、新人生阶段片段,或与所有候选故事都不够贴合 + merge_hint = story_route_merge_hint_for_category(chapter_category) + return f"""你是回忆录编辑助手。根据本批用户口述与【候选故事】决定 append_story 或 new_story。 **JSON 输出**:接口已启用 `response_format=json_object`,只输出下面 schema 的一个合法 JSON 对象,不要 markdown。 -「故事」在此指:**可独立讲述的一段人生经历**——单一主题或同一事件链;不要假设本批里包含多个互不相关的故事(多段由系统其它步骤处理)。 +## 两层决策标准(必须先在心里过一遍) +1. **主题连续性信号**:价值观、关系模式、长期总结、同一反思维度;口述是否像在**同一主题容器**里加厚? +2. **事件切换信号**:是否出现**新人物组合、新地点、新时间段、新事件因果链**,与候选正文明显是**另一段经历**? 
-**路由边界(必须遵守)**:仅根据下方「本批口述合并文本」判断 new_story 与 append_story;不得将系统检索摘要、记忆摘录、图谱事实或其它非用户口述材料当作本批口述内容来匹配候选故事。 +- 类别 **beliefs / summary**:更重主题连续性;除非事件切换信号极强,否则倾向 append。 +- 类别 **career_* / childhood / education**:更重事件链;不同事件可 new,同一经历续聊则 append。 +- 类别 **family**:两类信号兼顾——原则/关系反思倾向 append;明确新事件链可 new。 + +{merge_hint} + +**路由边界(必须遵守)**:仅根据下方「本批口述合并文本」判断;不得将系统检索摘要、记忆摘录等当作本批口述内容来匹配候选。 + +**候选故事说明**:列表项可能含 `summary` 或 `body_for_route`(正文摘要);仅含 `preview` 者为索引项,信息不全。**append 时优先匹配带 summary 或 body 的条目**;索引项仅作候选 id 备忘。 当前章节(写作容器): - category: {chapter_category} @@ -465,7 +508,7 @@ def get_story_route_prompt( 【本批口述合并文本】 {batch_transcript} -【候选故事】(仅允许在 append 时选择其中的 id;id 必须原样复制) +【候选故事】(append 时 target_story_id 必须来自下列 id,且原样复制) {candidate_stories_json} ## 输出 JSON(仅此一个对象,不要 markdown) @@ -476,7 +519,8 @@ def get_story_route_prompt( }} 规则: -- 若无法自信匹配某一候选,选 new_story +- **不要**只因「不太确定」就选 new_story;在主题可并入某一候选时应 append_story。 +- 仅当口述与**所有**候选在两层标准下都明显不兼容时,才选 new_story。 """ @@ -488,17 +532,28 @@ def get_story_batch_plan_prompt( candidate_stories_json: str, ) -> str: """同一章节类别下多 segment:划分为若干写入单元(每单元 new 或 append)。输出严格 JSON。""" + merge_hint = story_route_merge_hint_for_category(chapter_category) return f"""你是回忆录编辑助手。下面同一章节类别下有一批**按时间顺序**的用户口述片段(每段有 id 与文本)。 **JSON 输出**:接口已启用 `response_format=json_object`,只输出下面 schema 的一个合法 JSON 对象,不要 markdown。 +## 两层决策标准(每一块都要应用) +1. **主题连续性信号**:价值观、关系模式、长期总结、同一反思维度。 +2. 
**事件切换信号**:新人物组合、新地点、新时间段、新事件因果链。 + +各类别倾向与单段路由一致:beliefs/summary 重主题连续性;career/childhood/education 重事件链;family 兼顾。 + +{merge_hint} + ## 「故事」定义(必须遵守) -一段「故事」= **可独立讲述的一段人生经历**:单一主题或同一事件链,能单独成篇。若话题切换、时间线跳到另一件事、人物/主线明显变化,应作为**新的故事**(new_story),而不是塞进同一段 append。 +一段「故事」= **可独立讲述的一段人生经历**。**同一主题容器内的连续口述**应并入同一块 append,而不是切碎成多个 new_story。 ## 任务 -将本批 segment **划分为连续若干块**(每块包含至少一个 segment,顺序不能打乱;每个 segment 必须恰好属于一块)。对每一块决定: -- **append_story**:内容明显延续、补充**某一已有候选故事**的主题与时间线,且能对应到具体 candidate id -- **new_story**:新话题、与所有候选故事都不够贴合、或应独立成篇的片段 +将本批 segment **划分为连续若干块**(每块至少一个 segment,顺序不能打乱;每个 segment 必须恰好属于一块)。对每一块决定: +- **append_story**:与某一候选在两层标准下可合并,且能对应到具体 candidate id +- **new_story**:该块与**所有**候选都明显不兼容,或确认为独立新经历 + +**候选故事说明**:条目可能含 `summary`/`body_for_route`;仅 `preview` 者为索引项。**优先用带摘要/正文的条目做 append 目标**。 当前章节(写作容器): - category: {chapter_category} @@ -507,7 +562,7 @@ def get_story_batch_plan_prompt( 【本批口述片段】(JSON 数组,顺序即口述顺序) {segments_json} -【候选故事】(仅允许在 append 时选择其中的 id;id 必须原样复制) +【候选故事】(append 时 target_story_id 必须来自下列 id,且原样复制) {candidate_stories_json} ## 输出 JSON(仅此一个对象,不要 markdown) @@ -524,7 +579,7 @@ def get_story_batch_plan_prompt( 规则: - `units` 中所有 `segment_ids` 拼接后,必须**不重不漏**地覆盖本批全部 id,且顺序与【本批口述片段】数组一致 -- 若无法自信匹配某一候选,对该块选 new_story +- **不要**仅因不确定就对整块选 new_story;能并入候选时应 append_story """ diff --git a/api/app/agents/memoir/story_route_agent.py b/api/app/agents/memoir/story_route_agent.py index a7aadb2..7bfc013 100644 --- a/api/app/agents/memoir/story_route_agent.py +++ b/api/app/agents/memoir/story_route_agent.py @@ -13,6 +13,8 @@ from app.agents.memoir.prompts import ( get_story_batch_plan_prompt, get_story_route_prompt, ) +from app.agents.memoir.story_route_payload import build_route_candidate_json +from app.core.config import settings from app.core.langchain_llm import invoke_json_object from app.core.logging import get_logger from app.features.story.models import Story @@ -63,40 +65,6 @@ class StoryRouteDecision(BaseModel): return str(v) -def _build_candidate_json( - 
stories: list[Story], - *, - preview_chars: int = 220, - story_meta: dict[str, dict[str, int]] | None = None, -) -> str: - """story_meta: story_id -> { char_count, version_count },供路由感知篇幅与版本数。""" - rows: list[dict[str, Any]] = [] - meta = story_meta or {} - for s in stories: - md = (s.canonical_markdown or "").strip().replace("\n", " ") - preview = md[:preview_chars] + ("…" if len(md) > preview_chars else "") - links: list[str] = [] - for cl in getattr(s, "chapter_links", None) or []: - ch = getattr(cl, "chapter", None) - if ch is None: - continue - cat = getattr(ch, "category", None) or "" - tit = getattr(ch, "title", None) or "" - links.append(f"{tit}({cat})") - row: dict[str, Any] = { - "id": s.id, - "title": s.title, - "preview": preview, - "linked_chapters": links, - } - m = meta.get(str(s.id)) - if m: - row["char_count"] = int(m.get("char_count", 0)) - row["version_count"] = int(m.get("version_count", 0)) - rows.append(row) - return json.dumps(rows, ensure_ascii=False, indent=2) - - def _build_segments_json_for_plan( segments: list[tuple[str, str]], *, text_preview_chars: int = 4000 ) -> str: @@ -157,7 +125,7 @@ class StoryRouteAgent: new_story_title=None, reason="no_llm", ) - payload = _build_candidate_json(candidate_stories, story_meta=story_meta) + payload = build_route_candidate_json(candidate_stories, story_meta, settings) prompt = get_story_route_prompt( chapter_category=chapter_category, chapter_title=chapter_title, @@ -211,7 +179,7 @@ class StoryRouteAgent: """ if not llm or len(segments) < 2: return None - payload = _build_candidate_json(candidate_stories, story_meta=story_meta) + payload = build_route_candidate_json(candidate_stories, story_meta, settings) segments_json = _build_segments_json_for_plan(segments) prompt = get_story_batch_plan_prompt( chapter_category=chapter_category, diff --git a/api/app/agents/memoir/story_route_payload.py b/api/app/agents/memoir/story_route_payload.py new file mode 100644 index 0000000..125b76c --- /dev/null +++ 
b/api/app/agents/memoir/story_route_payload.py @@ -0,0 +1,230 @@ +""" +Story routing: candidate-story JSON payload (summary first, budget trimming, fixed sort order). + +Shared by StoryRouteAgent and the unit tests. +""" + +from __future__ import annotations + +import json +from datetime import timezone +from typing import Any, TYPE_CHECKING + +if TYPE_CHECKING: + from app.core.config import Settings + +from app.features.story.models import Story + + +def _linked_chapters(s: Story) -> list[str]: + links: list[str] = [] + for cl in getattr(s, "chapter_links", None) or []: + ch = getattr(cl, "chapter", None) + if ch is None: + continue + cat = getattr(ch, "category", None) or "" + tit = getattr(ch, "title", None) or "" + links.append(f"{tit}({cat})") + return links + + +def _updated_at_iso(s: Story) -> str: + ua = getattr(s, "updated_at", None) + if ua is None: + return "" + if ua.tzinfo is None: + ua = ua.replace(tzinfo=timezone.utc) + return ua.isoformat() + + +def _has_usable_summary(s: Story, summary_min_len: int) -> bool: + t = (getattr(s, "summary", None) or "").strip() + return len(t) >= summary_min_len + + +def _truncate_body_for_route( + md: str, + *, + body_max_chars: int, + head_chars: int, + tail_chars: int, +) -> str: + """Trim one story body for the routing prompt: full text when it fits, otherwise head + tail.""" + m = (md or "").strip() + if not m: + return "" + if len(m) <= body_max_chars: + return m + hc = max(1, min(head_chars, body_max_chars // 2)) + tc = max(1, min(tail_chars, body_max_chars // 2)) + mid_omit = len(m) - hc - tc + if mid_omit <= 0: + return m[:body_max_chars] + return f"{m[:hc]}\n…(中间省略 {mid_omit} 字)…\n{m[-tc:]}" + + +def sort_stories_for_route( + stories: list[Story], + story_meta: dict[str, dict[str, int]], + *, + summary_min_chars: int, +) -> list[Story]: + """has_summary(desc) → updated_at(desc) → version_count(desc) → char_count(desc) → id(asc)""" + + def key(s: Story) -> tuple: + sid = str(s.id) + m = story_meta.get(sid) or {} + vc = int(m.get("version_count", 0)) + cc = int(m.get("char_count", 0)) + ua = getattr(s, "updated_at", None) + ts = 0.0 
+ if ua is not None: + if ua.tzinfo is None: + ua = ua.replace(tzinfo=timezone.utc) + ts = ua.timestamp() + return ( + not _has_usable_summary(s, summary_min_chars), + -ts, + -vc, + -cc, + sid, + ) + + return sorted(stories, key=key) + + +def _build_full_row( + s: Story, + story_meta: dict[str, dict[str, int]], + *, + summary_min_chars: int, + body_max_chars: int, + head_chars: int, + tail_chars: int, +) -> dict[str, Any]: + sid = str(s.id) + meta = story_meta.get(sid) or {} + canon = (s.canonical_markdown or "").strip() + char_count = int(meta.get("char_count", len(canon))) + version_count = int(meta.get("version_count", 0)) + row: dict[str, Any] = { + "id": s.id, + "title": s.title, + "char_count": char_count, + "version_count": version_count, + "updated_at": _updated_at_iso(s), + "linked_chapters": _linked_chapters(s), + } + if _has_usable_summary(s, summary_min_chars): + row["summary"] = (getattr(s, "summary", None) or "").strip() + return row + body = _truncate_body_for_route( + canon, + body_max_chars=body_max_chars, + head_chars=head_chars, + tail_chars=tail_chars, + ) + if body: + row["body_for_route"] = body + return row + + +def _build_index_row( + s: Story, + story_meta: dict[str, dict[str, int]], + *, + preview_chars: int, +) -> dict[str, Any]: + sid = str(s.id) + meta = story_meta.get(sid) or {} + canon = (s.canonical_markdown or "").strip().replace("\n", " ") + preview = canon[:preview_chars] + ("…" if len(canon) > preview_chars else "") + char_count = int(meta.get("char_count", len((s.canonical_markdown or "").strip()))) + return { + "id": s.id, + "title": s.title, + "char_count": char_count, + "preview": preview, + } + + +def _rows_json_len(rows: list[dict[str, Any]]) -> int: + return len(json.dumps(rows, ensure_ascii=False)) + + +def apply_total_budget_downgrade( + rows: list[dict[str, Any]], + *, + stories_by_id: dict[str, Story], + story_meta: dict[str, dict[str, int]], + total_max_chars: int, + index_preview_chars: int, +) -> list[dict[str, 
Any]]: + """Starting from the tail of the list (lowest priority), downgrade whole rows to index rows until the total JSON length fits the budget or no row is left to downgrade.""" + out = [dict(r) for r in rows] + + def _is_index_row(r: dict[str, Any]) -> bool: + return "preview" in r and "summary" not in r and "body_for_route" not in r + + while _rows_json_len(out) > total_max_chars: + replaced = False + for i in range(len(out) - 1, -1, -1): + sid = str(out[i].get("id", "")) + st = stories_by_id.get(sid) + if st is None or _is_index_row(out[i]): + continue + out[i] = _build_index_row( + st, + story_meta, + preview_chars=index_preview_chars, + ) + replaced = True + break + if not replaced: + break + return out + + +def build_route_candidate_rows( + stories: list[Story], + story_meta: dict[str, dict[str, int]] | None, + settings: "Settings", +) -> list[dict[str, Any]]: + """Sort, build full candidate rows, then apply the total-budget downgrade.""" + meta = story_meta or {} + summary_min = int(settings.story_route_summary_min_chars) + ordered = sort_stories_for_route(stories, meta, summary_min_chars=summary_min) + body_max = int(settings.story_route_candidate_body_max_chars) + head_c = int(settings.story_route_long_body_head_chars) + tail_c = int(settings.story_route_long_body_tail_chars) + rows: list[dict[str, Any]] = [] + for s in ordered: + rows.append( + _build_full_row( + s, + meta, + summary_min_chars=summary_min, + body_max_chars=body_max, + head_chars=head_c, + tail_chars=tail_c, + ) + ) + by_id = {str(s.id): s for s in ordered} + total_max = int(settings.story_route_candidate_total_max_chars) + index_prev = int(settings.story_route_index_preview_chars) + return apply_total_budget_downgrade( + rows, + stories_by_id=by_id, + story_meta=meta, + total_max_chars=total_max, + index_preview_chars=index_prev, + ) + + +def build_route_candidate_json( + stories: list[Story], + story_meta: dict[str, dict[str, int]] | None, + settings: "Settings", +) -> str: + rows = build_route_candidate_rows(stories, story_meta, settings) + return json.dumps(rows, ensure_ascii=False, indent=2) diff --git a/api/app/core/config.py b/api/app/core/config.py 
index 0564a34..71dddc5 100644 --- a/api/app/core/config.py +++ b/api/app/core/config.py @@ -217,6 +217,15 @@ class Settings(BaseSettings): # Append 硬上限:canonical 字符数、版本数(超限强制 new_story) story_append_max_canonical_chars: int = Field(default=12000, ge=1000, le=500_000) story_append_max_versions: int = Field(default=20, ge=1, le=500) + # StoryRouteAgent: candidate JSON budget (conservative defaults; can be raised) + story_route_candidate_body_max_chars: int = Field(default=1600, ge=200, le=8000) + story_route_candidate_total_max_chars: int = Field( + default=16_000, ge=2000, le=100_000 + ) + story_route_long_body_head_chars: int = Field(default=700, ge=100, le=4000) + story_route_long_body_tail_chars: int = Field(default=700, ge=100, le=4000) + story_route_summary_min_chars: int = Field(default=30, ge=0, le=500) + story_route_index_preview_chars: int = Field(default=80, ge=20, le=500) # Evidence 检索 top_k:大批次 unit 时降低检索量 evidence_top_k_default: int = Field(default=10, ge=1, le=50) evidence_top_k_large_batch: int = Field(default=5, ge=1, le=50) diff --git a/api/tests/test_story_route_payload.py b/api/tests/test_story_route_payload.py new file mode 100644 index 0000000..02aa6ea --- /dev/null +++ b/api/tests/test_story_route_payload.py @@ -0,0 +1,146 @@ +"""Story routing candidate JSON: sorting, summary first, budget downgrade.""" + +from __future__ import annotations + +import json +from datetime import datetime, timezone +from types import SimpleNamespace + +from app.agents.memoir.story_route_payload import ( + build_route_candidate_json, + build_route_candidate_rows, + sort_stories_for_route, + _truncate_body_for_route, +) +from app.core.config import Settings + + +def _story(**kwargs): + defaults = dict( + id="s-default", + title="T", + summary=None, + canonical_markdown="", + updated_at=None, + chapter_links=[], + ) + defaults.update(kwargs) + return SimpleNamespace(**defaults) + + +def test_sort_has_summary_first_then_recency(): + older = _story( + id="old", + summary="x" * 40, + updated_at=datetime(2020, 1, 1, tzinfo=timezone.utc), + ) + 
newer = _story( + id="new", + summary="", + canonical_markdown="body", + updated_at=datetime(2025, 1, 1, tzinfo=timezone.utc), + ) + meta = { + "old": {"char_count": 10, "version_count": 1}, + "new": {"char_count": 20, "version_count": 2}, + } + out = sort_stories_for_route([newer, older], meta, summary_min_chars=30) + assert [s.id for s in out] == ["old", "new"] + + +def test_sort_tiebreak_version_then_char_then_id(): + t = datetime(2024, 6, 1, tzinfo=timezone.utc) + a = _story(id="a", summary="", canonical_markdown="a", updated_at=t) + b = _story(id="b", summary="", canonical_markdown="bb", updated_at=t) + meta = { + "a": {"char_count": 100, "version_count": 1}, + "b": {"char_count": 50, "version_count": 3}, + } + out = sort_stories_for_route([a, b], meta, summary_min_chars=30) + assert [s.id for s in out] == ["b", "a"] + + +def test_summary_sufficient_omits_body(): + s = _story( + id="1", + summary="信" * 40, + canonical_markdown="正文" * 500, + ) + settings = Settings() + rows = build_route_candidate_rows( + [s], {"1": {"char_count": 10, "version_count": 1}}, settings + ) + assert "summary" in rows[0] + assert "body_for_route" not in rows[0] + + +def test_short_summary_falls_back_to_body(): + s = _story( + id="1", + summary="短", + canonical_markdown="唯一的正文用于路由", + ) + settings = Settings() + rows = build_route_candidate_rows( + [s], {"1": {"char_count": 20, "version_count": 1}}, settings + ) + assert "summary" not in rows[0] + assert rows[0].get("body_for_route") + + +def test_long_body_uses_head_tail(): + md = "块" * 3000 + out = _truncate_body_for_route( + md, + body_max_chars=1600, + head_chars=100, + tail_chars=100, + ) + assert "中间省略" in out + assert len(out) < len(md) + + +def test_total_budget_downgrades_tail_rows(monkeypatch): + settings = Settings() + monkeypatch.setattr(settings, "story_route_candidate_total_max_chars", 800) + monkeypatch.setattr(settings, "story_route_index_preview_chars", 40) + stories = [ + _story( + id="1", + summary="", + 
canonical_markdown="A" * 400, + updated_at=datetime(2025, 1, 2, tzinfo=timezone.utc), + ), + _story( + id="2", + summary="", + canonical_markdown="B" * 400, + updated_at=datetime(2025, 1, 1, tzinfo=timezone.utc), + ), + ] + meta = { + "1": {"char_count": 400, "version_count": 1}, + "2": {"char_count": 400, "version_count": 1}, + } + payload = build_route_candidate_json(stories, meta, settings) + data = json.loads(payload) + assert any("preview" in row and "body_for_route" not in row for row in data) + + +def test_json_includes_core_fields(): + s = _story( + id="x1", + title="标题", + summary="y" * 40, + updated_at=datetime(2024, 1, 1, tzinfo=timezone.utc), + ) + settings = Settings() + js = build_route_candidate_json( + [s], {"x1": {"char_count": 5, "version_count": 2}}, settings + ) + row = json.loads(js)[0] + assert row["id"] == "x1" + assert row["title"] == "标题" + assert row["version_count"] == 2 + assert row["char_count"] == 5 + assert "updated_at" in row diff --git a/api/tests/test_story_route_prompts_and_behavior.py b/api/tests/test_story_route_prompts_and_behavior.py new file mode 100644 index 0000000..8ccadc5 --- /dev/null +++ b/api/tests/test_story_route_prompts_and_behavior.py @@ -0,0 +1,130 @@ +"""Story routing: prompt fragments and mock-LLM behavior regression.""" + +from __future__ import annotations + +import json +from datetime import datetime, timezone +from types import SimpleNamespace +from unittest.mock import MagicMock, patch + +from app.agents.memoir.prompts import ( + get_story_batch_plan_prompt, + get_story_route_prompt, + story_route_merge_hint_for_category, +) +from app.agents.memoir.story_route_agent import StoryRouteAgent + + +def test_route_prompt_beliefs_has_strong_container_and_no_uncertain_new(): + p = get_story_route_prompt( + chapter_category="beliefs", + chapter_title="信念与价值观", + batch_transcript="口述测试内容用于路由协议长度达标", + candidate_stories_json="[]", + ) + assert "强主题容器" in p + assert "不太确定" in p and "new_story" in p + assert "两层决策标准" in p + + +def 
test_batch_plan_prompt_aligns_with_route(): + p = get_story_batch_plan_prompt( + chapter_category="beliefs", + chapter_title="t", + segments_json="[]", + candidate_stories_json="[]", + ) + assert "强主题容器" in p + assert "仅因不确定" in p + + +def test_merge_hint_family_neutral_line(): + h = story_route_merge_hint_for_category("family") + assert "家庭" in h + assert "新事件链" in h + + +def test_merge_hint_career_episodic(): + h = story_route_merge_hint_for_category("career_achievement") + assert "经历叙事" in h + + +def test_decide_beliefs_mock_llm_append_and_prompt_has_payload(): + captured: dict[str, str] = {} + + def fake_invoke(_llm, prompt: str, **_kwargs): + captured["prompt"] = prompt + return json.dumps( + { + "decision": "append_story", + "target_story_id": "story-a", + "reason": "同一价值观补充", + }, + ensure_ascii=False, + ) + + cand = SimpleNamespace( + id="story-a", + title="信念", + summary="y" * 40, + canonical_markdown="x" * 200, + updated_at=datetime(2025, 1, 1, tzinfo=timezone.utc), + chapter_links=[], + ) + with patch( + "app.agents.memoir.story_route_agent.invoke_json_object", + side_effect=fake_invoke, + ): + agent = StoryRouteAgent() + d = agent.decide( + chapter_category="beliefs", + chapter_title="信念与价值观", + batch_transcript="我始终相信要谨慎行事。", + candidate_stories=[cand], + llm=MagicMock(), + valid_story_ids={"story-a"}, + story_meta={"story-a": {"char_count": 200, "version_count": 1}}, + ) + assert d.decision == "append_story" + assert d.target_story_id == "story-a" + assert "强主题容器" in captured["prompt"] + assert "story-a" in captured["prompt"] + + +def test_decide_career_mock_llm_new_story_and_prompt_episodic(): + captured: dict[str, str] = {} + + def fake_invoke(_llm, prompt: str, **_kwargs): + captured["prompt"] = prompt + return json.dumps( + { + "decision": "new_story", + "target_story_id": None, + "reason": "另一段完全不同的任职经历", + }, + ensure_ascii=False, + ) + + cand = SimpleNamespace( + id="s1", + title="早期", + summary="", + canonical_markdown="南京军区经历若干字", + 
updated_at=datetime(2024, 1, 1, tzinfo=timezone.utc), + chapter_links=[], + ) + with patch( + "app.agents.memoir.story_route_agent.invoke_json_object", + side_effect=fake_invoke, + ): + d = StoryRouteAgent().decide( + chapter_category="career_achievement", + chapter_title="主要成就", + batch_transcript="后来调动到北京机关,岗位与南京完全不同。", + candidate_stories=[cand], + llm=MagicMock(), + valid_story_ids={"s1"}, + story_meta={"s1": {"char_count": 20, "version_count": 1}}, + ) + assert d.decision == "new_story" + assert "经历叙事" in captured["prompt"]