feat(i18n): persist language preference and thread through chat, memoir, TTS

- Add users.language_preference (Alembic 0018, default zh); capture at signup/SMS only; expose on auth and profile APIs
- Lite English prompts for chat and memoir; localized stage labels and agent names (Life Echo / 岁月知己)
- Tencent TTS: language-aware synthesis, ModelType=1 for 501004, English chunking
- WebSocket pipeline: emit all AGENT_RESPONSE segments when TTS cancels; INFO logs for tts_this_turn and TTS decisions; on-demand TTS logging
- Expo: device language on auth, i18n tiers/agent name, [SPLIT] streaming UX fixes
- Tests for migration, prompts, pipeline, router tts_this_turn, reply segments

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 16:16:49 +08:00
parent 5ce29aad64
commit ccdc4e4277
64 changed files with 3233 additions and 208 deletions
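
Reviewer note: the prompt builders in this diff all follow one pattern, a `language` keyword that defaults to "zh" and an early return for "en". The sketch below illustrates that pattern under stated assumptions. `normalize_language`, `oral_block`, and `build_user_content` are hypothetical names, not functions from this commit; only the two block labels are copied from the evidence-free branch of `format_narrative_user_content`.

```python
from typing import Optional

# Assumption: only these two codes are supported; "zh" is the column default
# named in the commit message.
SUPPORTED_LANGUAGES = ("zh", "en")


def normalize_language(preference: Optional[str]) -> str:
    """Hypothetical helper: map a stored preference to a supported code, falling back to 'zh'."""
    value = (preference or "").strip().lower()
    return value if value in SUPPORTED_LANGUAGES else "zh"


def oral_block(oral_text: str, language: str = "zh") -> str:
    """Label the user's oral memory, mirroring the evidence-free branch in the diff."""
    oral = (oral_text or "").strip()
    if language == "en":
        return f"[User's oral memory this turn]\n{oral}"
    return f"【本段用户口述】\n{oral}"


def build_user_content(oral_text: str, preference: Optional[str]) -> str:
    """Thread the normalized preference through to the prompt builder."""
    return oral_block(oral_text, language=normalize_language(preference))
```

Keeping "zh" as the default on every builder means call sites that were not updated keep their previous behavior, which matches how the unchanged Chinese return paths are left in place throughout the diff.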
--- a/api/app/agents/memoir/prompts.py
+++ b/api/app/agents/memoir/prompts.py
@@ -13,6 +13,30 @@ from app.agents.stage_constants import STAGE_ERA_HINTS, STAGE_SLOT_KEYS
 from app.agents.style_profiles import MemoirStyleProfile


+def _memoir_fidelity_core_rules_en() -> str:
+    """English-lite version of the fact-boundary rules 1–4."""
+    return """## Fact boundary (must follow; takes precedence over style)
+1. **The body may only expand on the content in the "User's oral memory this turn" block.** If the input includes a "Reference memory snippets" block, you must not write its specifics as the user's first-hand experience this turn; at most use one short bridging sentence, and never introduce names, places, dates, dialogue, or numbers that appear only in the snippets.
+2. **No fabrication.** Do not add people, dialogue, places, dates, events, causes, or numbers the user did not state. Do not invent inner monologue or "typical era" filler. If the user did not state an outcome (selected, accepted, rejected, etc.), do not write a definite conclusion. Prefer neutral, partial wording when uncertain.
+3. **Do not pad for length.** Short input → short output. Paragraph count and length follow the material.
+4. Allowed: removing fillers and small talk, reordering for clarity, merging redundant references, lifting spoken language to written prose. Do not invent details to "make the writing nicer."
+
+## Encouraged operations (not fabrication)
+- Lift colloquial speech to clean written English: trim filler, smooth syntax, choose more precise verbs.
+- Add short bridging sentences ("Looking back," "In those days") as long as they introduce no new entities.
+- Render emotions already stated in the oral memory in slightly more literary phrasing (the user said "it was hard," you may write "it weighed on me") — provided you add no new scenes, numbers, or actions.
+- Merge synonymous repeated statements for tighter narration.
+- Correct obvious speech-to-text typos.
+- **Era / cultural texture (only with anchored facts)**: when the oral memory or profile fields make the year, region, or environment clear, you may use period-appropriate vocabulary and ambient texture as a touch — but you may not invent specific people, events, dialogue, or scenes."""
+
+
+def _memoir_fidelity_user_profile_rules_en() -> str:
+    return """## User profile and stage information
+- The "About the user" / "Time reference" blocks may only be used for items that are explicitly listed.
+- **Cultural / era texture (encouraged when anchored)**: when this turn's oral memory clearly belongs to the same era or place that profile facts describe, you may weave the era and place into the prose as **language and atmosphere** (forms of address, regional expressions, period feel). You still may not turn profile facts alone into a specific event the user did not narrate this turn.
+- Do not put concrete biographical details from the profile into the body unless the user actually mentioned them this turn."""
+
+
 def _memoir_fidelity_core_rules() -> str:
    """事实边界 1–4 条（与文体第 5 条拆分，供 story 叙事与标题等复用）。"""
    return """## 事实边界（必须遵守，优先于文采）
@@ -37,8 +61,15 @@ def _memoir_fidelity_user_profile_rules() -> str:
 - 档案中的具体经历细节不得写入正文，除非用户在本段口述里已提及或明确关联。"""


-def get_memoir_fidelity_system_prompt() -> str:
+def get_memoir_fidelity_system_prompt(language: str = "zh") -> str:
    """叙事/标题生成专用：准确性优先，禁止编造事实。"""
+    if language == "en":
+        return f"""You are a memoir editor. Your task is to lift the user's oral memory into first-person written prose.
+
+{_memoir_fidelity_core_rules_en()}
+5. **Plain narrative tone.** Keep description and metaphor restrained; clear chronicle, not lyrical essay.
+
+{_memoir_fidelity_user_profile_rules_en()}"""
    return f"""你是回忆录编辑助手，任务是把用户口述整理为第一人称书面叙述。

 {_memoir_fidelity_core_rules()}
@@ -47,8 +78,15 @@ def get_memoir_fidelity_system_prompt() -> str:
 {_memoir_fidelity_user_profile_rules()}"""


-def get_memoir_fidelity_facts_only_prompt() -> str:
+def get_memoir_fidelity_facts_only_prompt(language: str = "zh") -> str:
    """与 `get_memoir_fidelity_system_prompt` 相同的事实 1–4 条，第 5 条改为允许传记作家式文采（仍禁止编造）。"""
+    if language == "en":
+        return f"""You are a memoir editor. Your task is to lift the user's oral memory into first-person written prose.
+
+{_memoir_fidelity_core_rules_en()}
+5. **Style**: while obeying rules 1–4, write in a **first-person, lightly literary memoir voice** (scenes and emotion follow the material, never list-like reporting). Polish the speech into **graceful, flowing, readable** prose; where the oral memory or profile already anchors an era or region, you may let period vocabulary and atmosphere season the writing. You may organize the structure (paragraph splits within a single oral block, transitions, callbacks to people/things already named) **without introducing new facts**. Style serves truth; never use invented imagery to fill in missing facts.
+
+{_memoir_fidelity_user_profile_rules_en()}"""
    return f"""你是回忆录编辑助手，任务是把用户口述整理为第一人称书面叙述。

 {_memoir_fidelity_core_rules()}
@@ -57,20 +95,25 @@ def get_memoir_fidelity_facts_only_prompt() -> str:
 {_memoir_fidelity_user_profile_rules()}"""


-def _memoir_editor_narrative_style_block() -> str:
+def _memoir_editor_narrative_style_block(language: str = "zh") -> str:
    """传记作家改写要点：委托到独立的 `MemoirStyleProfile`，与 chat 风格隔离。"""
-    return MemoirStyleProfile().render_narrative_style_block()
+    return MemoirStyleProfile().render_narrative_style_block(language=language)


 def get_narrative_editor_system_prompt(
-    background_voice: str = "default", occupation: str = ""
+    background_voice: str = "default",
+    occupation: str = "",
+    language: str = "zh",
 ) -> str:
    """故事/章节叙事：传记作家式书面语 + 事实边界（chapter 直接展示 story 时使用）。"""
+    base = f"""{get_memoir_fidelity_facts_only_prompt(language=language)}
+
+{_memoir_editor_narrative_style_block(language=language)}"""
+    if language == "en":
+        # Skip occupation/background-voice Chinese-only addendums for English path.
+        return base
    occ_hint = get_occupation_narrative_hint(occupation, background_voice)
    tail = get_background_voice_narrative_block(background_voice)
-    base = f"""{get_memoir_fidelity_facts_only_prompt()}
-
-{_memoir_editor_narrative_style_block()}"""
    if occ_hint:
        base = f"{base}\n\n{occ_hint}"
    if not tail:
@@ -78,14 +121,34 @@ def get_narrative_editor_system_prompt(
    return f"{base}\n\n{tail}"


-def _short_classification_edit_prefix() -> str:
+def _short_classification_edit_prefix(language: str = "zh") -> str:
    """章节分类专用短系统前缀。"""
+    if language == "en":
+        return """You are a memoir editor. Ignore filler and small talk; classify only by **substantive life-experience content**.
+Keep: events, relationships, places and times, emotions and beliefs. Filter out: pure greetings, AI-interaction, unrelated chit-chat."""
    return """你是回忆录编辑。先忽略语气词与寒暄，只根据**与人生经历有关的实质内容**判断归类。
 保留：事件、人物关系、地点时间、情感与信念。过滤：纯寒暄、与 AI 的交互、无关闲聊。"""


-def get_chapter_classification_json_prompt(segments_text: str) -> str:
+def get_chapter_classification_json_prompt(
+    segments_text: str, language: str = "zh"
+) -> str:
    """章节分类：JSON 输出（与 invoke_json_object 配合）。"""
+    if language == "en":
+        return f"""{_short_classification_edit_prefix("en")}
+
+## Chapter keys
+childhood, education, career_early, career_achievement, career_challenge, family, beliefs, summary; if not enough to form a story → **none**.
+
+If, after stripping greetings, only profile-style point facts remain with no narrative spine (no event / scene / process / interaction / emotion arc) → **none**; a short but vivid micro-story belongs in the closest category.
+
+Dialogue content:
+{segments_text}
+
+Output shape (only this object):
+{{"category": "childhood|education|career_early|career_achievement|career_challenge|family|beliefs|summary|none"}}
+
+If you return **none**, the server will map this batch to the **summary** chapter and still write it into the memoir body (it is not dropped)."""
    return f"""{_short_classification_edit_prefix()}

 ## 章节 key（英文）
@@ -103,12 +166,48 @@ childhood, education, career_early, career_achievement, career_challenge, family


 def get_state_extraction_prompt(
-    user_message: str, current_stage: str, stage_slots: dict
+    user_message: str,
+    current_stage: str,
+    stage_slots: dict,
+    language: str = "zh",
 ) -> str:
    """抽取结构化信息并判断阶段"""
    slot_keys = list(stage_slots.keys())
    all_stage_slots = {k: list(v) for k, v in STAGE_SLOT_KEYS.items()}

+    if language == "en":
+        return f"""You are a memoir interview information extractor. From the user's utterance, extract structured information and decide which life stage they are actually talking about.
+Only extract snippets that are clearly supported by the oral memory; do not fabricate or guess.
+
+You should first distill the **substantive life-experience content** from the user's words, then extract structured slots (only when there is clear evidence in the oral memory).
+
+System currently tracking stage: {current_stage}
+Allowed slots for this stage: {slot_keys}
+
+All stages and their slots:
+{json.dumps(all_stage_slots, ensure_ascii=False, indent=2)}
+
+User utterance:
+{user_message}
+
+Return JSON only, in this shape:
+{{
+  "detected_stage": "childhood|education|career|family|belief",
+  "slots": {{
+    "slot_key": "snippet"
+  }},
+  "emotion": "neutral|warm|low|highlight",
+  "is_new_chapter": true
+}}
+
+Requirements:
+1. **First strip filler, AI-interaction commands, greetings, and small talk** — focus only on real life-experience content.
+2. **Only when slots is non-empty**, detected_stage must reflect what the user actually talked about; the user may discuss a different stage than the system is tracking.
+3. The keys in `slots` must belong to the slot list of `detected_stage`.
+4. Only fill slots with substantive, life-experience content the user actually mentioned.
+5. **Snippets are distilled cores** — strip filler, keep within ~50 characters where possible.
+6. If the utterance has no real life-experience content (pure small talk, meta-instructions like "organize my memories", commands, fillers), `slots` must be the empty object and `detected_stage` must equal the system's current stage."""
+
    return f"""你是回忆录访谈信息抽取助手。从用户话语中提取结构化信息，判断用户实际在谈论哪个人生阶段。
 只提取口述中确有依据的片段，不得编造或推测。

@@ -148,11 +247,51 @@ def get_batch_memoir_phase1_prep_prompt(
    system_current_stage: str,
    slots_snapshot: dict,
    segment_items: list[tuple[str, str]],
+    language: str = "zh",
 ) -> str:
    """
    Phase1 批处理：多段口述一次 JSON 输出「抽取 + 章节分类」。
    segment_items: (segment_id, user_text)，须按时间顺序。
    """
+    if language == "en":
+        lines_en: list[str] = []
+        for sid, text in segment_items:
+            lines_en.append(f"- id={sid}\n  text: {text}")
+        slot_lines_en = "\n".join(
+            f"- {st}: {', '.join(keys)}" for st, keys in STAGE_SLOT_KEYS.items()
+        )
+        return f"""You are a memoir interview assistant. Below are several user oral memory segments (in time order). For **each segment**:
+1) Extract information (slots, detected_stage) — same rules as single-segment extraction.
+2) Classify the chapter (chapter_category) — same rules as single-segment classification.
+
+System currently tracking stage (chat stage key): {system_current_stage}
+Slot summary already gathered (context only — do not invent details that did not appear):
+{json.dumps(slots_snapshot, ensure_ascii=False, indent=2)}
+
+`detected_stage` allowed values: childhood | education | career | family | belief
+The keys in `slots` must belong to the slot list for that stage:
+{slot_lines_en}
+
+`chapter_category` allowed values: childhood | education | career_early | career_achievement | career_challenge | family | beliefs | summary | **none**
+(Profile-only points or pure small talk → **none**, same as single-segment classification.)
+
+Per-segment task (the `segments` array MUST cover every id below in the same order):
+{chr(10).join(lines_en)}
+
+Return JSON object only (no markdown), shaped:
+{{
+  "segments": [
+    {{
+      "id": "<same as input id>",
+      "detected_stage": "childhood|education|career|family|belief",
+      "slots": {{ "slot_key": "snippet within ~50 chars" }},
+      "chapter_category": "childhood|education|career_early|career_achievement|career_challenge|family|beliefs|summary|none"
+    }}
+  ]
+}}
+
+Same as single-segment extraction: **only when `slots` is non-empty** does `detected_stage` follow the content; if no life-experience content exists this segment, `slots` must be empty and `detected_stage` must equal the current system stage `{system_current_stage}`."""
+
    lines: list[str] = []
    for sid, text in segment_items:
        lines.append(f"- id={sid}\n  文本：{text}")
@@ -213,9 +352,35 @@ def get_creative_title_prompt(
    slots: dict,
    user_profile: str = "",
    birth_year: Optional[int] = None,
+    language: str = "zh",
 ) -> str:
    """生成故事标题：概括口述事实或主题，禁止纯意象编造。"""
    age_hint = _build_age_hint(stage, birth_year)
+    if language == "en":
+        profile_section_en = (
+            f"\nAbout the user:\n{user_profile}" if user_profile else ""
+        )
+        time_section_en = f"\nTime reference: {age_hint}" if age_hint else ""
+        return f"""{get_memoir_fidelity_facts_only_prompt(language="en")}
+
+Generate **one** memoir story title based on the stage, emotion, and available information below.
+
+Stage: {stage}
+Emotion: {emotion}
+Available information (oral slots and profile): {slots}{profile_section_en}{time_section_en}
+
+Requirements:
+1. Format: "Time tag · Title body" (the time tag may use age, era, or stage; it must be consistent with the information above; do not invent years).
+2. The title body should be **6–12 words**, concisely summarizing a theme or fact present in the oral memory or slots; literary phrasing is welcome but **invention is forbidden**.
+3. Any **specific facts in the title** (job titles, unit names, battles, names, life-or-death outcomes) must have **literal evidence** in the oral excerpt or other slots; do not extrapolate from the stage name or age hint.
+4. Be concise; memoir-flavored; neither flat nor florid.
+
+### Examples (facts come from slots/oral memory; the format is illustrative)
+- Slots include childhood, river, heavy rain → `Around age 6 · Grandfather carrying me across the river in the rain`
+- Slots include dorm, instant noodles, cafeteria → `Student years · Instant noodles when the cafeteria did not suit me`
+
+Output only the title line — no quotes, no brackets.
+"""
    profile_section = f"\n用户基本信息：\n{user_profile}" if user_profile else ""
    time_section = f"\n时间参考：{age_hint}" if age_hint else ""

@@ -247,6 +412,7 @@ def get_creative_title_json_prompt(
    slots: dict,
    user_profile: str = "",
    birth_year: Optional[int] = None,
+    language: str = "zh",
 ) -> str:
    """生成故事标题（JSON：`{"title":"..."}`），与 invoke_json_object 配合。"""
    base = get_creative_title_prompt(
@@ -255,7 +421,14 @@ def get_creative_title_json_prompt(
        slots=slots,
        user_profile=user_profile,
        birth_year=birth_year,
+        language=language,
    )
+    if language == "en":
+        return (
+            base.rstrip()
+            + "\n\nExample output (only this JSON object):"
+            + '\n{"title":"Full title on one line (with time tag · body format)"}\n'
+        )
    return (
        base.rstrip()
        + "\n\n输出示例（仅此 JSON 对象）："
@@ -272,6 +445,7 @@ def get_narrative_json_prompt(
    birth_year: Optional[int] = None,
    background_voice: str = "default",
    occupation: str = "",
+    language: str = "zh",
 ) -> str:
    """将新对话改写为叙述，输出 JSON 格式（paragraphs: [{content, image_description}]）"""
    context_tail = ""
@@ -279,13 +453,52 @@ def get_narrative_json_prompt(
        context_tail = (
            existing_content[-300:] if len(existing_content) > 300 else existing_content
        )
+    age_hint = _build_age_hint(stage, birth_year)
+    if language == "en":
+        context_section_en = (
+            f"\n\n[Bridging context — tail of the existing story, for continuity only; do not repeat]:\n{context_tail}"
+            if context_tail
+            else ""
+        )
+        profile_section_en = (
+            f"\n\nAbout the user:\n{user_profile}" if user_profile else ""
+        )
+        time_section_en = f"\nTime reference: {age_hint}" if age_hint else ""
+        return f"""{get_narrative_editor_system_prompt(background_voice=background_voice, occupation=occupation, language="en")}
+
+Rewrite the "User's oral memory this turn" block into first-person written prose and return **pure JSON** (no markdown fences).
+
+Stage: {stage}
+Available information (slots): {slots}{profile_section_en}{time_section_en}
+
+Input material:
+{new_content}
+{context_section_en}
+
+## Requirements
+1. **Format**: JSON only; first person; no `#`, `##`, no tables; `content` is body text only.
+2. **Facts and material**: obey the fact boundary; do not fill in details that were not given. Expand only the "User's oral memory this turn"; if a reference-snippet block is included, do not write its specifics as the user's first-hand experience this turn; strip filler and small talk; do not repeat the full body of an existing story; stay within the same theme/event chain; paragraph count and length follow the material; do not pad for length.
+3. **Do not infer outcomes**: when the user did not state a result (admitted, accepted, etc.), do not fill in a definite conclusion based on common sense.
+
+## Output schema (strict JSON)
+{{
+  "paragraphs": [
+    {{"content": "paragraph body"}},
+    ...
+  ]
+}}
+
+- content: body text only.
+
+If nothing is worth recording: {{"paragraphs": []}}
+"""
+
    context_section = (
        f"\n\n【衔接上下文（已有内容的末尾，仅供参考衔接，不要重复）】：\n{context_tail}"
        if context_tail
        else ""
    )
    profile_section = f"\n\n用户基本信息：\n{user_profile}" if user_profile else ""
-    age_hint = _build_age_hint(stage, birth_year)
    time_section = f"\n时间参考：{age_hint}" if age_hint else ""

    return f"""{get_narrative_editor_system_prompt(background_voice=background_voice, occupation=occupation)}
@@ -348,19 +561,59 @@ def get_narrative_merge_json_prompt(
    birth_year: Optional[int] = None,
    background_voice: str = "default",
    occupation: str = "",
+    language: str = "zh",
 ) -> str:
    """
    已有故事追加：将「已有全文（或节选）」与「本段口述」合并为**一篇**第一人称叙述，
    按事件发生顺序组织段落，输出覆盖全篇的 JSON paragraphs。
    """
    clipped = clip_existing_story_body_for_merge(existing_content)
+    age_hint = _build_age_hint(stage, birth_year)
+
+    if language == "en":
+        existing_section_en = (
+            f"\n\n[Existing story body — keep all of its facts; reorder and bridge only; do not fabricate]:\n{clipped}"
+            if clipped
+            else ""
+        )
+        profile_section_en = (
+            f"\n\nAbout the user:\n{user_profile}" if user_profile else ""
+        )
+        time_section_en = f"\nTime reference: {age_hint}" if age_hint else ""
+        return f"""{get_narrative_editor_system_prompt(background_voice=background_voice, occupation=occupation, language="en")}
+
+You are **expanding and reorganizing** an existing memoir story: you must keep every fact from the existing story in the output (you may merge redundant phrasing and adjust order), and weave in the new facts from "User's oral memory this turn"; order paragraphs by **chronological order of events** (earliest → latest); do not drop existing content unless the new memory contradicts it.
+
+Stage: {stage}
+Available information (slots): {slots}{profile_section_en}{time_section_en}
+
+[User's oral memory this turn and reference — when an evidence-snippet block is present, follow the fact boundary]:
+{new_content}
+{existing_section_en}
+
+## Requirements
+1. **Full body output**: `paragraphs` must be the **complete reorganized story body** (not just this turn's segment).
+2. **Fact boundary**: obey the fact boundary; do not fill in missing details. Do not add people, places, dates, dialogue, or numbers that appear in neither the existing body nor this turn; write first-person, graceful prose; no `#`, `##`, no tables.
+3. If this turn fully overlaps the old body or adds no new information, return a faithful reorganized version of the old body (do not arbitrarily shorten it).
+4. **Do not infer outcomes**: when this turn does not state an outcome, do not assert a definite outcome unless the old body already states the same fact.
+
+## Output schema (strict JSON)
+{{
+  "paragraphs": [
+    {{"content": "paragraph body"}},
+    ...
+  ]
+}}
+
+If nothing can be retained: {{"paragraphs": []}}
+"""
+
    existing_section = (
        f"\n\n【已有故事正文（须全部保留事实，仅调整顺序与衔接；不得编造）】：\n{clipped}"
        if clipped
        else ""
    )
    profile_section = f"\n\n用户基本信息：\n{user_profile}" if user_profile else ""
-    age_hint = _build_age_hint(stage, birth_year)
    time_section = f"\n时间参考：{age_hint}" if age_hint else ""

    return f"""{get_narrative_editor_system_prompt(background_voice=background_voice, occupation=occupation)}
@@ -546,13 +799,24 @@ def get_story_batch_plan_prompt(
 """


-def format_narrative_user_content(oral_text: str, evidence_text: str = "") -> str:
+def format_narrative_user_content(
+    oral_text: str, evidence_text: str = "", language: str = "zh"
+) -> str:
    """
    将口述与检索摘录分区，供叙事模型区分「亲历」与参考材料。
    evidence 为空时仅输出口述块。
    """
    oral = (oral_text or "").strip()
    ev = (evidence_text or "").strip()
+    if language == "en":
+        if not ev:
+            return f"[User's oral memory this turn]\n{oral}"
+        return (
+            "[User's oral memory this turn]\n"
+            f"{oral}\n\n"
+            "[Reference memory snippets (not this turn's oral memory; do NOT write their specifics as the user's first-hand experience this turn — bridging only)]\n"
+            f"{ev}"
+        )
    if not ev:
        return f"【本段用户口述】\n{oral}"
    return (