life-echo/api/app/features/memoir/oral_normalize.py
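Kevin 53d9e003af feat(api): narrative prompt, occupation context, read-path chapters, WS decoupling, and error sanitization
- Memoir: supplement the fact boundary with an allowlist; align the biography-style examples with the JSON narrative requirements
- default occupation prompt occupation_context; retirement context for cadre/military
- GET chapter read path performs zero writes, prepare_chapter_read_view + markdown_for_response
- Text normalization extracted to core/text_normalize; removed the deprecated reply strategy and recompose_chapters_for_story
- ConversationService: WS connection / user segments / ending a conversation; fixed wording for externally visible errors
- Tests: HTTP sanitization contract, chapter read view, occupation and background_voice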
2026-04-01 11:55:52 +08:00

50 lines
1.6 KiB
Python
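"""
Oral normalization: before narrative generation and faithfulness checking,
apply controllable preprocessing to the same text (rules / optional LLM).
Does not change the original segment text stored in the database; the result
is only a derived input for the memoir story generation path.
The rule layer is shared with the chat side via `apply_conversation_input_rules`
(see conversation.input_normalize).
"""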
from __future__ import annotations

from typing import Any

from app.core.config import settings
from app.core.text_normalize import apply_oral_rules, llm_normalize_text


def _llm_normalize_oral(text: str, llm: Any) -> str | None:
    """Fix only obvious typos and homophone errors, add no facts; return None on failure."""
    return llm_normalize_text(
        text,
        llm,
        max_input_chars=int(settings.memoir_oral_normalize_llm_max_input_chars),
        max_tokens=int(settings.memoir_oral_normalize_llm_max_tokens),
        agent_name="oral_normalize.llm",
    )


def normalize_oral_for_memoir(text: str, *, llm: Any | None = None) -> str:
    """
    Single exit point for the story pipeline: narrative generation and
    faithfulness checking use the same return value.
    - off / globally disabled: original text
    - rules: rules only
    - rules + LLM branch: rules first, then the optional LLM; if the LLM fails,
      keep the rules result
    """
    # Global switch: return the input untouched (None becomes "").
    if not settings.memoir_oral_normalize_enabled:
        return text or ""
    mode = (settings.memoir_oral_normalize_mode or "rules").strip().lower()
    if mode == "off":
        return text or ""
    base = apply_oral_rules(text or "")
    # Any mode other than "llm" stops after the rule layer.
    if mode != "llm":
        return base
    refined = _llm_normalize_oral(base, llm)
    if refined is not None:
        return refined
    # LLM failed: fall back to the rules-only result.
    return base
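
The mode handling described in the `normalize_oral_for_memoir` docstring can be tried out in isolation with the sketch below. It is only an illustration under stated assumptions: `SketchSettings`, `sketch_rules`, and `sketch_normalize` are hypothetical stand-ins, not the project's `settings`, `apply_oral_rules`, or `llm_normalize_text`, and the whitespace-collapsing rule is invented for the demo. The extra guard for a missing LLM step is also an assumption of the sketch; the original delegates that case to `llm_normalize_text`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchSettings:
    """Hypothetical stand-in for app.core.config.settings."""
    enabled: bool = True
    mode: str = "rules"


def sketch_rules(text: str) -> str:
    """Hypothetical rule layer: only collapses runs of whitespace."""
    return " ".join(text.split())


def sketch_normalize(
    text: str,
    cfg: SketchSettings,
    llm_step: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Mirror the off / rules / llm branching, falling back to the rules result."""
    if not cfg.enabled:
        return text or ""
    mode = (cfg.mode or "rules").strip().lower()
    if mode == "off":
        return text or ""
    base = sketch_rules(text or "")
    # Assumption of this sketch: a missing LLM step behaves like an LLM failure.
    if mode != "llm" or llm_step is None:
        return base
    refined = llm_step(base)
    return refined if refined is not None else base


if __name__ == "__main__":
    raw = "  I  was born   in 1952 "
    print(repr(sketch_normalize(raw, SketchSettings(mode="off"))))
    print(repr(sketch_normalize(raw, SketchSettings(mode="rules"))))
    # A failing LLM step (returns None) keeps the rules-only result.
    print(repr(sketch_normalize(raw, SketchSettings(mode="llm"), llm_step=lambda s: None)))
```

Because every branch returns a single string, the narrative and faithfulness stages can both consume one call's result, which is the "single exit point" property the docstring describes.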