life-echo/api/app/features/memoir/oral_normalize.py
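Sully 53e0065e3e refactor(api): TOML config SSOT, unified error contract, Auth/transaction hardening, and observability (#33)
Config SSOT (TOML + .env)
Unified error contract
Auth and transaction boundaries
Redis / Celery reliability: business Redis (DB/0) and the Celery broker/backend (DB/1) are split explicitly; connection pool, sync client
Observability (OpenTelemetry + LGTM)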
2026-05-22 13:44:50 +08:00

51 lines
1.6 KiB
Python
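"""
Oral-input normalization: before text enters narration and the fidelity check, apply controlled preprocessing to that same text (rules / optional LLM).

The original segment text stored in the database is not changed; the result is only a derived input for the memoir story generation path.
The rule layer is shared with the chat side via `apply_conversation_input_rules` (see conversation.input_normalize).
"""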
from __future__ import annotations

from typing import Any

from app.core.config import settings
from app.core.text_normalize import apply_oral_rules, llm_normalize_text
from app.features.memoir.constants import memoir


def _llm_normalize_oral(text: str, llm: Any) -> str | None:
    """Fix only obvious typos and homophone errors without adding facts; return None on failure."""
    return llm_normalize_text(
        text,
        llm,
        max_input_chars=int(memoir.oral_normalize_llm_max_input_chars),
        max_tokens=int(memoir.oral_normalize_llm_max_tokens),
        agent_name="oral_normalize.llm",
    )
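

def normalize_oral_for_memoir(text: str, *, llm: Any | None = None) -> str:
    """
    Single exit point for the story pipeline: narration and the fidelity check use the same return value.

    - off / globally disabled: the original text
    - rules: rules only
    - rules + LLM branch: rules first, then an optional LLM pass; if the LLM pass fails, the rules result is kept
    """
    # Global switch off: return the input untouched (None becomes "").
    if not memoir.oral_normalize_enabled:
        return text or ""
    mode = (memoir.oral_normalize_mode or "rules").strip().lower()
    if mode == "off":
        return text or ""
    # The rule layer always runs first for every mode other than "off".
    base = apply_oral_rules(text or "")
    if mode != "llm":
        return base
    # LLM refinement is best-effort: fall back to the rules result on failure.
    refined = _llm_normalize_oral(base, llm)
    if refined is not None:
        return refined
    return base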
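
The mode handling above can be exercised in isolation. The following is a minimal sketch, not the project's code: `OralNormalizeConfig`, `normalize`, `collapse_spaces`, `upper_llm`, and `failing_llm` are hypothetical stand-ins for the `memoir` settings object, `normalize_oral_for_memoir`, `apply_oral_rules`, and the LLM step, so that the fallback order (disabled, off, rules, rules + LLM with fallback) can be checked without the real configuration or an LLM client.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OralNormalizeConfig:
    """Hypothetical stand-in for the `memoir` settings used above."""

    enabled: bool = True
    mode: str = "rules"


def normalize(
    text: Optional[str],
    cfg: OralNormalizeConfig,
    rules: Callable[[str], str],
    llm_step: Callable[[str], Optional[str]],
) -> str:
    """Mirror the branch order of normalize_oral_for_memoir with injected steps."""
    if not cfg.enabled:
        return text or ""
    mode = (cfg.mode or "rules").strip().lower()
    if mode == "off":
        return text or ""
    base = rules(text or "")
    if mode != "llm":
        return base
    refined = llm_step(base)
    # A None result signals LLM failure, so the rules result is kept.
    return refined if refined is not None else base


def collapse_spaces(s: str) -> str:
    """Toy rule layer (assumption): collapse runs of whitespace."""
    return " ".join(s.split())


def upper_llm(s: str) -> Optional[str]:
    """Toy LLM step (assumption): always succeeds and upper-cases the text."""
    return s.upper()


def failing_llm(s: str) -> Optional[str]:
    """Toy LLM step (assumption): always fails."""
    return None


if __name__ == "__main__":
    cfg = OralNormalizeConfig(enabled=True, mode="llm")
    print(normalize("a   b", cfg, collapse_spaces, upper_llm))
    print(normalize("a   b", cfg, collapse_spaces, failing_llm))
```

Injecting the rule and LLM steps as callables keeps the sketch free of configuration and network dependencies; the real module instead reads both from module-level imports.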