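Database
- Add migration 0003: timeline_events.memory_source_id foreign key → memory_sources, so timeline writes can be made idempotent per ingest source

Backend - Memory
- Add post-ingest LLM enrichment (summaries / facts / timeline), with a configurable on/off switch and maximum character count
- Add evidence-pack assembly: merges retrieval results for chunks, summaries, facts, timeline, stories, etc.; supports switches such as whether to still include rolling when the query is empty
- Extend repo/retriever/service/router/schemas/summarizer/timeline/extractor and others; update the memory-retrieval.md doc

Backend - Conversation WS
- Add PING/PONG; segmented-ASR logging and empty-audio handling; clearer error messages for transcription failure and "no assistant reply"
- Persist multi-segment assistant replies with a unified separator, consistent with the segmentation logic

Backend - Agent
- reply_limits: split on [SPLIT] and on paragraphs, and guarantee a non-empty fallback, for multi-segment delivery over WS and TTS

Backend - Memoir tasks
- transcript ingest records source_id; task success…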
68 lines
2.0 KiB
Python
"""Interview / material follow-up questions: hard limits on reply count and per-reply character length (instead of relying on a long prompt)."""

from __future__ import annotations

import re


def segments_from_llm_response(
    response_text: str,
    *,
    max_segments: int = 3,
    min_paragraph_chars: int = 12,
) -> list[str]:
    """
    Split on the literal [SPLIT] marker first; if the model produced a single
    segment but wrote several paragraphs separated by blank lines, split on paragraphs.

    This covers replies of the form "two paragraphs + newline" that omit [SPLIT]
    but still need to be split into separate chat bubbles / multi-segment TTS.
    """
    text = (response_text or "").strip()
    if not text:
        return []
    # Explicit [SPLIT] markers take priority.
    primary = [p.strip() for p in text.split("[SPLIT]") if p.strip()]
    if len(primary) > 1:
        return primary[:max_segments]
    blob = primary[0] if primary else text
    if "\n" not in blob:
        return [blob]
    # Otherwise fall back to paragraphs separated by at least one blank line.
    paras = [p.strip() for p in re.split(r"\n\s*\n+", blob) if p.strip()]
    if len(paras) < 2:
        return [blob]
    # Drop paragraphs shorter than min_paragraph_chars; if fewer than two
    # remain, return the reply as a single segment.
    paras = [p for p in paras if len(p) >= min_paragraph_chars]
    if len(paras) < 2:
        return [blob]
    return paras[:max_segments]


def nonempty_segments_or_fallback(
    segments: list[str],
    *,
    fallback: str,
) -> list[str]:
    """Drop empty segments; if every segment is blank or empty, return a single fallback so the WS never sends empty text."""
    cleaned = [s for s in segments if (s or "").strip()]
    if cleaned:
        return cleaned
    fb = (fallback or "").strip()
    return [fb] if fb else ["…"]


def truncate_chat_segments(
    segments: list[str],
    *,
    max_segments: int,
    max_chars_per_segment: int,
) -> list[str]:
    """Keep the first max_segments segments and truncate each to max_chars_per_segment (counted in characters, so CJK-friendly)."""
    if not segments:
        return []
    out: list[str] = []
    for raw in segments[:max_segments]:
        s = (raw or "").strip()
        if not s:
            continue
        if len(s) > max_chars_per_segment:
            # Reserve one character for the ellipsis so the total length stays within the limit.
            s = s[: max_chars_per_segment - 1].rstrip() + "…"
        out.append(s)
    return out
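
The two mechanics this module relies on, the split order ([SPLIT] marker first, blank-line paragraphs second) and the ellipsis truncation, can be checked in isolation. The following is a minimal sketch, not the module's own code: `split_reply` and `clip` are hypothetical helpers introduced only for illustration, the sample strings are made up, and `clip` assumes a limit of at least 1.

```python
import re

SPLIT_MARKER = "[SPLIT]"                    # literal marker used by the module
PARAGRAPH_BREAK = re.compile(r"\n\s*\n+")   # same pattern as in the module


def split_reply(text: str) -> list[str]:
    """Hypothetical helper: marker split first, then blank-line paragraphs."""
    parts = [p.strip() for p in text.split(SPLIT_MARKER) if p.strip()]
    if len(parts) > 1:
        return parts
    return [p.strip() for p in PARAGRAPH_BREAK.split(text) if p.strip()]


def clip(segment: str, limit: int) -> str:
    """Hypothetical helper: truncate to at most `limit` characters (assumes limit >= 1)."""
    if len(segment) <= limit:
        return segment
    # One character is reserved for the ellipsis.
    return segment[: limit - 1].rstrip() + "…"


# Made-up sample replies.
marked = split_reply("How was school? [SPLIT] Who was your best friend?")
blank_line = split_reply("Tell me about your hometown.\n\nWhat did the streets look like?")
single_newline = split_reply("line one\nline two")  # a lone newline is not a paragraph break
clipped = clip("abcdefghij", 5)
```

A single newline does not trigger a split because the pattern requires at least two newlines (optionally with whitespace between them), which keeps soft line wraps inside one chat bubble.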