feat(evaluation): memoir readiness, judge/replay updates, eval web playground

Add memoir_readiness_service and router tests; extend judge schemas/services, replay_service, and conversation rubric; align story route agent, payload, prompts, and story_pipeline_sync; update agent logging, config, and DI. Document internal-eval; add replayDraft util and PlaygroundPage changes in app-eval-web.
Kevin
2026-04-08 09:38:07 +08:00
parent 99543d04c6
commit 6772e1269c
26 changed files with 1255 additions and 124 deletions

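@@ -5,7 +5,7 @@ Agent / LLM diagnostic logging: elapsed time, input/output size, truncated previews.
 - **Summary** (single line: elapsed time, character count, operation name): when ``LOG_AGENT_VERBOSE=1``, emitted via ``logger.info``,
 so that Agent performance and code paths can be investigated in production without lowering the global log level to DEBUG.
-Sensitive content: at DEBUG level, truncated previews of user-related text are logged; do not leave DEBUG enabled long-term in production.
+Sensitive content: at DEBUG level, user-related text is logged (in full when ``AGENT_LOG_MAX_CHARS=0``); do not leave DEBUG enabled long-term in production.
 Configuration (excerpt): ``AGENT_LOG_OMIT_SYSTEM_MESSAGE_BODY`` (default true) omits the chat System message body and logs only len+sha12;
 ``AGENT_LOG_JSON_PROMPT_PREFIX_CHARS`` + ``AGENT_LOG_JSON_PROMPT_PREFIX_ONLY_IF_LEN_GT`` at DEBUG level skip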
@@ -35,12 +35,12 @@ def agent_summary_enabled() -> bool:
 def truncate_for_log(text: str | None, *, max_chars: int | None = None) -> str:
-    """Truncate overly long text to keep logs from ballooning."""
+    """Truncate overly long text to keep logs from ballooning. A max_chars / AGENT_LOG_MAX_CHARS of 0 means no truncation."""
     if text is None:
         return ""
     max_c = max_chars if max_chars is not None else settings.agent_log_max_chars
     s = str(text)
-    if len(s) <= max_c:
+    if max_c <= 0 or len(s) <= max_c:
         return s
     return s[:max_c] + f"... [truncated total_len={len(s)}]"
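
A minimal standalone sketch of the new behavior, for reviewers who want to try it outside the service. It copies the function body from the hunk but replaces ``settings.agent_log_max_chars`` with a hypothetical module constant ``DEFAULT_MAX_CHARS`` (the real default is not shown in this diff). With the previous condition, a limit of 0 would have returned only the truncation marker for any non-empty text; with the new condition it returns the full text.

```python
from __future__ import annotations

# Hypothetical stand-in for settings.agent_log_max_chars (assumed value, not from the diff).
DEFAULT_MAX_CHARS = 20


def truncate_for_log(text: str | None, *, max_chars: int | None = None) -> str:
    """Truncate overly long text; a limit of 0 (or below) disables truncation."""
    if text is None:
        return ""
    max_c = max_chars if max_chars is not None else DEFAULT_MAX_CHARS
    s = str(text)
    # New guard: a non-positive limit means "log the full text".
    if max_c <= 0 or len(s) <= max_c:
        return s
    return s[:max_c] + f"... [truncated total_len={len(s)}]"


if __name__ == "__main__":
    sample = "x" * 50
    print(truncate_for_log(sample, max_chars=10))      # first 10 chars plus marker
    print(len(truncate_for_log(sample, max_chars=0)))  # full length, no truncation
    print(repr(truncate_for_log(None)))                # empty string for None
```

Because 0 now means "unlimited", any environment that sets ``AGENT_LOG_MAX_CHARS=0`` with DEBUG enabled will write complete user text to the logs, which is why the docstring warning was updated in the same commit.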