feat(evaluation): memoir readiness, judge/replay updates, eval web playground

Add memoir_readiness_service and router tests; extend judge schemas/services, replay_service, and conversation rubric; align story route agent, payload, prompts, and story_pipeline_sync; update agent logging, config, and DI. Document internal-eval; add replayDraft util and PlaygroundPage changes in app-eval-web.
2026-04-08 09:38:07 +08:00
parent 99543d04c6
commit 6772e1269c
26 changed files with 1255 additions and 124 deletions
--- a/api/app/features/evaluation/rubrics/conversation_v1.py
+++ b/api/app/features/evaluation/rubrics/conversation_v1.py
@@ -67,7 +67,7 @@ major_strengths, major_issues, insufficient_evidence, evidence_refs, confidence,

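 `confidence`: a decimal between 0 and 1 indicating your overall confidence in this scoring (higher when the evidence is sufficient).

-`total_score` must equal the sum of the 15 detail items above (maximum 100).
+`total_score` must equal the sum of the 15 detail items above (maximum 100). **Before output, add up the 15 items one by one to verify**; do not default to 100 when not every item is at its maximum (for example, if the four emotion items are 9+8+6+6 and all other blocks are at their maximum, the total is 99, not 100).
 The aggregate scores emotion_score, information_score, persona_score, structure_score, question_score may be left blank (the server recomputes them).
 Output JSON only.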
 """