feat(eval): internal-eval stack, judge fixes, and eval web overhaul

- Merge internal-eval into development.sh (single Celery/infra); internal-eval.sh
  wraps with LIFE_ECHO_WITH_INTERNAL_EVAL; EVAL_ATTACH_ONLY for attaching :8001
  when :8000 is already up; document in api/docs/internal-eval.md.
- Evaluation: transcript_for_judge, judge error surfacing, rubric/schema tweaks,
  execution_service and router updates; tests for judge and composite eval.
- Memory: ingest nested transaction for embedding/enrichment rollback safety.
- Conversation WS: logger.exception for pipeline errors (avoid loguru KeyError).
- app-eval-web: Playground saved replays, dialogue turns helper, hash user_id
  for Memoir; Memoir chapter baseline↔DB row compare with title heuristics;
  Stories page (#memoir-stories); Markdown + copy buttons; toolbar/panel UI;
  react-markdown; development proxy and fixture updates.
Kevin
2026-04-07 17:15:01 +08:00
parent a50b72e7b5
commit 99543d04c6
47 changed files with 4968 additions and 1279 deletions

@@ -185,6 +185,10 @@ class ManualJudgeConversationBody(BaseModel):
class ManualJudgeConversationStreamBody(BaseModel):
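    conversation_id: str
    fixture_filename: str | None = None
    include_turn_judges: bool = False
    """Call the judge LLM for each turn of the current conversation (after the overall score)."""
    include_baseline_turn_judges: bool = False
    """Call the judge LLM for each turn of the exported baseline (requires a fixture and a successful overall baseline score)."""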
class ManualJudgeConversationOut(BaseModel):
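
The two new flags imply an ordering and a precondition: per-turn judging runs after the overall score, and baseline per-turn judging needs both a fixture and a successful overall baseline score. The following is a minimal sketch of that gating, not the code from this commit. It uses a stdlib dataclass in place of the Pydantic model so it runs without dependencies; `StreamJudgeRequest`, `plan_judge_passes`, `baseline_overall_ok`, and the pass names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamJudgeRequest:
    """Stdlib stand-in (hypothetical) mirroring the fields of the request body above."""
    conversation_id: str
    fixture_filename: Optional[str] = None
    include_turn_judges: bool = False
    include_baseline_turn_judges: bool = False


def plan_judge_passes(req: StreamJudgeRequest, baseline_overall_ok: bool) -> List[str]:
    """Return the judge passes to run, in order (pass names are illustrative)."""
    # Assumption: the overall conversation score always runs first.
    passes = ["overall"]
    if req.include_turn_judges:
        # Per-turn judging of the current conversation follows the overall score.
        passes.append("turns")
    if (
        req.include_baseline_turn_judges
        and req.fixture_filename is not None
        and baseline_overall_ok
    ):
        # Baseline per-turn judging needs a fixture and a successful
        # overall baseline score, per the field's docstring.
        passes.append("baseline_turns")
    return passes
```

With both flags left at their defaults, only the overall pass is planned, which matches the `False` defaults on the new fields.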