feat(evaluation): memoir readiness, judge/replay updates, eval web playground
Add memoir_readiness_service and router tests; extend judge schemas/services, replay_service, and conversation rubric; align story route agent, payload, prompts, and story_pipeline_sync; update agent logging, config, and DI. Document internal-eval; add replayDraft util and PlaygroundPage changes in app-eval-web.
@@ -18,7 +18,7 @@ cd api && uv run pytest tests/test_judge_schemas.py tests/test_eval_composite.py
After changing the rubric, we recommend:

1. Run the pytest command above and confirm it passes.
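-2. Pick 1-2 fixtures and, using the internal eval or `EvalJudgeManualService`, run one manual spot-check against the real GLM, checking that `expected_band` / `must_flag_issues` are still reasonable.
+2. Pick 1-2 fixtures and, using the internal eval or `EvalJudgeManualService`, run one manual spot-check against GLM-5, checking that `expected_band` / `must_flag_issues` are still reasonable.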
## Rubric version
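The spot-check in step 2 of the hunk above could be sketched roughly as follows. This is only an illustration: the field names `expected_band` and `must_flag_issues` come from the doc, but the `Fixture` and `JudgeResult` shapes, the inclusive score range, and the `spot_check` helper are all assumptions and are not taken from this repository.

```python
from dataclasses import dataclass, field


@dataclass
class Fixture:
    # Hypothetical shape; only the two field names appear in the doc.
    expected_band: tuple[int, int]  # assumed inclusive (low, high) score range
    must_flag_issues: set[str] = field(default_factory=set)


@dataclass
class JudgeResult:
    # Hypothetical judge output, e.g. parsed from a manual GLM-5 run.
    score: int
    flagged_issues: set[str] = field(default_factory=set)


def spot_check(fixture: Fixture, result: JudgeResult) -> list[str]:
    """Return readable mismatches; an empty list means the fixture still holds."""
    problems = []
    low, high = fixture.expected_band
    if not low <= result.score <= high:
        problems.append(
            f"score {result.score} outside expected_band [{low}, {high}]"
        )
    # Every issue the fixture requires must appear in the judge's flags.
    missing = sorted(fixture.must_flag_issues - result.flagged_issues)
    if missing:
        problems.append(f"missing must_flag_issues: {', '.join(missing)}")
    return problems
```

An empty return value suggests the fixture's expectations are still reasonable under the new rubric; any listed mismatch is a prompt for a human to decide whether the fixture or the rubric needs adjusting.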