Kevin
3121d1384d
WIP: memory system improvements (in progress)
...
Interview/chat prompt layers, reply planner, style profiles, memory
injection, interview meta store, and related tests. Work not finished.
Made-with: Cursor
2026-04-22 16:56:28 +08:00
Kevin
309a051038
feat: 回忆录证据血缘与内部评测可追溯,顺带对齐本地评测台与 CI
...
数据库与模型:新增多版迁移(章节证据快照、对话血缘、记忆事实/时间线 lineage 等),把「成稿 ↔ 对话/记忆」的溯源信息落到表结构里。
业务链路:会话与 WS、回忆录/故事流水线、记忆写入与 enrichment 等跟着接上线索与快照;新增章节证据快照与评测侧 EvalTraceService 等模块,方便组评审用的证据包。
内部评测:自动化 run 与手工 memoir 评审共用可追溯证据;rubric/ judge 相关脚本与文档有配套调整。
app-eval-web:Memoir/实验详情里能展开看证据摘要与 evidence_trace(含对话轮次 id);Vite 代理与 development.sh 注入的 API 端口与当前默认内部评测端口一致,避免改端口后页面连错服务。
工程杂项:GitHub Actions / 仓库说明有更新;各适配器与支付/配额/plan 等多处为小改动或跟随主改动的收尾;新增/扩充了?
2026-04-08 15:37:09 +08:00
Kevin
99543d04c6
feat(eval): internal-eval stack, judge fixes, and eval web overhaul
...
- Merge internal-eval into development.sh (single Celery/infra); internal-eval.sh
wraps with LIFE_ECHO_WITH_INTERNAL_EVAL; EVAL_ATTACH_ONLY for attaching 8001
when :8000 is already up; document in api/docs/internal-eval.md.
- Evaluation: transcript_for_judge, judge error surfacing, rubric/schema tweaks,
execution_service and router updates; tests for judge and composite eval.
- Memory: ingest nested transaction for embedding/enrichment rollback safety.
- Conversation WS: logger.exception for pipeline errors (avoid loguru KeyError).
- app-eval-web: Playground saved replays, dialogue turns helper, hash user_id
for Memoir; Memoir chapter baseline↔DB row compare with title heuristics;
Stories page (#memoir-stories); Markdown + copy buttons; toolbar/panel UI;
react-markdown; development proxy and fixture updates.
2026-04-07 17:18:47 +08:00
Kevin
5972b0e721
feat(evaluation): 成稿 100 分 rubric、证据评审与评测台调整
...
- 回忆录细项上限收紧为合计 100 分,去掉 110 折算与 raw_dimension_total
- judge_memoir 拼接原始访谈与可选导出基线;无证据时提示保守打真实性相关分
- 自动评测 run 与手动章节/故事评审统一带 transcript 证据(会话/用户聚合、截断)
- 访谈打分仍为情绪强化版 15 细项、总分 100
- 评测台默认基准改为 zuckxu 导出 MD;移除逐轮用户句对齐表及相关逻辑
- 新增 judge schema 与 memoir prompt 组装的单元测试
2026-04-07 10:36:22 +08:00
Kevin
b75edacb5f
feat/ 导出开发容器内的数据用于评估
2026-04-03 14:44:46 +08:00