Kevin
71fbd39e32
feat(api)!: memory single chain — async MemoryService, strict eval closure
...
Route all memory ingest/retrieve/enrichment/compaction through async MemoryService.
Remove legacy sync memory implementations (ingest/retrieve/compaction); Celery and
memoir Phase2 call asyncio.run into MemoryService-backed helpers.
Memoir Phase1 batch ingest uses MemoryService.ingest_transcripts_batch; drop chapters.
evidence_bundle_json mirror (Alembic 0015). Evaluation uses snapshot/link-only bundles;
raise EvidenceClosureMissing instead of partial/fallback lineage tiers.
Split memoir state into NarrativeCoverageState and InterviewControlState; delete the
_interview_meta_store adapter layer. Remove rolling-query and recent-fact fallback
settings from config and evidence assembly.
Update judges, docs, tests, and PlaygroundPage alignment.
Made-with: Cursor
2026-04-30 14:11:50 +08:00
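The sync-to-async bridge mentioned in 71fbd39e32 (Celery and memoir Phase2 calling asyncio.run into MemoryService-backed helpers) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the MemoryService body, the return value, and the ingest_transcripts_task helper are assumptions made for the example; only the names MemoryService and ingest_transcripts_batch come from the commit message.

```python
import asyncio


class MemoryService:
    """Hypothetical stand-in for the async service named in the commit."""

    async def ingest_transcripts_batch(self, transcripts: list[str]) -> int:
        # Placeholder for real async I/O (database writes, embeddings, LLM calls).
        await asyncio.sleep(0)
        # Assumed behavior for the sketch: count the non-empty transcripts.
        return sum(1 for t in transcripts if t.strip())


def ingest_transcripts_task(transcripts: list[str]) -> int:
    """Sync entry point, e.g. the body of a Celery task (assumed shape)."""
    service = MemoryService()
    # asyncio.run creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop before returning the result.
    return asyncio.run(service.ingest_transcripts_batch(transcripts))


if __name__ == "__main__":
    print(ingest_transcripts_task(["first memory", "", "second memory"]))
```

Note that asyncio.run raises RuntimeError when called from a thread that already has a running event loop, so a wrapper like this only fits purely synchronous callers such as worker processes.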
Kevin
80833f7033
feat(api): default to DeepSeek V4 Flash, HTTP error messages, and multi-vendor layering
...
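- Main chain defaults to deepseek-v4-flash; DEEPSEEK_THINKING_ENABLED is aligned with the old non-thinking chat behavior
- Move eval-console judge assembly into adapters/llm (deepseek_eval_judge, zhipu_eval_judge) and eval_judge_spec
- Split llm_http_openai_chat_errors from llm_errors (DeepSeek/Zhipu branding and doc links); llm_call supports http_error_vendor
- EvalJudgeService passes allm_json_call according to spec.provider; eval-console frontend copy changed to V4 Flash
- Update the .env examples and DEEPSEEK_MODEL for staging/production; add openai/vendor error-message tests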
Made-with: Cursor
2026-04-27 14:34:30 +08:00
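One way to picture the vendor layering in 80833f7033 is a shared table of HTTP status hints plus a per-vendor label and documentation link, selected by an http_error_vendor argument. The sketch below is an assumption about the shape only: the LLMHTTPError class, the build_http_error function, the hint texts, and the placeholder doc links are all hypothetical and do not reflect the project's actual modules.

```python
# Hypothetical vendor table; labels and doc links are placeholders,
# not the project's real values.
VENDORS = {
    "deepseek": ("DeepSeek", "<deepseek-docs-url>"),
    "zhipu": ("Zhipu", "<zhipu-docs-url>"),
}

# Generic hints shared by OpenAI-compatible chat endpoints (illustrative subset).
STATUS_HINTS = {
    401: "authentication failed, check the API key",
    429: "rate limit or quota exceeded, retry later",
    500: "upstream server error",
}


class LLMHTTPError(Exception):
    """Hypothetical error type carrying the vendor and HTTP status."""

    def __init__(self, vendor: str, status: int, message: str) -> None:
        super().__init__(message)
        self.vendor = vendor
        self.status = status


def build_http_error(status: int, http_error_vendor: str = "openai") -> LLMHTTPError:
    # Unknown vendors fall back to the generic OpenAI-compatible wording.
    label, docs = VENDORS.get(http_error_vendor, ("OpenAI-compatible", ""))
    hint = STATUS_HINTS.get(status, "unexpected HTTP error")
    message = f"{label} HTTP {status}: {hint}"
    if docs:
        message += f" (see {docs})"
    return LLMHTTPError(http_error_vendor, status, message)


if __name__ == "__main__":
    print(build_http_error(429, "deepseek"))
    print(build_http_error(401))
```

Keeping the status hints generic and the branding separate means a new vendor only needs one extra table entry.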
Kevin
e848f26354
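feat/ internal eval platform now supports live end-to-end debugging. 1. Show the logged-in user's chat history and generated memoirs from the current local database. Support chatting directly in the web page, without depending on the mobile app.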
2026-04-20 11:58:32 +08:00
Kevin
ac49bc7f23
feat(eval): memoir A/B chapter judging and eval-web parity with dialogue
...
- Judge baseline excerpt and library chapter separately; build_memoir_compare_summary for gate, nine-dim and leaf deltas.
- Memoir SSE chapter payload: baseline_judge, compare_summary, baseline_judge_error.
- MemoirJudgeOutput: loose score coercion and post-validate clamp; memoir judge prompt caps from settings.
- app-eval-web: two-column MemoirScoreCard layout, MemoirCompareSummary, chapter blocks and CSS.
- Add memoir_compare_summary, log_events, celery_log_context, memoir_pipeline_progress; tests and migration 0014.
- Misc: memory/evidence and enrichment paths, task/orchestrator updates, internal-eval docs, env examples.
2026-04-10 10:25:15 +08:00
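The "loose score coercion and post-validate clamp" mentioned for MemoirJudgeOutput in ac49bc7f23 might look roughly like the following. This is a plain-Python sketch under assumptions: the helper names, the accepted input shapes ("8", "8/10", numbers), and the default of 0.0 for unparseable values are illustrative choices, not the project's actual schema logic.

```python
import math


def coerce_score(raw, default: float = 0.0) -> float:
    """Loosely convert a judge value (7, 7.5, ' 8 ', '8/10') to a float."""
    if isinstance(raw, bool):
        # bool is a subclass of int; treat it as invalid rather than 0/1.
        return default
    if isinstance(raw, (int, float)):
        value = float(raw)
    elif isinstance(raw, str):
        # Keep only the part before an optional '/max' suffix.
        head = raw.strip().split("/")[0].strip()
        try:
            value = float(head)
        except ValueError:
            return default
    else:
        return default
    # Reject NaN and infinities, which float() accepts from strings.
    return value if math.isfinite(value) else default


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def normalize_scores(raw_scores: dict, caps: dict) -> dict:
    """Coerce each score, then clamp it into [0, cap] for its dimension."""
    return {
        name: clamp(coerce_score(raw_scores.get(name)), 0.0, float(cap))
        for name, cap in caps.items()
    }


if __name__ == "__main__":
    raw = {"truth": "12/10", "style": 7, "depth": "n/a"}
    print(normalize_scores(raw, {"truth": 10, "style": 5, "depth": 10}))
```

Clamping after coercion keeps an over-enthusiastic judge from pushing a dimension past its cap, while missing or unparseable values fall back to the default.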
Kevin
b0251e5b26
feat(eval): server-side replay/phase1 timing + memoir phase1 batch chunking
...
- Replay and memoir-submit responses include started/finished UTC and elapsed_ms;
Phase1 poll exposes Redis-backed submit time and elapsed_ms_since_submit.
- Phase1 batch LLM splits segments by memoir_phase1_batch_llm_chunk_size with
bisect fallback per chunk; Playground shows server timings.
Made-with: Cursor
2026-04-09 13:39:04 +08:00
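The chunking with per-chunk bisect fallback described in b0251e5b26 can be sketched as below. It is a minimal illustration assuming the batch call either returns one result per segment or raises; the function names, the None marker for a segment that still fails alone, and the fake LLM are hypothetical. Only the idea of splitting by a chunk size (memoir_phase1_batch_llm_chunk_size in the commit) and bisecting a failed chunk comes from the message.

```python
from typing import Callable, Optional


def chunked(items: list, size: int) -> list[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_with_bisect(
    segments: list[str],
    call_llm: Callable[[list[str]], list[str]],
) -> list[Optional[str]]:
    """Try the whole batch; on failure, halve it and retry each half."""
    if not segments:
        return []
    try:
        return list(call_llm(segments))
    except Exception:
        if len(segments) == 1:
            # Assumed policy: a segment that fails alone yields None.
            return [None]
        mid = len(segments) // 2
        return (process_with_bisect(segments[:mid], call_llm)
                + process_with_bisect(segments[mid:], call_llm))


def run_batches(segments, chunk_size, call_llm):
    results = []
    for chunk in chunked(list(segments), chunk_size):
        results.extend(process_with_bisect(chunk, call_llm))
    return results


def fake_llm(batch: list[str]) -> list[str]:
    # Stand-in for a batch LLM call that rejects any batch containing "bad".
    if "bad" in batch:
        raise RuntimeError("batch rejected")
    return [s.upper() for s in batch]


if __name__ == "__main__":
    print(run_batches(["a", "b", "bad", "c", "d"], 4, fake_llm))
```

Bisecting isolates a single bad segment in a logarithmic number of extra calls, instead of retrying every segment individually after one failure.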
Kevin
064ad2161d
refactor(eval+memoir): streamline internal-eval routes and services; strengthen composite/dialogue summary and judge capabilities
...
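- Interview: add interview_state_hints, wired into the orchestrator and prompts
- Memoir: adjust story_pipeline_sync/state/memory/post_commit and Celery tasks
- Infrastructure: development Celery broker, compose/development scripts, dependency injection
- eval-web: remove the dataset/experiment/version pages and streaming polling; put Playground front and center
- Sync docs and unit tests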
2026-04-08 21:36:12 +08:00
Kevin
78b61c076e
feat(eval): persist Playground GLM scores to the database and make them restorable
...
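Add playground_conversation_judge_json to the conversations table; when streaming or non-streaming conversation judging finishes, write the most recent snapshot (overall score, per-turn scores, comparison text, errors, baseline file name, etc.). Add a read-only GET so the frontend can fetch it per conversation; the eval console Playground restores it automatically when switching conversations and indicates whether the baseline still matches the one used at judging time.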
2026-04-08 16:51:08 +08:00
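The snapshot-and-restore flow in 78b61c076e can be illustrated with a small sketch: serialize the latest judge result to JSON, then on restore compare the stored baseline file name with the current one. The field names and both helper functions are assumptions made for the example; only the column name playground_conversation_judge_json and the listed snapshot contents come from the commit message.

```python
import json
from typing import Optional


def build_snapshot(overall: float, per_turn: list[float],
                   baseline_filename: str, error: Optional[str] = None) -> str:
    """Serialize the most recent judge result (hypothetical field names)."""
    return json.dumps(
        {
            "overall_score": overall,
            "per_turn_scores": per_turn,
            "baseline_filename": baseline_filename,
            "error": error,
        },
        ensure_ascii=False,
    )


def restore_snapshot(raw: Optional[str], current_baseline: str) -> Optional[dict]:
    """Load a stored snapshot and flag whether the baseline still matches."""
    if not raw:
        # No judging has been persisted for this conversation yet.
        return None
    snapshot = json.loads(raw)
    snapshot["baseline_matches"] = (
        snapshot.get("baseline_filename") == current_baseline
    )
    return snapshot


if __name__ == "__main__":
    stored = build_snapshot(82.5, [80.0, 85.0], "baseline_v1.md")
    print(restore_snapshot(stored, "baseline_v2.md"))
```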
Kevin
309a051038
feat: memoir evidence lineage and traceable internal eval; also align the local eval console with CI
...
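Database and models: add several migrations (chapter evidence snapshots, conversation lineage, memory fact/timeline lineage, etc.) so that "final draft ↔ conversation/memory" provenance is stored in the table schema.
Business flow: sessions and WS, the memoir/story pipeline, memory writes and enrichment now carry the lineage and snapshots; add chapter evidence snapshots and modules such as the eval-side EvalTraceService to make it easier to assemble evidence bundles for judging.
Internal eval: automated runs and manual memoir judging share the same traceable evidence; rubric/judge scripts and docs adjusted to match.
app-eval-web: Memoir/experiment detail views can expand to show the evidence summary and evidence_trace (including conversation turn ids); the Vite proxy and the API port injected by development.sh now match the current default internal-eval port, so the page does not connect to the wrong service after a port change.
Engineering misc: GitHub Actions / repository docs updated; adapters, payment/quota/plan and several other places have small changes or follow-up cleanup from the main change; added/expanded ?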
2026-04-08 15:37:09 +08:00
Kevin
6772e1269c
feat(evaluation): memoir readiness, judge/replay updates, eval web playground
...
Add memoir_readiness_service and router tests; extend judge schemas/services, replay_service, and conversation rubric; align story route agent, payload, prompts, and story_pipeline_sync; update agent logging, config, and DI. Document internal-eval; add replayDraft util and PlaygroundPage changes in app-eval-web.
2026-04-08 09:43:34 +08:00
Kevin
99543d04c6
feat(eval): internal-eval stack, judge fixes, and eval web overhaul
...
- Merge internal-eval into development.sh (single Celery/infra); internal-eval.sh
wraps with LIFE_ECHO_WITH_INTERNAL_EVAL; EVAL_ATTACH_ONLY for attaching 8001
when :8000 is already up; document in api/docs/internal-eval.md.
- Evaluation: transcript_for_judge, judge error surfacing, rubric/schema tweaks,
execution_service and router updates; tests for judge and composite eval.
- Memory: ingest nested transaction for embedding/enrichment rollback safety.
- Conversation WS: logger.exception for pipeline errors (avoid loguru KeyError).
- app-eval-web: Playground saved replays, dialogue turns helper, hash user_id
for Memoir; Memoir chapter baseline↔DB row compare with title heuristics;
Stories page (#memoir-stories); Markdown + copy buttons; toolbar/panel UI;
react-markdown; development proxy and fixture updates.
2026-04-07 17:18:47 +08:00
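The "ingest nested transaction for embedding/enrichment rollback safety" item in 99543d04c6 can be illustrated with a savepoint: the enrichment step runs inside a nested scope, so a failure there rolls back only its own writes. The sketch below uses the standard-library sqlite3 module and raw SAVEPOINT statements as a stand-in; the table names, the ingest function, and the policy of keeping the fact row when embedding fails are assumptions, and the project's actual database layer is not shown in the log.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # BEGIN / SAVEPOINT / COMMIT statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute("CREATE TABLE embeddings (fact_id INTEGER, vector TEXT)")
    return conn


def ingest(conn: sqlite3.Connection, text: str, embed) -> int:
    conn.execute("BEGIN")
    try:
        cur = conn.execute("INSERT INTO facts (text) VALUES (?)", (text,))
        fact_id = cur.lastrowid
        conn.execute("SAVEPOINT enrich")
        try:
            # Placeholder row written before the (possibly failing) embed call.
            conn.execute(
                "INSERT INTO embeddings (fact_id, vector) VALUES (?, NULL)",
                (fact_id,),
            )
            vector = embed(text)
            conn.execute(
                "UPDATE embeddings SET vector = ? WHERE fact_id = ?",
                (json.dumps(vector), fact_id),
            )
        except Exception:
            # Undo only the writes made since the savepoint.
            conn.execute("ROLLBACK TO SAVEPOINT enrich")
        finally:
            conn.execute("RELEASE SAVEPOINT enrich")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return fact_id


def failing_embed(text: str):
    raise RuntimeError("embedding backend unavailable")


if __name__ == "__main__":
    db = make_db()
    ingest(db, "went fishing with grandpa", failing_embed)
    print(db.execute("SELECT COUNT(*) FROM facts").fetchone()[0])
    print(db.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0])
```

In this sketch the fact survives an embedding failure while the half-written embedding row does not; whether the real ingest keeps or discards the fact in that case is not stated in the commit.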
Kevin
a50b72e7b5
feat(app-eval-web): eval console UI/UX rework (sidebar navigation, pagination, dataset and experiment features)
...
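- Adopt hash routing and a session-style shell (Playground / Datasets / Experiments / Versions / Memoir)
- Extract api, types, hooks (polling, notifications, experiment SSE) and NoticeContext
- Playground: two-column baseline/actual generation view, replay, streaming auto-scoring and ScoreCard
- Datasets: regression sets and case lists, Markdown/JSON import, session snapshots
- Experiments: create experiments, submit runs, SSE progress, DiffTable and gate display
- Styling and accessibility: DM Sans + JetBrains Mono, responsive sidebar, ? shortcut-key help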
2026-04-07 11:06:41 +08:00
Kevin
5972b0e721
feat(evaluation): 100-point final-draft rubric, evidence-based judging, and eval console adjustments
...
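- Tighten memoir sub-item caps so they total 100 points; drop the 110-point conversion and raw_dimension_total
- judge_memoir concatenates the raw interview and an optional exported baseline; when there is no evidence, prompt the judge to score authenticity-related items conservatively
- Automated eval runs and manual chapter/story judging both carry transcript evidence (aggregated by session/user, truncated)
- Interview scoring remains the emotion-enhanced version with 15 sub-items and a total of 100
- Eval console default baseline changed to the zuckxu exported MD; remove the per-turn user-sentence alignment table and related logic
- Add unit tests for the judge schema and memoir prompt assembly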
2026-04-07 10:36:22 +08:00
Kevin
29dec8fe32
feat/ eval
2026-04-06 23:19:20 +08:00
Kevin
ca8bcc8489
feat(evaluation): session catalog, user export import, and eval web UI
...
- Extend evaluation API: schemas, router, repo, admin and execution services
- Improve user export markdown importer; add fixtures and importer tests
- Session catalog repo/service updates; internal app wiring and docs
- Add internal-eval.sh helper; refresh app-eval-web (App, styles, Vite)
2026-04-06 13:49:28 +08:00
Kevin
b75edacb5f
feat/ export data from the development container for evaluation
2026-04-03 14:44:46 +08:00