
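Kevin 41518bda11 Evidence retrieval for both chat and memoir now goes through pgvector; Postgres FTS/content_tsv removed, and a new migration drops the content_tsv column (run alembic upgrade before deploying).

The embedding port gains is_available(); chat and memoir logs now report in a uniform way whether vectors can actually be requested.

Memory compaction supports periodic user scans via Beat;

Fact-extraction prompt and subject normalization reduce the number of different names used for the same person;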

2026-04-03 11:43:16 +08:00

2.6 KiB


Memory retrieval: async API and Celery sync path

Two paths

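| Path | Entry point | Retrieval capabilities |
| --- | --- | --- |
| Async (HTTP / MemoirService) | `MemoryService.retrieve` → `HybridRetriever` → `evidence.retrieve_evidence_bundle_async` | Vector (pgvector) chunks; facts / timeline matched against the query with ILIKE, falling back to the most recent entries when nothing matches; rolling summary plus ILIKE-matched summaries; stories (matched on title/summary) |
| Sync (Celery) | `retrieve_evidence_sync` (injects `get_embedding_provider()` into `evidence.retrieve_evidence_bundle_sync`) | Vector chunks plus the same metadata as above; aligned with the async path |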

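Evidence assembly lives in app/features/memory/evidence.py. memory/repo provides the atomic queries (chunk vector search, facts/timeline search, summary listing, etc.), and stories are merged in the evidence layer.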
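
The hybrid strategy in the table can be sketched in plain Python. This is a minimal in-memory illustration, not the code in evidence.py: the `Chunk`, `Fact`, `EvidenceBundle`, and `assemble_bundle` names are hypothetical, ranking assumes cosine distance (the quantity pgvector's `<=>` operator computes), and a case-insensitive substring test only approximates `ILIKE '%query%'`.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for rows returned by memory/repo.
@dataclass
class Chunk:
    text: str
    embedding: List[float]

@dataclass
class Fact:
    text: str
    created_at: int  # larger value = more recent

@dataclass
class EvidenceBundle:
    chunks: List[str] = field(default_factory=list)
    facts: List[str] = field(default_factory=list)

def cosine_distance(a: List[float], b: List[float]) -> float:
    # 1 - cosine similarity, the same quantity as pgvector's `<=>`.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def ilike_contains(text: str, query: str) -> bool:
    # Approximates SQL `text ILIKE '%query%'` (no wildcard escaping).
    return query.casefold() in text.casefold()

def assemble_bundle(query: str, query_vec: Optional[List[float]],
                    chunks: List[Chunk], facts: List[Fact], k: int = 3) -> EvidenceBundle:
    bundle = EvidenceBundle()
    # Vector leg: runs only when a query embedding is available.
    if query_vec is not None:
        ranked = sorted(chunks, key=lambda c: cosine_distance(c.embedding, query_vec))
        bundle.chunks = [c.text for c in ranked[:k]]
    # Metadata leg: ILIKE-style match, falling back to the most recent facts.
    matched = [f for f in facts if ilike_contains(f.text, query)]
    if not matched:
        matched = sorted(facts, key=lambda f: f.created_at, reverse=True)
    bundle.facts = [f.text for f in matched[:k]]
    return bundle

chunks = [Chunk("grew up in Hangzhou", [1.0, 0.0]), Chunk("worked at a factory", [0.0, 1.0])]
facts = [Fact("Father was a teacher", 1), Fact("Moved to Shanghai in 1985", 2)]
print(assemble_bundle("shanghai", [0.9, 0.1], chunks, facts, k=1))
# EvidenceBundle(chunks=['grew up in Hangzhou'], facts=['Moved to Shanghai in 1985'])
```

Timeline entries would follow the same match-then-fallback rule as facts; keeping the two legs independent is what lets the metadata still come back when the vector leg is skipped.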

Embedding dependency

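When ZHIPU_API_KEY is not configured (or the provider's _client is empty), chunk retrieval returns an empty list, but facts/timeline/summaries/stories are still returned (matched against the query with ILIKE).
Logging: HybridRetriever / retrieve_evidence_bundle_sync log a warning when there is no provider or the query vector is empty.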
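
The degradation rule above can be sketched as a single guard that either yields a query vector or tells the caller to skip chunk retrieval. Only `is_available()` comes from the source (it is mentioned in the commit message); the `EmbeddingProvider` protocol shape, `FakeProvider`, `query_vector_or_none`, and the logger name are hypothetical, and the placeholder vector stands in for a real embedding API call.

```python
import logging
from typing import List, Optional, Protocol

logger = logging.getLogger("memory.evidence")  # hypothetical logger name

class EmbeddingProvider(Protocol):
    # Hypothetical port shape; only is_available() is named in the source.
    def is_available(self) -> bool: ...
    def embed(self, text: str) -> List[float]: ...

class FakeProvider:
    """Hypothetical provider that is usable only when an API key produced a client."""

    def __init__(self, api_key: Optional[str]) -> None:
        self._client = object() if api_key else None

    def is_available(self) -> bool:
        return self._client is not None

    def embed(self, text: str) -> List[float]:
        # Placeholder vector; a real provider would call the remote embedding API.
        return [float(len(text)), 1.0] if self._client else []

def query_vector_or_none(provider: Optional[EmbeddingProvider], query: str) -> Optional[List[float]]:
    """Return a query vector, or None so the caller returns no chunks."""
    if provider is None or not provider.is_available():
        logger.warning("embedding provider unavailable; chunk retrieval skipped")
        return None
    vec = provider.embed(query)
    if not vec:
        logger.warning("empty query embedding; chunk retrieval skipped")
        return None
    return vec
```

Returning `None` rather than raising keeps the metadata leg (facts, timeline, summaries, stories) running when the embedding side is down.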

Empty query

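Default: all relevant_* fields are empty (consistent with historical behavior).
If memory_evidence_empty_query_include_rolling=true: no chunks are returned, but the rolling summary and the most recent facts / timeline entries are included (for a "browse" mode).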
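
The two empty-query behaviors can be sketched as one branch on the setting. The field names `relevant_chunks`, `relevant_facts`, `relevant_timeline`, and `rolling_summary` are hypothetical expansions of the source's `relevant_*`, and the inputs are assumed to be pre-sorted newest first.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Evidence:
    # Hypothetical field names standing in for the source's relevant_* fields.
    relevant_chunks: List[str] = field(default_factory=list)
    relevant_facts: List[str] = field(default_factory=list)
    relevant_timeline: List[str] = field(default_factory=list)
    rolling_summary: Optional[str] = None

def retrieve_for_empty_query(include_rolling: bool, rolling_summary: Optional[str],
                             recent_facts: List[str], recent_timeline: List[str],
                             limit: int = 5) -> Evidence:
    if not include_rolling:
        # Default: everything empty, matching historical behavior.
        return Evidence()
    # "Browse" mode: never any chunks (there is no query to embed),
    # but the rolling summary plus the newest facts/timeline entries.
    return Evidence(
        relevant_facts=recent_facts[:limit],
        relevant_timeline=recent_timeline[:limit],
        rolling_summary=rolling_summary,
    )
```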

Enrichment (post-ingest LLM)

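memory_enrichment_enabled (default true): after ingest_transcript / ingest_transcript_sync, runs summary, fact, and timeline extraction; skipped when false.
memory_enrichment_max_chars: truncates the text sent to the LLM to this length.
Before enrichment is re-run, timeline entries for the same memory_source_id are deleted and then reinserted, so events are not duplicated.
Ingest writes embeddings (best-effort); the legacy FTS column content_tsv has been dropped by migration 0007_drop_chunk_content_tsv.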
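
The three enrichment settings can be sketched together, using SQLite as a stand-in for Postgres. The `timeline_events` table, `enrich_timeline`, the `fake_extract_events` stand-in for the LLM call, and the default of 8000 characters are all hypothetical; the point is the skip flag, the truncation, and the delete-then-insert inside one transaction, which makes a re-run idempotent per `memory_source_id`.

```python
import sqlite3
from typing import Callable, List

def fake_extract_events(text: str) -> List[str]:
    # Hypothetical stand-in for the LLM call: one timeline event per non-empty line.
    return [line.strip() for line in text.splitlines() if line.strip()]

def enrich_timeline(conn: sqlite3.Connection, memory_source_id: str, transcript: str, *,
                    enabled: bool = True, max_chars: int = 8000,
                    extract: Callable[[str], List[str]] = fake_extract_events) -> int:
    if not enabled:  # memory_enrichment_enabled = false: skip entirely
        return 0
    text = transcript[:max_chars]  # memory_enrichment_max_chars: truncate LLM input
    events = extract(text)
    with conn:  # one transaction: delete this source's old events, then insert new ones
        conn.execute("DELETE FROM timeline_events WHERE memory_source_id = ?", (memory_source_id,))
        conn.executemany(
            "INSERT INTO timeline_events (memory_source_id, event) VALUES (?, ?)",
            [(memory_source_id, e) for e in events],
        )
    return len(events)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timeline_events (memory_source_id TEXT, event TEXT)")
enrich_timeline(conn, "src-1", "1950 born\n1968 moved to Shanghai")
enrich_timeline(conn, "src-1", "1950 born\n1968 moved to Shanghai")  # re-run: no duplicates
print(conn.execute("SELECT COUNT(*) FROM timeline_events").fetchone()[0])  # 2
```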

Ordering within the Celery task

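process_memoir_segments (app/tasks/memoir_tasks.py) first runs ingest_transcript_sync (and commits), then runs MemoirOrchestrator and run_story_pipeline_for_category_batch, all within the same task. As a result, retrieve_evidence_sync can see the memory chunks this batch has just written (no race condition), provided the embedding API successfully wrote their vectors.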
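
Why the ordering removes the race can be illustrated with SQLite and two connections: rows committed by the ingest step are visible to a later, separate reader, and chunks whose embedding failed stay invisible to vector retrieval. This is only an analogy under stated assumptions: `ingest_chunks`, `retrieve_vector_chunks`, `process_segments`, `fake_embed`, and the `chunks` table are hypothetical, SQLite stands in for Postgres/pgvector, and the separate reader connection stands in for whatever session the retriever actually uses.

```python
import os
import sqlite3
import tempfile
from typing import Callable, List, Optional

def ingest_chunks(conn: sqlite3.Connection, segments: List[str],
                  embed: Callable[[str], Optional[str]]) -> None:
    # Step 1 (stand-in for ingest_transcript_sync): write chunks, then commit.
    conn.executemany("INSERT INTO chunks (text, embedding) VALUES (?, ?)",
                     [(s, embed(s)) for s in segments])
    conn.commit()

def retrieve_vector_chunks(db_path: str) -> List[str]:
    # Step 2: a separate connection only sees committed rows; chunks whose
    # embedding failed (NULL) cannot be found by vector retrieval.
    reader = sqlite3.connect(db_path)
    try:
        rows = reader.execute(
            "SELECT text FROM chunks WHERE embedding IS NOT NULL ORDER BY rowid")
        return [r[0] for r in rows]
    finally:
        reader.close()

def process_segments(db_path: str, segments: List[str],
                     embed: Callable[[str], Optional[str]]) -> List[str]:
    writer = sqlite3.connect(db_path)
    try:
        writer.execute("CREATE TABLE IF NOT EXISTS chunks (text TEXT, embedding TEXT)")
        ingest_chunks(writer, segments, embed)  # commit happens before retrieval
    finally:
        writer.close()
    return retrieve_vector_chunks(db_path)

def fake_embed(segment: str) -> Optional[str]:
    # Hypothetical best-effort embedding: simulate an API failure for some segments.
    return None if "failed" in segment else "[0.1, 0.2]"

db_path = os.path.join(tempfile.mkdtemp(), "memory.db")
print(process_segments(db_path, ["childhood", "failed upload", "army years"], fake_embed))
# ['childhood', 'army years']
```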

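For chapter classification, if the model returns none or the scattered-archive heuristic fires, the Story side assigns the content to the summary chapter and still persists the narrative. This is consistent with the fact that this batch's transcript is already in memory, and avoids the impression that content was discarded.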
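
The fallback rule can be sketched as a small resolver. The source does not describe the scattered-archive heuristic, so `looks_like_scattered_archive` and its length threshold are purely hypothetical placeholders; only the routing to the summary chapter reflects the text above.

```python
from typing import Optional

SUMMARY_CHAPTER = "summary"

def looks_like_scattered_archive(text: str) -> bool:
    # Hypothetical heuristic: very short fragments count as scattered archive material.
    return len(text.strip()) < 20

def resolve_chapter(model_category: Optional[str], text: str) -> str:
    # Model said "none" (or nothing): route to the summary chapter instead of dropping.
    if model_category is None or model_category.strip().lower() == "none":
        return SUMMARY_CHAPTER
    # Heuristic hit: same fallback, so the narrative is still persisted.
    if looks_like_scattered_archive(text):
        return SUMMARY_CHAPTER
    return model_category
```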

Evidence and the narrative prompt

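format_evidence_chunks_for_prompt concatenates the chunks, summaries (if any), facts, timeline, and story summaries (if any); the model should treat these excerpts as reference material, not as part of the current segment's oral account.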
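
A prompt formatter with that section order can be sketched as follows. This is not the real format_evidence_chunks_for_prompt: the function name `format_evidence_sketch`, the section titles, and the header wording are hypothetical. It shows optional sections being omitted when empty and a header that marks the material as reference only.

```python
from typing import List, Optional

def format_evidence_sketch(chunks: List[str], facts: List[str], timeline: List[str],
                           summaries: Optional[List[str]] = None,
                           story_summaries: Optional[List[str]] = None) -> str:
    sections: List[str] = []

    def add(title: str, items: List[str]) -> None:
        # Skip empty sections so the prompt carries no blank headings.
        if items:
            body = "\n".join(f"- {item}" for item in items)
            sections.append(f"[{title}]\n{body}")

    # Same order as described above: chunks, summaries, facts, timeline, stories.
    add("Excerpts", chunks)
    add("Summaries", summaries or [])
    add("Facts", facts)
    add("Timeline", timeline)
    add("Story summaries", story_summaries or [])
    if not sections:
        return ""
    header = "Reference material from earlier sessions (not part of the current narration):"
    return header + "\n\n" + "\n\n".join(sections)
```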
