Refactor memoir to a story-first / markdown-first architecture; integrate image intents and frontend UI fixes

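This squash merge brings the full set of changes from codex-story-first-image-intent into development. Key changes:

1. Backend data and migrations: add the stories, story_versions, story_image_intents, chapter_cover_intents, and assets models with their Alembic migrations, establishing a story-first, markdown-first, asset-first primary data path.
2. Generation and task chain: introduce StoryBuilderOrchestrator, ChapterComposerOrchestrator, story_image_tasks, and chapter_cover_tasks; image generation moves from in-body placeholders to a structured intent -> asset -> markdown backfill flow.
3. Concurrency and consistency: add claim_token, claimed_at, and attempt_count to story/chapter intents; use an atomic database claim as the primary mechanism with a Redis lock as a secondary guard, to avoid duplicate generation, accidental deletion of another worker's lock, and intents stuck in processing.
4. Memoir read/write paths: chapter canonical_markdown becomes the source of truth for body text; list/detail endpoints gain markdown, cover_asset, word_count, and related fields; the PDF and asset resolution paths are upgraded to match.
5. Memory / Retrieval: extend the transcript ingest, chunking, evidence retrieval, and story aggregation infrastructure as a foundation for later story-first RAG and multi-agent orchestration.
6. App experience: chapter pages continue to use the MarkdownRenderer reading path and pick up the cross-platform UI glitch fixes from fix3-19; update the chat page, home page, copy resources, and chapter list mapping logic.
7. Tests and docs: add regression tests for the asset resolver, story image task, chapter cover dispatch, and markdown mapping, and add a design doc on retiring image placeholders.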
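
Item 3 describes the claim mechanism only in outline. The following is a minimal sketch of a database-level atomic claim, not the code in this commit. Only the column names claim_token, claimed_at, and attempt_count come from the message above; the SQLite table layout, the `pending` / `processing` / `done` status values, the function names, and the stale timeout are assumptions made for illustration.

```python
import sqlite3
import time
import uuid


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical intent schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE story_image_intents (
            id INTEGER PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'pending',
            claim_token TEXT,
            claimed_at REAL,
            attempt_count INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def claim_intent(conn, intent_id, *, now=None, stale_after=300.0):
    """Try to claim an intent; return the claim token, or None if it is taken.

    A single conditional UPDATE does the check and the write together, so two
    workers cannot both claim the same row. A claim older than `stale_after`
    seconds may be taken over, so a crashed worker cannot leave the row stuck
    in 'processing'.
    """
    now = time.time() if now is None else now
    token = uuid.uuid4().hex
    cur = conn.execute(
        """
        UPDATE story_image_intents
           SET status = 'processing',
               claim_token = ?,
               claimed_at = ?,
               attempt_count = attempt_count + 1
         WHERE id = ?
           AND (status = 'pending'
                OR (status = 'processing' AND claimed_at < ?))
        """,
        (token, now, intent_id, now - stale_after),
    )
    conn.commit()
    return token if cur.rowcount == 1 else None


def finish_intent(conn, intent_id, token) -> bool:
    """Mark the intent done, but only if `token` still owns the claim."""
    cur = conn.execute(
        "UPDATE story_image_intents SET status = 'done' "
        "WHERE id = ? AND claim_token = ? AND status = 'processing'",
        (intent_id, token),
    )
    conn.commit()
    return cur.rowcount == 1
```

Matching on the token when finishing means a worker whose claim was taken over cannot overwrite the new owner's result, which is the database-side counterpart of not deleting a lock held by someone else.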
2026-03-20 10:30:07 +08:00
parent 13e3124b85
commit 7f57f96c25
67 changed files with 4751 additions and 832 deletions
--- a/api/app/features/memory/chunker.py
+++ b/api/app/features/memory/chunker.py
@@ -1,8 +1,38 @@
-"""Transcript chunker — split raw text into retrieval-ready chunks (skeleton)."""
+"""Transcript chunker — split raw text into retrieval-ready chunks."""
+
+import re


 def chunk_transcript(
-    text: str, *, max_tokens: int = 512, overlap: int = 64
+    text: str, *, max_chars: int = 800, overlap_chars: int = 100
 ) -> list[str]:
-    """Split transcript text into overlapping chunks."""
-    raise NotImplementedError
+    """
+    Split transcript text into overlapping chunks.
+    Character count is a rough proxy for tokens (~4 chars/token is an English heuristic; Chinese is typically lower).
+    """
+    if not text or not text.strip():
+        return []
+    text = text.strip()
+    if len(text) <= max_chars:
+        return [text]
+
+    chunks: list[str] = []
+    start = 0
+    step = max_chars - overlap_chars
+
+    while start < len(text):
+        end = start + max_chars
+        chunk = text[start:end]
+        # Prefer cutting at the end of a sentence
+        if end < len(text):
+            for sep in ["。", "！", "？", "\n", "；", ".", "!", "?"]:
+                last_sep = chunk.rfind(sep)
+                if last_sep > max_chars // 2:
+                    chunk = chunk[: last_sep + 1]
+                    end = start + len(chunk)
+                    break
+        if chunk.strip():
+            chunks.append(chunk.strip())
+        start = end if end >= len(text) else max(end - overlap_chars, start + 1)
+
+    return chunks
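
For reference, the chunking strategy in this diff (fixed-size character windows, cuts that prefer sentence-final punctuation, and an overlap between neighbouring chunks) can be sketched as a standalone function. This is a minimal sketch rather than the committed implementation: the name `chunk_with_overlap`, the validation of `overlap_chars`, and the choice to cut at the latest boundary in the second half of the window are assumptions made for illustration.

```python
def chunk_with_overlap(
    text: str, *, max_chars: int = 800, overlap_chars: int = 100
) -> list[str]:
    """Hypothetical standalone chunker: sentence-aware cuts with a shared tail."""
    if not 0 <= overlap_chars < max_chars:
        raise ValueError("overlap_chars must be in [0, max_chars)")
    text = text.strip()
    if not text:
        return []

    seps = ("。", "！", "？", "\n", "；", ".", "!", "?")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Cut at the latest sentence boundary in the second half of the window.
            cut = max(text.rfind(sep, start + max_chars // 2, end) for sep in seps)
            if cut != -1:
                end = cut + 1
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= len(text):
            break
        # Step back so the next chunk repeats the last overlap_chars characters,
        # while always advancing by at least one character.
        start = max(end - overlap_chars, start + 1)
    return chunks
```

With `max_chars=10` and `overlap_chars=3`, each chunk after the first begins with the last three characters of the previous one, so a sentence that straddles a cut is still retrievable from at least one chunk.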