Files
life-echo/api/docs/memoir_reliability.md
Kevin 71fbd39e32 feat(api)!: memory single chain — async MemoryService, strict eval closure
Route all memory ingest/retrieve/enrichment/compaction through async MemoryService.
Remove legacy sync memory implementations (ingest/retrieve/compaction); Celery and
memoir Phase2 call asyncio.run into MemoryService-backed helpers.

Memoir Phase1 batch ingest uses MemoryService.ingest_transcripts_batch; drop chapters.
evidence_bundle_json mirror (Alembic 0015). Evaluation uses snapshot/link-only bundles;
raise EvidenceClosureMissing instead of partial/fallback lineage tiers.

Split memoir state into NarrativeCoverageState and InterviewControlState; delete the
_interview_meta_store adapter layer. Remove rolling-query and recent-fact fallback
settings from config and evidence assembly.

Update judges, docs, tests, and PlaygroundPage alignment.

Made-with: Cursor
2026-04-30 14:11:50 +08:00

50 lines
3.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Memoir & memory reliability
This document summarizes production-oriented behavior for the memoir narrative pipeline, memory evidence, compaction, and async orchestration.
## Correlation ID (`memoir_correlation_id`)
- Phase 1 (`process_memoir_phase1`) generates a UUID at task start and logs `event=memoir_phase1_* … memoir_correlation_id=`.
- Phase 2 receives it via Celery `kwargs` and combines with `effective_correlation_id` (explicit id wins, else Celery task id).
- The same id is passed into `run_story_pipeline_for_category_batch`, structured logs, and `compaction_extra` when scheduling memory compaction after Phase 2.
## Feature flags (`app.core.config.Settings`)
| Flag | Default | Purpose |
|------|---------|---------|
| `memoir_fidelity_fail_open_on_parse_error` | `False` | When `True`, fidelity JSON/LLM failures pass the gate even for new stories (rollback only via ops need). |
| `memoir_narrative_evidence_overlap_min_chars` | `14` | Deterministic overlap check between body and evidence plain text. |
| `memoir_title_slots_require_body_or_oral_match` | `True` | Narrows title-generation slot inputs to body/oral overlap. |
| `memory_compaction_enabled` | `True` | Near-duplicate chunk soft-exclude; requires Celery worker + **Beat** for periodic `memory_compaction_sweep`. |
| `memoir_recompose_retry_on_lock_contention` | `True` | Chapter recompose retries with backoff when the chapter pipeline lock is held. |
| `memoir_phase2_singleflight_immediate` | `True` | Immediate Phase 2 `send_task` uses a stable `task_id` per user/category to reduce duplicate queue entries. |
| `chapter_pipeline_lock_ttl_seconds` | `360` | Shared lock TTL for Phase 2 and `recompose_chapter`; tune with longest expected runtimes. |
## Memory compaction → facts
When a chunk is soft-excluded as a near-duplicate loser, `mark_facts_stale_for_excluded_chunk_sync` sets linked `MemoryFact` rows (`source_chunk_id`, statuses `confirmed`/`candidate`) to **`stale`**. Downstream fact retrieval uses `confirmed` only for default search/browse paths.
## Acceptance-oriented metrics (log queries)
Monitor structured log events:
- `event=fidelity_parse_fail_closed` / `fidelity_check_fail`
- `event=memoir_phase2_*` with `memoir_correlation_id`
- `memory_compaction_exclude` / `memory_compaction_facts_staled`
- `event=recompose_chapter status=lock_busy_retry`
## Tests
Targeted regressions live under `api/tests/`:
- `test_fidelity_gate.py`, `test_narrative_boundary_regressions.py`
- `test_memory_consistency_rules.py`, `test_memoir_idempotency.py`
- `test_recompose_retry_policy.py`
- `test_llm_json_call.py`, `test_stage_slot_registry.py`
## LLM JSON (`llm_json_call`) and compat strip
- Standard path: `response_format=json_object``json.loads` → Pydantic validate.
- On decode failure only, `extract_json_payload` runs once (fence / brace strip). A hit emits **`event=llm_json_compat_strip_hit`** at WARNING.
- **Step 13 (sunset)**: observe this event in production for ~12 weeks; if zero hits, remove the compat branch from `app.core.llm_call` and migrate remaining callers off `extract_json_payload` for JSON-mode paths.