feat(eval): memoir A/B chapter judging and eval-web parity with dialogue

- Judge baseline excerpt and library chapter separately; add build_memoir_compare_summary for gate, nine-dimension, and leaf deltas.

- Memoir SSE chapter payload: baseline_judge, compare_summary, baseline_judge_error.

- MemoirJudgeOutput: loose score coercion and post-validate clamp; read memoir judge prompt caps from settings.

- app-eval-web: two-column MemoirScoreCard layout, MemoirCompareSummary, chapter blocks and CSS.

- Add memoir_compare_summary, log_events, celery_log_context, memoir_pipeline_progress; tests and migration 0014.

- Misc: memory/evidence and enrichment paths, task/orchestrator updates, internal-eval docs, env examples.
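
A minimal sketch of what "loose score coercion and post-validate clamp" and a per-dimension delta could look like. This is not the code from this commit: the helper names (`coerce_score`, `clamp_score`, `score_delta`) and the 0 to 10 score range are assumptions made for illustration only.

```python
import math
from typing import Any, Optional

# Assumed score bounds; the commit message does not state the real range.
SCORE_MIN = 0.0
SCORE_MAX = 10.0


def coerce_score(raw: Any) -> Optional[float]:
    """Loosely coerce a judge-emitted score to a float, or None if unusable."""
    # bool is a subclass of int, so reject it explicitly.
    if raw is None or isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        value = float(raw)
    elif isinstance(raw, str):
        text = raw.strip()
        # Accept forms such as "7/10" by keeping only the numerator.
        if "/" in text:
            text = text.split("/", 1)[0].strip()
        try:
            value = float(text)
        except ValueError:
            return None
    else:
        return None
    # Reject NaN and infinities, which float() happily parses.
    return value if math.isfinite(value) else None


def clamp_score(value: float, lo: float = SCORE_MIN, hi: float = SCORE_MAX) -> float:
    """Clamp an already-validated score into [lo, hi]."""
    return max(lo, min(hi, value))


def score_delta(baseline: Any, candidate: Any) -> Optional[float]:
    """Candidate minus baseline after coercion and clamping; None if either is unusable."""
    b = coerce_score(baseline)
    c = coerce_score(candidate)
    if b is None or c is None:
        return None
    return clamp_score(c) - clamp_score(b)
```

Returning `None` instead of raising keeps one malformed judge field from discarding the whole comparison, which fits a payload that carries a separate `baseline_judge_error`; whether the real code behaves this way is not stated in the commit message.
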
Author: Kevin
Date: 2026-04-10 10:23:43 +08:00
Parent: b0251e5b26
Commit: ac49bc7f23
59 changed files with 4773 additions and 696 deletions


@@ -390,6 +390,66 @@ code {
color: var(--text);
}
/* ── Phase tag (inline badge in meta bar) ── */
.eval-memoir-phase-tag {
  display: inline-block;
  margin-left: var(--s-3);
  padding: 0.15em 0.55em;
  font-size: var(--text-xs);
  font-weight: 600;
  border-radius: var(--r-sm);
  vertical-align: middle;
}
.eval-memoir-phase-tag--active {
  background: var(--accent-muted);
  color: var(--accent);
  animation: eval-memoir-pulse 1.4s ease-in-out infinite;
}
.eval-memoir-phase-tag--done {
  background: var(--success-bg);
  color: var(--success-text);
}
.eval-memoir-phase-tag--error {
  background: var(--danger-bg);
  color: var(--danger-text);
}
@keyframes eval-memoir-pulse {
  0%, 100% { opacity: 1; }
  50% { opacity: .55; }
}
/* ── Progress bar ── */
.eval-memoir-progress {
  height: 4px;
  margin: 0 0 var(--s-3);
  border-radius: 2px;
  background: var(--bg-muted);
  overflow: hidden;
}
.eval-memoir-progress__bar {
  height: 100%;
  background: var(--accent);
  border-radius: 2px;
  transition: width 0.35s ease;
}
/* ── Danger button ── */
.eval-btn--danger {
  background: var(--danger-bg);
  border-color: var(--danger-border);
  color: var(--danger-text);
  font-weight: 600;
}
.eval-btn--danger:hover:not(:disabled) {
  background: oklch(0.94 0.04 18);
}
/* ── Raw JSON details toggle ── */
.eval-memoir-raw-detail summary {
  font-size: var(--text-sm);
  user-select: none;
}
.eval-memoir-compare {
display: grid;
grid-template-columns: 1fr 1fr;
@@ -1892,6 +1952,18 @@ code {
margin-bottom: var(--s-2);
}
.eval-memoir-chapter-block {
  border: 1px solid var(--border);
  border-radius: var(--r-lg);
  padding: var(--s-4);
  margin-bottom: var(--s-4);
  background: var(--bg-elevated);
}
.eval-memoir-compare-section {
  margin-top: var(--s-3);
}
/* Diff table */
.eval-diff-wrap {