api/app/features/evaluation/rubrics/conversation_v1.py

"""Rubric text for conversation evaluation (v1)."""

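TURN_JUDGE_INSTRUCTIONS = """You are a conversation-quality judge for 「岁月留书」 interviews. Score the AI reply in this turn on the dimensions below (total_score is 0-100; each sub-score's maximum is noted, and the sub-scores should sum to roughly total_score).

Dimensions (for reference):
- Emotional attunement and empathy (emotion_score, max 30)
- Information gathering and follow-up questions (information_score, max 20)
- Structured interview progression (structure_score, max 10)
- Question quality (question_score, max 10)
- Persona understanding and consistency (persona_score, max 10)
- Repetition suppression (repetition_score, max 10): whether the reply repeats a question, or targets the same profile slot, already asked in the previous 1-2 turns; score low for heavy repetition
- Naturalness and fluency (naturalness_score, max 10): whether it reads like a chat between friends; whether there is an unnecessary interviewer tone, summarizing tone, or procedural feel

Output JSON: **json** field names are as follows:
total_score, emotion_score, information_score, structure_score, question_score, persona_score, repetition_score, naturalness_score, rationale

Output JSON only."""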
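
The rubric above fixes a cap for each sub-score and asks that the sub-scores sum to roughly total_score. A minimal sketch of how a caller might check a judge reply against those rules follows. The names `SUB_SCORE_CAPS`, `check_turn_judgement`, and the 5-point `tolerance` are hypothetical and not part of this module; only the field names and caps come from the rubric text.

```python
import json

# Caps copied from the rubric text; everything else here is an assumption.
SUB_SCORE_CAPS = {
    "emotion_score": 30,
    "information_score": 20,
    "structure_score": 10,
    "question_score": 10,
    "persona_score": 10,
    "repetition_score": 10,
    "naturalness_score": 10,
}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_turn_judgement(raw: str, tolerance: float = 5.0) -> list:
    """Return a list of problems found in a turn-judge reply (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    total = data.get("total_score")
    if not _is_number(total) or not 0 <= total <= 100:
        problems.append("total_score missing or outside 0-100")

    sub_sum = 0.0
    for name, cap in SUB_SCORE_CAPS.items():
        value = data.get(name)
        if not _is_number(value) or not 0 <= value <= cap:
            problems.append(f"{name} missing or outside 0-{cap}")
        else:
            sub_sum += value

    # Only compare the sum once every individual field is known to be valid.
    if not problems and abs(sub_sum - total) > tolerance:
        problems.append(f"sub-scores sum to {sub_sum:g}, total_score is {total:g}")
    return problems
```

The rubric only asks for rough agreement between the sum and total_score, so the tolerance is a tunable guess rather than a rule taken from the source.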


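CONV_JUDGE_INSTRUCTIONS = """You are a judge for an entire interview conversation. Given the complete transcript (multiple turns between the user and the AI), assign one overall total_score (0-100).

dimension_scores should include at least: emotion, information, structure, repetition, naturalness (a relative 0-100 value for each is sufficient), reflecting whether the conversation as a whole interrogates repetitively and whether it feels natural; a rationale may also be included.

Output JSON only: total_score, dimension_scores, rationale."""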
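
The conversation-level rubric treats the five dimension names as a recommendation rather than a hard requirement. A sketch of a lenient parser under that reading follows: it rejects a reply only when total_score is unusable, and reports missing or out-of-range dimensions instead of failing. `RECOMMENDED_DIMENSIONS` and `parse_conversation_judgement` are hypothetical names, not part of this module.

```python
import json

# Dimension names listed in the rubric text; treated as recommended, not required.
RECOMMENDED_DIMENSIONS = ("emotion", "information", "structure", "repetition", "naturalness")


def _in_range(value) -> bool:
    # Accept int/float in 0-100; bool is excluded because it subclasses int.
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0 <= value <= 100
    )


def parse_conversation_judgement(raw: str) -> dict:
    """Parse a conversation-judge reply; raise ValueError if total_score is unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value is not an object")
    total = data.get("total_score")
    if not _in_range(total):
        raise ValueError("total_score missing or outside 0-100")

    dims = data.get("dimension_scores")
    if not isinstance(dims, dict):
        dims = {}
    return {
        "total_score": total,
        "dimension_scores": dims,
        # Recommended keys the judge left out.
        "missing_dimensions": [n for n in RECOMMENDED_DIMENSIONS if n not in dims],
        # Recommended keys present but not a 0-100 number.
        "invalid_dimensions": [
            n for n in RECOMMENDED_DIMENSIONS if n in dims and not _in_range(dims[n])
        ],
        "rationale": data.get("rationale", ""),
    }
```

Reporting gaps rather than raising keeps partially filled judgements usable for aggregate statistics; a stricter caller could raise whenever `missing_dimensions` is non-empty.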

feat/ Export data from the development container for evaluation
2026-04-03 14:44:46 +08:00

refactor(chat): AI-native prompts, remove interview heuristics

- Drop interview_reply_length and utterance_substance; always run stage LLM
  and memory retrieval when enabled; trim Settings fields and .env.example.
- Replace guided/opening prompts with compact fact blocks plus unified
  behavior guidance; slim background_voice and persona to tone hints.
- InterviewAgent uses fixed chat_interview max_tokens/chars/segments.

Also includes stacked work: profile followup/extract path, evaluation rubric
and judge schema updates, transcript SPLIT handling in execution service,
user export markdown split tests, and golden case fixture.
2026-04-06 22:22:50 +08:00