feat(voice-client): PySide6 desktop client and Windows build scripts

Add voice_confirmation_client (poll, TTS MP3 playback, mic WAV resolve), PyInstaller spec, start/build helpers, and API unit tests.

Pending manual testing: end-to-end on OR workstations and packaged exe.

Made-with: Cursor
2026-04-27 09:52:10 +08:00
parent e4c6127619
commit 4c3f9a367b
19 changed files with 1324 additions and 0 deletions
--- a/voice_confirmation_client/README.md
+++ b/voice_confirmation_client/README.md
@@ -0,0 +1,80 @@
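+# Operating Room Consumables Voice Confirmation Client (Desktop)
+
+Standalone desktop app: it polls `GET /client/surgeries/{surgery_id}/pending-confirmation` at a configurable interval (default **5 s**), plays the **MP3 prompt** returned by the server, records the doctor's microphone as **16 kHz mono WAV**, and calls `POST .../pending-confirmation/{confirmation_id}/resolve` (`multipart` field name `audio`). The protocol follows [docs/客户端手术通信接口说明.md](../docs/客户端手术通信接口说明.md).
+
+## Environment
+
+- Python **3.13+** (same as the main project)
+- Install the optional dependency group **`voice-client`** (PySide6, httpx, numpy, sounddevice)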
+
+```bash
+cd /path/to/operation-room-monitor-server
+uv sync --group voice-client
+```
+
+## Running (development)
+
+When the project has no `build-system` configured, `uv` may not register the `voice-confirmation-client` command, so the recommended way is:
+
+```bash
+./start_voice_confirmation_client.sh
+```
+
+Or, from the repository root:
+
+```bash
+uv run --group voice-client python -m voice_confirmation_client
+```
+
+Windows (from the repository root):
+
+```bat
+start_voice_confirmation_client.bat
+```
+
+If the entry point is available, you can also run:
+
+```bash
+uv run --group voice-client voice-confirmation-client
+```
+
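+In the window, fill in **服务端 Base URL** (server base URL) and the **6-digit surgery ID**, then click **开始监控** (Start monitoring).
+
+## Audio notes
+
+- **MP3 playback**: prefers `ffplay` (from ffmpeg) and falls back to `afplay` on macOS; place `ffplay` in `voice_confirmation_client/bin/` (the `bin/` directory inside the package) so it is available in offline environments.
+- **Recording**: by default **sounddevice** records 16 kHz mono audio and writes it as WAV (matching the browser demo). Optionally tick **优先使用 ffmpeg 录音** (prefer ffmpeg recording); this relies on a local ffmpeg and working device arguments, and the default Windows device name may need adjusting on site, see `default_ffmpeg_input_args` in `voice_confirmation_client/core/record.py`.
+
+## Packaging (PyInstaller)
+
+Build on the **target operating system** (do not cross-compile Qt desktop apps).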
+
+```bash
+uv sync --group voice-client-build
+uv run --group voice-client-build pyinstaller voice_client.spec --noconfirm
+# or
+uv run --group voice-client-build python scripts/build_voice_client.py
+```
+
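+**One-step Windows build (from the repository root)**: double-click `build_voice_confirmation_client.bat` or run it from `cmd`; pass `--clean` for a clean build (deletes `build/` and `dist/` first).
+
+Output directory: `dist/voice-confirmation-client/` (a directory distribution containing the executable). On Windows the executable is `voice-confirmation-client.exe`.
+
+**Notes**:
+
+- The bundle is large (it includes PySide6); antivirus software may flag PyInstaller-built executables as false positives, so ask hospital IT to allowlist it.
+- **macOS**: an unsigned or un-notarized `.app` may need to be allowed manually under "Privacy & Security"; a formal release requires Apple Developer signing and notarization.
+- **Optional**: put the `ffmpeg`/`ffplay` binaries in `voice_confirmation_bin/` under the packaged directory and the app will use them first (add a `datas` entry in the spec to bundle that directory, or copy it into the distribution directory manually).
+
+## Troubleshooting in the OR
+
+1. **Network**: the client machine must reach the monitoring service's HTTP/HTTPS port (`38080` by default in the docs).
+2. **Microphone**: pick the correct device under 输入设备 (Input device); if the list is empty, check the system privacy permission for the microphone.
+3. **Nothing pending**: a 404 from polling is normal; untick 隐藏 404 轮询日志 (Hide 404 polling logs) to watch the request cadence.
+4. **Recognition failed**: use **重试本轮** (Retry this round) to replay, record, and upload again, or **仅重播话术** (Replay prompt only) to hear the prompt clearly.
+
+## Differences from the browser demo
+
+- The browser demo (`scripts/demo_client/`) polls every **10 s** by default; this client defaults to **5 s**, adjustable in the window.
+- This client has no start/end surgery buttons; the surgery must be started by the existing workflow or by another client calling `POST /client/surgeries/start`.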
+
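Note on the README above: the two endpoints it names can be derived from the base URL as in the minimal sketch below. The helper names `pending_url` and `resolve_url` are hypothetical and not part of this commit; the joining and quoting follow what `core/api.py` in this change appears to do, and the sample IDs are made up.

```python
from urllib.parse import quote, urljoin


def pending_url(base_url: str, surgery_id: str) -> str:
    """URL polled for the next pending confirmation (hypothetical helper)."""
    # A single trailing slash makes urljoin keep any path prefix in the base.
    base = base_url.rstrip("/") + "/"
    return urljoin(base, f"client/surgeries/{surgery_id}/pending-confirmation")


def resolve_url(base_url: str, surgery_id: str, confirmation_id: str) -> str:
    """URL that receives the recorded WAV as multipart field `audio`."""
    # Percent-encode the id so "/" or other reserved characters stay in one segment.
    cid = quote(confirmation_id, safe="")
    return f"{pending_url(base_url, surgery_id)}/{cid}/resolve"


print(pending_url("http://127.0.0.1:38080", "123456"))
print(resolve_url("http://127.0.0.1:38080/", "123456", "abc/1"))
```

Normalizing the base to exactly one trailing slash means a deployment behind a path prefix (for example `http://host/api`) should still resolve to `http://host/api/client/...`.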
--- a/voice_confirmation_client/__init__.py
+++ b/voice_confirmation_client/__init__.py
@@ -0,0 +1,3 @@
+"""Desktop voice confirmation client for OR monitor API (pending-confirmation loop)."""
+
+__version__ = "0.1.0"
--- a/voice_confirmation_client/__main__.py
+++ b/voice_confirmation_client/__main__.py
@@ -0,0 +1,20 @@
+"""Entry: `python -m voice_confirmation_client` or `voice-confirmation-client`."""
+
+from __future__ import annotations
+
+import sys
+
+
+def main() -> None:
+    from PySide6.QtWidgets import QApplication
+
+    from voice_confirmation_client.gui.main_window import MainWindow
+
+    app = QApplication(sys.argv)
+    win = MainWindow()
+    win.show()
+    raise SystemExit(app.exec())
+
+
+if __name__ == "__main__":
+    main()
--- a/voice_confirmation_client/core/__init__.py
+++ b/voice_confirmation_client/core/__init__.py
@@ -0,0 +1,3 @@
+from voice_confirmation_client.core.monitor_worker import MonitorWorker
+
+__all__ = ["MonitorWorker"]
--- a/voice_confirmation_client/core/api.py
+++ b/voice_confirmation_client/core/api.py
@@ -0,0 +1,87 @@
+"""HTTP client for pending-confirmation and resolve endpoints."""
+
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass
+from typing import Any
+from urllib.parse import quote, urljoin
+
+import httpx
+
+
+@dataclass
+class PendingConfirmationPayload:
+    surgery_id: str
+    confirmation_id: str
+    prompt_text: str
+    prompt_audio_mp3_base64: str
+    options: list[dict[str, Any]]
+    model_top1_label: str
+    model_top1_confidence: float
+    created_at: str
+    raw: dict[str, Any]
+
+
+class ConfirmationApiClient:
+    def __init__(self, base_url: str, timeout: float = 60.0) -> None:
+        self._base = base_url.rstrip("/") + "/"
+        self._timeout = timeout
+        self._client = httpx.Client(timeout=timeout)
+
+    @property
+    def base_url_normalized(self) -> str:
+        return self._base
+
+    def close(self) -> None:
+        self._client.close()
+
+    def _url(self, path: str) -> str:
+        return urljoin(self._base, path.lstrip("/"))
+
+    def get_pending(self, surgery_id: str) -> tuple[int, dict[str, Any] | str]:
+        url = self._url(f"client/surgeries/{surgery_id}/pending-confirmation")
+        r = self._client.get(url)
+        text = r.text
+        if not text:
+            return r.status_code, {}
+        try:
+            body: dict[str, Any] | str = json.loads(text)
+        except json.JSONDecodeError:
+            body = text
+        return r.status_code, body
+
+    def parse_pending(self, body: dict[str, Any]) -> PendingConfirmationPayload:
+        return PendingConfirmationPayload(
+            surgery_id=str(body.get("surgery_id", "")),
+            confirmation_id=str(body["confirmation_id"]),
+            prompt_text=str(body.get("prompt_text", "")),
+            prompt_audio_mp3_base64=str(body.get("prompt_audio_mp3_base64", "")),
+            options=list(body.get("options") or []),
+            model_top1_label=str(body.get("model_top1_label", "")),
+            model_top1_confidence=float(body.get("model_top1_confidence") or 0.0),
+            created_at=str(body.get("created_at", "")),
+            raw=body,
+        )
+
+    def post_resolve(
+        self,
+        surgery_id: str,
+        confirmation_id: str,
+        wav_bytes: bytes,
+        filename: str = "voice.wav",
+    ) -> tuple[int, dict[str, Any] | str]:
+        cid_enc = quote(confirmation_id, safe="")
+        url = self._url(
+            f"client/surgeries/{surgery_id}/pending-confirmation/{cid_enc}/resolve"
+        )
+        files = {"audio": (filename, wav_bytes, "audio/wav")}
+        r = self._client.post(url, files=files)
+        text = r.text
+        if not text:
+            return r.status_code, {}
+        try:
+            body: dict[str, Any] | str = json.loads(text)
+        except json.JSONDecodeError:
+            body = text
+        return r.status_code, body
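Note on `core/api.py`: `get_pending` and `post_resolve` decode the response body the same way. A minimal sketch of that shared rule follows; the helper name `decode_body` is hypothetical and not part of this commit.

```python
import json
from typing import Any


def decode_body(text: str) -> Any:
    """Empty body -> {}, valid JSON -> parsed value, anything else -> raw text."""
    if not text:
        return {}
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Non-JSON responses (for example an HTML error page) are kept verbatim
        # so the caller can still log them.
        return text


print(decode_body('{"status": "accepted"}'))
print(decode_body("<html>Bad Gateway</html>"))
```

Returning the raw text instead of raising keeps a proxy or gateway error page visible in the client log.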
--- a/voice_confirmation_client/core/monitor_worker.py
+++ b/voice_confirmation_client/core/monitor_worker.py
@@ -0,0 +1,347 @@
+"""Background polling + play + record + resolve (threaded, Qt-free)."""
+
+from __future__ import annotations
+
+import re
+import threading
+import time
+from collections.abc import Callable
+from dataclasses import dataclass, field
+from typing import Any
+
+from voice_confirmation_client.core.api import ConfirmationApiClient
+from voice_confirmation_client.core.playback import play_mp3_from_base64
+from voice_confirmation_client.core.record import record_wav_16k_mono
+
+
+@dataclass
+class MonitorSettings:
+    base_url: str = "http://127.0.0.1:38080"
+    surgery_id: str = ""
+    interval_sec: float = 5.0
+    record_seconds: float = 8.0
+    dry_run: bool = False
+    hide_404_logs: bool = True
+    prefer_ffmpeg_record: bool = False
+    sounddevice_device: int | str | None = None
+
+
+@dataclass
+class _MutableState:
+    generation: int = 0
+    busy: bool = False
+    spoken_cid: str | None = None
+    failed_resolve_cid: str | None = None
+    force_retry: bool = False
+    last_payload: dict[str, Any] | None = None
+
+
+class MonitorWorker:
+    """Polls pending-confirmation; on new item plays MP3, records WAV, POSTs resolve."""
+
+    def __init__(
+        self,
+        *,
+        on_log: Callable[[str], None] | None = None,
+        on_state: Callable[[str], None] | None = None,
+        on_pending: Callable[[dict[str, Any] | None], None] | None = None,
+    ) -> None:
+        self._on_log = on_log
+        self._on_state = on_state
+        self._on_pending = on_pending
+        self._settings = MonitorSettings()
+        self._settings_lock = threading.Lock()
+        self._state = _MutableState()
+        self._state_lock = threading.Lock()
+        self._stop = threading.Event()
+        self._wake = threading.Event()
+        self._monitoring = threading.Event()
+        self._thread: threading.Thread | None = None
+        self._api: ConfirmationApiClient | None = None
+        self._api_base: str | None = None
+        self._api_lock = threading.Lock()
+
+    def set_settings(self, **kwargs: Any) -> None:
+        with self._settings_lock:
+            old_sid = self._settings.surgery_id
+            for k, v in kwargs.items():
+                if hasattr(self._settings, k):
+                    setattr(self._settings, k, v)
+            sid_changed = (
+                "surgery_id" in kwargs and self._settings.surgery_id != old_sid
+            )
+        with self._state_lock:
+            self._state.generation += 1
+            if sid_changed:
+                self._state.spoken_cid = None
+                self._state.failed_resolve_cid = None
+                self._state.last_payload = None
+                self._state.force_retry = False
+                self._emit_pending(None)
+
+    def start_thread(self) -> None:
+        if self._thread and self._thread.is_alive():
+            return
+        self._stop.clear()
+        self._thread = threading.Thread(target=self._run, name="VoiceMonitor", daemon=True)
+        self._thread.start()
+
+    def stop_thread(self) -> None:
+        self._stop.set()
+        self._wake.set()
+        if self._thread:
+            self._thread.join(timeout=8.0)
+            self._thread = None
+        with self._api_lock:
+            if self._api:
+                self._api.close()
+                self._api = None
+            self._api_base = None
+
+    def set_monitoring(self, active: bool) -> None:
+        if active:
+            self._monitoring.set()
+            self._wake.set()
+        else:
+            self._monitoring.clear()
+            with self._state_lock:
+                self._state.generation += 1
+
+    def retry_failed(self) -> None:
+        with self._state_lock:
+            self._state.force_retry = True
+        self._wake.set()
+
+    def replay_prompt_only(self) -> None:
+        """Play last pending MP3 again (GUI button); no record/upload."""
+        threading.Thread(target=self._replay_prompt_job, name="ReplayPrompt", daemon=True).start()
+
+    def _replay_prompt_job(self) -> None:
+        with self._state_lock:
+            payload = self._state.last_payload
+        if not payload:
+            self._log("没有可重播的待确认数据")
+            return
+        b64 = payload.get("prompt_audio_mp3_base64") or ""
+        if not b64:
+            self._log("当前任务无 MP3 数据")
+            return
+        self._emit_state("播放话术（手动重播）…")
+        try:
+            play_mp3_from_base64(str(b64))
+        except Exception as e:
+            self._log(f"重播失败: {e}")
+        finally:
+            self._emit_state("待机")
+
+    def _log(self, msg: str) -> None:
+        if self._on_log:
+            self._on_log(msg)
+
+    def _emit_state(self, s: str) -> None:
+        if self._on_state:
+            self._on_state(s)
+
+    def _emit_pending(self, p: dict[str, Any] | None) -> None:
+        if self._on_pending:
+            self._on_pending(p)
+
+    def _get_api(self, base_url: str) -> ConfirmationApiClient:
+        norm = base_url.rstrip("/") + "/"
+        with self._api_lock:
+            if self._api is None or self._api_base != norm:
+                if self._api:
+                    self._api.close()
+                self._api = ConfirmationApiClient(base_url)
+                self._api_base = norm
+            return self._api
+
+    def _run(self) -> None:
+        while not self._stop.is_set():
+            if not self._monitoring.is_set():
+                time.sleep(0.15)
+                continue
+
+            with self._settings_lock:
+                cfg = MonitorSettings(
+                    base_url=self._settings.base_url,
+                    surgery_id=self._settings.surgery_id,
+                    interval_sec=self._settings.interval_sec,
+                    record_seconds=self._settings.record_seconds,
+                    dry_run=self._settings.dry_run,
+                    hide_404_logs=self._settings.hide_404_logs,
+                    prefer_ffmpeg_record=self._settings.prefer_ffmpeg_record,
+                    sounddevice_device=self._settings.sounddevice_device,
+                )
+
+            if not re.fullmatch(r"\d{6}", cfg.surgery_id or ""):
+                self._emit_state("手术号无效（需 6 位数字）")
+                self._wake.wait(timeout=1.0)
+                self._wake.clear()
+                continue
+
+            api = self._get_api(cfg.base_url)
+
+            with self._state_lock:
+                if self._state.busy:
+                    self._wake.wait(timeout=0.5)
+                    self._wake.clear()
+                    continue
+                gen_before = self._state.generation
+
+            try:
+                status, body = api.get_pending(cfg.surgery_id)
+            except Exception as e:
+                self._log(f"GET pending 失败: {e}")
+                self._wait_interval(cfg.interval_sec)
+                continue
+
+            with self._state_lock:
+                if self._state.generation != gen_before:
+                    continue
+                if self._state.busy:
+                    continue
+
+            if status == 404:
+                with self._state_lock:
+                    self._state.last_payload = None
+                    self._state.spoken_cid = None
+                    self._state.failed_resolve_cid = None
+                self._emit_pending(None)
+                if not cfg.hide_404_logs:
+                    self._log("暂无待确认")
+                self._emit_state("轮询中（无待确认）")
+                self._wait_interval(cfg.interval_sec)
+                continue
+
+            if status != 200 or not isinstance(body, dict):
+                self._log(f"GET pending 异常 HTTP {status}: {body}")
+                self._wait_interval(cfg.interval_sec)
+                continue
+
+            cid = str(body.get("confirmation_id") or "")
+            if not cid:
+                self._wait_interval(cfg.interval_sec)
+                continue
+
+            with self._state_lock:
+                self._state.last_payload = body
+                failed = self._state.failed_resolve_cid
+                force = self._state.force_retry
+                spoken = self._state.spoken_cid
+
+                if failed is not None and failed != cid:
+                    # A different confirmation arrived; drop the stale failure.
+                    self._state.failed_resolve_cid = None
+                    self._state.force_retry = False
+                    failed = None
+
+                # Skip a cid that failed (awaiting manual retry) or was already
+                # handled, unless the user forced a retry.
+                skip = not force and (failed == cid or spoken == cid)
+                if not skip:
+                    self._state.force_retry = False
+                    self._state.busy = True
+                    self._state.spoken_cid = cid
+
+            if skip:
+                # Wait outside the lock so retry_failed() never blocks on it.
+                self._emit_pending(body)
+                self._wait_interval(cfg.interval_sec)
+                continue
+
+            self._emit_pending(body)
+
+            try:
+                self._pipeline_play_record_resolve(cfg, api, body, cid)
+            finally:
+                with self._state_lock:
+                    self._state.busy = False
+
+            self._wake.clear()
+            self._wait_interval(cfg.interval_sec)
+
+    def _wait_interval(self, interval_sec: float) -> None:
+        self._wake.wait(timeout=max(0.5, interval_sec))
+        self._wake.clear()
+
+    def _pipeline_play_record_resolve(
+        self,
+        cfg: MonitorSettings,
+        api: ConfirmationApiClient,
+        body: dict[str, Any],
+        cid: str,
+    ) -> None:
+        gen_lock = self._state_lock
+        with gen_lock:
+            gen_run = self._state.generation
+
+        try:
+            self._emit_state("播放话术…")
+            play_mp3_from_base64(str(body.get("prompt_audio_mp3_base64") or ""))
+        except Exception as e:
+            self._log(f"播放失败: {e}")
+            with gen_lock:
+                self._state.failed_resolve_cid = cid
+            self._emit_state("播放失败（可重试）")
+            return
+
+        with gen_lock:
+            if self._state.generation != gen_run:
+                return
+
+        try:
+            self._emit_state("录音中…")
+            wav = record_wav_16k_mono(
+                cfg.record_seconds,
+                device=cfg.sounddevice_device,
+                prefer_ffmpeg=cfg.prefer_ffmpeg_record,
+            )
+        except Exception as e:
+            self._log(f"录音失败: {e}")
+            with gen_lock:
+                self._state.failed_resolve_cid = cid
+            self._emit_state("录音失败（可重试）")
+            return
+
+        with gen_lock:
+            if self._state.generation != gen_run:
+                return
+
+        if cfg.dry_run:
+            self._log(f"[dry-run] 已录音 {len(wav)} 字节，跳过上传")
+            with gen_lock:
+                self._state.failed_resolve_cid = None
+                self._state.spoken_cid = None
+                self._state.generation += 1
+            self._emit_state("待机（dry-run）")
+            return
+
+        try:
+            self._emit_state("上传识别…")
+            st, res = api.post_resolve(cfg.surgery_id, cid, wav)
+        except Exception as e:
+            self._log(f"POST resolve 失败: {e}")
+            with gen_lock:
+                self._state.failed_resolve_cid = cid
+            self._emit_state("上传失败（可重试）")
+            return
+
+        if st == 200 and isinstance(res, dict) and res.get("status") == "accepted":
+            self._log(
+                f"已确认: {res.get('message', '')} "
+                f"(resolved_label={res.get('resolved_label')!r})"
+            )
+            with gen_lock:
+                self._state.failed_resolve_cid = None
+                self._state.spoken_cid = None
+                self._state.last_payload = None
+                self._state.generation += 1
+            self._emit_pending(None)
+            self._emit_state("待机")
+            return
+
+        self._log(f"resolve 未接受 HTTP {st}: {res}")
+        with gen_lock:
+            self._state.failed_resolve_cid = cid
+        self._emit_state("解析/上传被拒（可重试）")
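Note on `core/monitor_worker.py`: the rule that decides whether a polled confirmation triggers the play/record/resolve pipeline can be read as a pure function. The sketch below restates that rule as understood from `_run`; the function name `should_run_pipeline` and the sample IDs are hypothetical.

```python
from typing import Optional


def should_run_pipeline(
    cid: str,
    spoken_cid: Optional[str],
    failed_cid: Optional[str],
    force_retry: bool,
) -> bool:
    """Return True when the worker should play, record, and resolve `cid`."""
    # A failure recorded for some other confirmation is stale and is ignored.
    if failed_cid is not None and failed_cid != cid:
        failed_cid = None
    # A manual retry always re-runs the pipeline.
    if force_retry:
        return True
    # Otherwise run only for a cid that has neither failed nor been handled.
    return failed_cid != cid and spoken_cid != cid


print(should_run_pipeline("c1", None, None, False))
print(should_run_pipeline("c1", "c1", "c1", False))
```

Keeping the decision free of side effects would also make it straightforward to cover in the API unit tests without starting a thread.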
--- a/voice_confirmation_client/core/paths.py
+++ b/voice_confirmation_client/core/paths.py
@@ -0,0 +1,47 @@
+"""Resolve bundled helper binaries (ffplay/ffmpeg) next to the package or PyInstaller extract dir."""
+
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+
+def package_root() -> Path:
+    """Return the `voice_confirmation_client` package directory itself."""
+    return Path(__file__).resolve().parent.parent
+
+
+def frozen_base() -> Path | None:
+    """PyInstaller onefile/onedir: sys._MEIPASS or executable dir."""
+    if getattr(sys, "frozen", False):
+        meipass = getattr(sys, "_MEIPASS", None)
+        if meipass:
+            return Path(meipass)
+        return Path(sys.executable).resolve().parent
+    return None
+
+
+def bin_dir() -> Path:
+    """`voice_confirmation_bin/` under the frozen base if present, else `bin/` inside the package."""
+    fb = frozen_base()
+    if fb is not None:
+        d = fb / "voice_confirmation_bin"
+        if d.is_dir():
+            return d
+    return package_root() / "bin"
+
+
+def find_ffplay() -> Path | None:
+    for name in ("ffplay", "ffplay.exe"):
+        p = bin_dir() / name
+        if p.is_file():
+            return p
+    return None
+
+
+def find_ffmpeg() -> Path | None:
+    for name in ("ffmpeg", "ffmpeg.exe"):
+        p = bin_dir() / name
+        if p.is_file():
+            return p
+    return None
--- a/voice_confirmation_client/core/playback.py
+++ b/voice_confirmation_client/core/playback.py
@@ -0,0 +1,61 @@
+"""Play MP3 bytes via system player or bundled ffplay."""
+
+from __future__ import annotations
+
+import base64
+import os
+import shutil
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+
+from voice_confirmation_client.core.paths import find_ffplay
+
+
+def play_mp3_from_base64(b64: str) -> None:
+    raw_b64 = "".join((b64 or "").split())
+    if not raw_b64:
+        raise ValueError("empty prompt_audio_mp3_base64")
+    data = base64.b64decode(raw_b64, validate=False)
+    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
+        f.write(data)
+        tmp = f.name
+    try:
+        _play_mp3_path(Path(tmp))
+    finally:
+        try:
+            os.unlink(tmp)
+        except OSError:
+            pass
+
+
+def _play_mp3_path(path: Path) -> None:
+    bundled = find_ffplay()
+    if bundled and bundled.is_file():
+        subprocess.run(
+            [str(bundled), "-nodisp", "-autoexit", "-loglevel", "quiet", str(path)],
+            check=True,
+            timeout=600,
+        )
+        return
+    ffplay = shutil.which("ffplay")
+    if ffplay:
+        subprocess.run(
+            [ffplay, "-nodisp", "-autoexit", "-loglevel", "quiet", str(path)],
+            check=True,
+            timeout=600,
+        )
+        return
+    if sys.platform == "darwin":
+        subprocess.run(["afplay", str(path)], check=True, timeout=600)
+        return
+    if os.name == "nt":
+        os.startfile(str(path))  # type: ignore[attr-defined]
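+        import time  # local import: only this Windows fallback needs it
+        # startfile() does not block; 5 s is a best-effort wait before cleanup.
+        time.sleep(5)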
+        return
+    raise RuntimeError(
+        "No MP3 player found. Install ffmpeg (ffplay) or run on macOS with afplay."
+    )
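Note on `core/playback.py`: the fallback order in `_play_mp3_path` can be sketched as a pure function that only picks a command prefix. The name `choose_player` and its parameters are hypothetical; the Windows `os.startfile` fallback is not a blocking command, so this sketch returns `None` for it.

```python
from typing import List, Optional

FFPLAY_ARGS = ["-nodisp", "-autoexit", "-loglevel", "quiet"]


def choose_player(
    bundled: Optional[str], on_path: Optional[str], platform: str
) -> Optional[List[str]]:
    """Pick the command prefix for a blocking MP3 player, or None if there is none."""
    if bundled:  # ffplay shipped in the bin directory wins
        return [bundled, *FFPLAY_ARGS]
    if on_path:  # then an ffplay found on PATH
        return [on_path, *FFPLAY_ARGS]
    if platform == "darwin":  # then the macOS system player
        return ["afplay"]
    return None  # caller falls back to os.startfile or raises


print(choose_player(None, "/usr/bin/ffplay", "linux"))
print(choose_player(None, None, "darwin"))
```

The caller would append the MP3 path to the returned list before passing it to `subprocess.run`.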
--- a/voice_confirmation_client/core/record.py
+++ b/voice_confirmation_client/core/record.py
@@ -0,0 +1,94 @@
+"""Record microphone to 16 kHz mono WAV (sounddevice or ffmpeg)."""
+
+from __future__ import annotations
+
+import io
+import subprocess
+import sys
+import tempfile
+import wave
+from pathlib import Path
+
+import numpy as np
+
+from voice_confirmation_client.core.paths import find_ffmpeg
+
+
+def record_wav_16k_mono(
+    duration_sec: float,
+    *,
+    device: int | str | None = None,
+    prefer_ffmpeg: bool = False,
+    ffmpeg_input_args: list[str] | None = None,
+) -> bytes:
+    """Return WAV file bytes (16-bit PCM, 16 kHz, mono)."""
+    if prefer_ffmpeg:
+        bundled = find_ffmpeg()
+        ffmpeg_bin = str(bundled) if bundled and bundled.is_file() else shutil_which_ffmpeg()
+        if ffmpeg_bin:
+            return _record_ffmpeg(ffmpeg_bin, duration_sec, ffmpeg_input_args)
+    return _record_sounddevice(duration_sec, device=device)
+
+
+def shutil_which_ffmpeg() -> str | None:
+    import shutil
+
+    return shutil.which("ffmpeg")
+
+
+def _record_sounddevice(duration_sec: float, device: int | str | None) -> bytes:
+    import sounddevice as sd
+
+    samplerate = 16000
+    frames = int(duration_sec * samplerate)
+    kwargs: dict = {"samplerate": samplerate, "channels": 1, "dtype": "float32"}
+    if device is not None and device != "":
+        kwargs["device"] = device
+    recording = sd.rec(frames, **kwargs)
+    sd.wait()
+    mono = np.clip(recording.reshape(-1), -1.0, 1.0)
+    pcm = (mono * 32767.0).astype(np.int16)
+    buf = io.BytesIO()
+    with wave.open(buf, "wb") as wf:
+        wf.setnchannels(1)
+        wf.setsampwidth(2)
+        wf.setframerate(samplerate)
+        wf.writeframes(pcm.tobytes())
+    return buf.getvalue()
+
+
+def default_ffmpeg_input_args() -> list[str]:
+    if sys.platform == "darwin":
+        return ["-f", "avfoundation", "-i", ":0"]
+    if sys.platform == "win32":
+        return ["-f", "dshow", "-i", "audio=Microphone"]
+    return ["-f", "alsa", "-i", "default"]
+
+
+def _record_ffmpeg(
+    ffmpeg_bin: str, duration_sec: float, ffmpeg_input_args: list[str] | None
+) -> bytes:
+    input_args = ffmpeg_input_args if ffmpeg_input_args else default_ffmpeg_input_args()
+    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
+        out = tmp.name
+    try:
+        cmd = [
+            ffmpeg_bin,
+            "-y",
+            "-loglevel",
+            "error",
+            *input_args,
+            "-t",
+            str(duration_sec),
+            "-ar",
+            "16000",
+            "-ac",
+            "1",
+            "-sample_fmt",
+            "s16",
+            out,
+        ]
+        subprocess.run(cmd, check=True, timeout=int(duration_sec) + 45)
+        return Path(out).read_bytes()
+    finally:
+        Path(out).unlink(missing_ok=True)
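Note on `core/record.py`: the float-to-PCM step in `_record_sounddevice` can be illustrated without numpy or an audio device. The sketch below uses only the standard library and a hypothetical helper name, `floats_to_wav_bytes`; it clips to [-1, 1], scales by 32767, and truncates toward zero, which is how the numpy version appears to behave.

```python
import io
import struct
import wave


def floats_to_wav_bytes(samples, samplerate=16000):
    """Encode float samples in [-1, 1] as 16-bit PCM mono WAV bytes."""
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    # int() truncates toward zero; "<h" is little-endian signed 16-bit.
    pcm = struct.pack(f"<{len(clipped)}h", *(int(s * 32767.0) for s in clipped))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(samplerate)
        wf.writeframes(pcm)
    return buf.getvalue()


data = floats_to_wav_bytes([0.0, 2.0, -1.0])
with wave.open(io.BytesIO(data), "rb") as rf:
    print(rf.getnchannels(), rf.getsampwidth(), rf.getframerate(), rf.getnframes())
```

Reading the bytes back with `wave.open` is a quick way to confirm that an upload matches the 16 kHz mono format the README describes.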
--- a/voice_confirmation_client/gui/__init__.py
+++ b/voice_confirmation_client/gui/__init__.py
@@ -0,0 +1 @@
+"""PySide6 desktop GUI."""
--- a/voice_confirmation_client/gui/main_window.py
+++ b/voice_confirmation_client/gui/main_window.py
@@ -0,0 +1,198 @@
+"""Main PySide6 window for the voice confirmation client."""
+
+from __future__ import annotations
+
+import json
+from datetime import datetime
+from typing import Any
+
+from PySide6.QtCore import Qt, Signal, QObject
+from PySide6.QtGui import QCloseEvent
+from PySide6.QtWidgets import (
+    QCheckBox,
+    QComboBox,
+    QDoubleSpinBox,
+    QFormLayout,
+    QGroupBox,
+    QHBoxLayout,
+    QLabel,
+    QLineEdit,
+    QMainWindow,
+    QMessageBox,
+    QPushButton,
+    QPlainTextEdit,
+    QSplitter,
+    QVBoxLayout,
+    QWidget,
+)
+
+from voice_confirmation_client.core.monitor_worker import MonitorWorker
+
+
+class _Bridge(QObject):
+    log_line = Signal(str)
+    state_text = Signal(str)
+    pending_payload = Signal(object)
+
+
+class MainWindow(QMainWindow):
+    def __init__(self) -> None:
+        super().__init__()
+        self.setWindowTitle("手术室耗材语音确认客户端")
+        self.resize(920, 640)
+
+        self._bridge = _Bridge()
+        self._bridge.log_line.connect(self._append_log)
+        self._bridge.pending_payload.connect(self._show_pending)
+
+        self._worker = MonitorWorker(
+            on_log=lambda m: self._bridge.log_line.emit(m),
+            on_state=lambda s: self._bridge.state_text.emit(s),
+            on_pending=lambda p: self._bridge.pending_payload.emit(p),
+        )
+        self._worker.start_thread()
+
+        central = QWidget()
+        self.setCentralWidget(central)
+        root = QVBoxLayout(central)
+
+        form_box = QGroupBox("连接与手术")
+        form = QFormLayout(form_box)
+        self._base_url = QLineEdit("http://127.0.0.1:38080")
+        self._surgery_id = QLineEdit("")
+        self._surgery_id.setPlaceholderText("6 位数字，如 123456")
+        self._interval = QDoubleSpinBox()
+        self._interval.setRange(1.0, 120.0)
+        self._interval.setValue(5.0)
+        self._interval.setSuffix(" s")
+        self._record_sec = QDoubleSpinBox()
+        self._record_sec.setRange(2.0, 60.0)
+        self._record_sec.setValue(8.0)
+        self._record_sec.setSuffix(" s")
+        form.addRow("服务端 Base URL", self._base_url)
+        form.addRow("手术号 surgery_id", self._surgery_id)
+        form.addRow("轮询间隔", self._interval)
+        form.addRow("录音时长", self._record_sec)
+        root.addWidget(form_box)
+
+        adv = QGroupBox("音频 / 调试")
+        adv_l = QFormLayout(adv)
+        self._device_combo = QComboBox()
+        self._device_combo.addItem("系统默认麦克风", None)
+        self._populate_input_devices()
+        self._prefer_ffmpeg = QCheckBox("优先使用 ffmpeg 录音（需本机 ffmpeg 且设备参数可用）")
+        self._hide_404 = QCheckBox("隐藏 404 轮询日志（推荐）")
+        self._hide_404.setChecked(True)
+        self._dry_run = QCheckBox("Dry-run：录音后不上传")
+        adv_l.addRow("输入设备", self._device_combo)
+        adv_l.addRow(self._prefer_ffmpeg)
+        adv_l.addRow(self._hide_404)
+        adv_l.addRow(self._dry_run)
+        root.addWidget(adv)
+
+        btn_row = QHBoxLayout()
+        self._btn_start = QPushButton("开始监控")
+        self._btn_stop = QPushButton("停止监控")
+        self._btn_stop.setEnabled(False)
+        self._btn_retry = QPushButton("重试本轮（播放+录音+上传）")
+        self._btn_replay = QPushButton("仅重播话术")
+        btn_row.addWidget(self._btn_start)
+        btn_row.addWidget(self._btn_stop)
+        btn_row.addWidget(self._btn_retry)
+        btn_row.addWidget(self._btn_replay)
+        btn_row.addStretch()
+        root.addLayout(btn_row)
+
+        self._status_label = QLabel("待机")
+        root.addWidget(self._status_label)
+        self._bridge.state_text.connect(self._status_label.setText)
+
+        split = QSplitter(Qt.Orientation.Horizontal)
+        self._pending_view = QPlainTextEdit()
+        self._pending_view.setReadOnly(True)
+        self._pending_view.setPlaceholderText("待确认内容将显示在这里…")
+        self._log = QPlainTextEdit()
+        self._log.setReadOnly(True)
+        self._log.setPlaceholderText("日志…")
+        split.addWidget(self._pending_view)
+        split.addWidget(self._log)
+        split.setSizes([360, 520])
+        root.addWidget(split, stretch=1)
+
+        self._btn_start.clicked.connect(self._start_monitoring)
+        self._btn_stop.clicked.connect(self._stop_monitoring)
+        self._btn_retry.clicked.connect(self._worker.retry_failed)
+        self._btn_replay.clicked.connect(self._worker.replay_prompt_only)
+
+        self._apply_settings_silent()
+
+    def _show_pending(self, payload: object) -> None:
+        if payload is None:
+            self._pending_view.clear()
+            return
+        if not isinstance(payload, dict):
+            self._pending_view.setPlainText(str(payload))
+            return
+        try:
+            text = json.dumps(payload, ensure_ascii=False, indent=2)
+        except (TypeError, ValueError):
+            text = str(payload)
+        self._pending_view.setPlainText(text)
+
+    def _populate_input_devices(self) -> None:
+        try:
+            import sounddevice as sd
+        except ImportError:
+            return
+        try:
+            devices = sd.query_devices()
+            hostapis = sd.query_hostapis()
+        except Exception:
+            return
+        for i, d in enumerate(devices):
+            if d.get("max_input_channels", 0) <= 0:
+                continue
+            ha = hostapis[d["hostapi"]]["name"] if d.get("hostapi") is not None else ""
+            label = f"{i}: {d.get('name', '')} ({ha})"
+            self._device_combo.addItem(label, i)
+
+    def _apply_settings_silent(self) -> None:
+        dev_data = self._device_combo.currentData()
+        self._worker.set_settings(
+            base_url=self._base_url.text().strip(),
+            surgery_id=self._surgery_id.text().strip(),
+            interval_sec=float(self._interval.value()),
+            record_seconds=float(self._record_sec.value()),
+            dry_run=self._dry_run.isChecked(),
+            hide_404_logs=self._hide_404.isChecked(),
+            prefer_ffmpeg_record=self._prefer_ffmpeg.isChecked(),
+            sounddevice_device=dev_data,
+        )
+
+    def _start_monitoring(self) -> None:
+        sid = self._surgery_id.text().strip()
+        if len(sid) != 6 or not sid.isdigit():
+            QMessageBox.warning(self, "校验失败", "手术号必须为 6 位数字。")
+            return
+        self._apply_settings_silent()
+        self._worker.set_monitoring(True)
+        self._btn_start.setEnabled(False)
+        self._btn_stop.setEnabled(True)
+        self._append_log("—— 开始监控 ——")
+
+    def _stop_monitoring(self) -> None:
+        self._worker.set_monitoring(False)
+        self._btn_start.setEnabled(True)
+        self._btn_stop.setEnabled(False)
+        self._append_log("—— 已停止监控 ——")
+        self._status_label.setText("已停止")
+
+    def _append_log(self, line: str) -> None:
+        ts = datetime.now().strftime("%H:%M:%S")
+        self._log.appendPlainText(f"[{ts}] {line}")
+        sb = self._log.verticalScrollBar()
+        sb.setValue(sb.maximum())
+
+    def closeEvent(self, event: QCloseEvent) -> None:
+        self._worker.stop_thread()
+        event.accept()