# Docker Compose Deployment (NVIDIA GPU)

This document describes how to deploy the full backend (FastAPI + PostgreSQL + MinIO) with Docker Compose on an **NVIDIA GPU server**, and how to start the demo client and the voice confirmation page **manually**.

## Repository Layout

```
operation-room-monitor/
  backend/    # API + DB + MinIO (docker compose)
  clients/    # standalone front ends (started manually)
  docs/       # documentation
```

## Architecture

| Component | How it is deployed | Default port(s) |
|-----------|--------------------|-----------------|
| API + PostgreSQL + MinIO | `cd backend && docker compose up -d --build` | 38080 / 45432 / 19000 |
| Demo client | `clients/demo-client/start.sh` | 38081 |
| Voice confirmation page | `clients/voice-confirmation/start.sh` | 8080 |

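
To see which of the default ports in the table are already accepting connections, a small probe can help. The sketch below is not part of the repository: `port_open` and `report_ports` are hypothetical helper names, and it assumes that bash (for its `/dev/tcp` pseudo-device) and coreutils `timeout` are available on the host.

```bash
# Return 0 if a TCP connection to <host>:<port> succeeds within 2 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.
port_open() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print "<port> open" or "<port> closed" for each default port in the table.
report_ports() {
  local host=${1:-127.0.0.1} port
  for port in 38080 45432 19000 38081 8080; do
    if port_open "$host" "$port"; then
      echo "${port} open"
    else
      echo "${port} closed"
    fi
  done
}

# Example: check the local machine (the default), or pass the GPU server address.
#   report_ports
```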
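
---

## 1. Prerequisites

- Docker Compose V2, the NVIDIA driver, and the NVIDIA Container Toolkit
- Copy `backend/.env.example` to `backend/.env` and fill in the values
- Algorithm subprocess package: `backend/algorithm_subprocesses/5.15/` (contains `main.py` and `weights/`; it is `COPY`'d into the container at image build time, so do not exclude the whole directory in `.dockerignore`)
- Chinese fonts for annotated videos: the image already installs `fonts-noto-cjk` and `fonts-wqy-microhei` (used by `visualize_result_video.py` to draw consumable labels)
- Doctor recognition (MediaPipe Pose): the image already installs Mesa/GLVND libraries such as `libgles2`, `libegl1`, `libegl-mesa0`, `libglx-mesa0`, and `libgl1-mesa-dri`; the build stage runs `import mediapipe` to verify that `libGLESv2.so.2` is available. The subprocess forces the CPU delegate. If a `libGLESv2.so.2` error still appears, run **`docker compose build --no-cache api`** and restart (do not reuse an old tarball image)
- Optional fallback weights: `backend/app/resources/actionformer_epoch_045.pth.tar`
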
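
The file-based prerequisites in the list above can be checked before the first build. The following is a minimal sketch, not a script shipped with the repository: `missing_prereqs` is a hypothetical helper name, and it only tests that the listed paths exist, not that their contents are valid.

```bash
# Print every required path (relative to the repository root) that is missing.
# Prints nothing and returns 0 when everything is in place; returns 1 otherwise.
missing_prereqs() {
  local root=${1:-.} rel missing=0
  for rel in \
    backend/.env \
    backend/algorithm_subprocesses/5.15/main.py \
    backend/algorithm_subprocesses/5.15/weights
  do
    if [ ! -e "${root}/${rel}" ]; then
      echo "missing: ${rel}"
      missing=1
    fi
  done
  return "$missing"
}

# Example (run from the repository root): check the files, then the tooling.
#   missing_prereqs . && docker compose version && nvidia-smi
```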

---

## 2. Start the Backend

```bash
cd backend
docker compose up -d --build
```

Health check:

```bash
curl -sf http://127.0.0.1:38080/health
```

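
Right after `docker compose up`, the API may need some time before the health endpoint responds, so a single `curl` can fail even on a healthy deployment. A minimal sketch of a polling wrapper follows; `retry` is a hypothetical helper, and the attempt count and delay in the example are arbitrary assumptions rather than project recommendations.

```bash
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: poll the health endpoint for up to about one minute.
#   retry 30 2 curl -sf http://127.0.0.1:38080/health
```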
GPU check:

```bash
docker compose exec api python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```

Stop / reset:

```bash
docker compose down
docker compose down -v   # also deletes the PostgreSQL / MinIO volumes
```

### API image build fails: `invalid tar header` / `unpigz: corrupted`

If `uv sync` has already succeeded and the error appears during the **exporting / unpacking** stage, the usual cause is a **corrupted local Docker layer cache or storage**, not the Dockerfile.

Work through the following steps in order:

```bash
cd backend
chmod +x scripts/rebuild-api-image.sh

# Clear the cache and rebuild (recommended)
./scripts/rebuild-api-image.sh

# Still failing: restart Docker, then run again
RESTART_DOCKER=1 ./scripts/rebuild-api-image.sh

# Still failing: fall back to the legacy builder (no BuildKit)
COMPOSE_DOCKER_CLI_BUILD=0 DOCKER_BUILDKIT=0 docker compose build api --no-cache
docker compose up -d --force-recreate api
```

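Equivalent manual steps: `docker builder prune -af` → `docker rmi -f backend-api:latest` → `docker compose build api --no-cache`.

Make sure the root partition has enough free space (at least 20 GB is recommended); when space is short, exporting large layers is also prone to corruption.
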
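
The free-space check can be scripted before a rebuild. This is a minimal sketch under stated assumptions: `free_gib` and `has_free_space` are hypothetical helper names, the result is in whole GiB (slightly larger than the GB figure quoted above), and it inspects `/` by default. If Docker's data directory lives on a different filesystem, pass that path instead.

```bash
# Print the space available on the filesystem holding <path>, in whole GiB.
free_gib() {
  # `df -Pk` prints one POSIX-format line per filesystem; column 4 is available KiB.
  df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Return 0 when at least <min_gib> GiB are available on <path>.
has_free_space() {
  local path=$1 min_gib=$2
  [ "$(free_gib "$path")" -ge "$min_gib" ]
}

# Example: warn before rebuilding if the root filesystem is below the suggested size.
#   has_free_space / 20 || echo "low disk space on /" >&2
```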
### RTSP segments cannot be opened with VLC on the host

By default the API container writes to `./logs` as **root**, so the segments are owned by `root:root`. A regular user can still read them with `cat`, but sandboxed applications such as the **Snap build of VLC** often report `Permission denied`.

In `backend/.env`, set the UID/GID to match the host user (see `HOST_UID` / `HOST_GID` / `DOCKER_GID` in `.env.example`), then recreate the API container:

```bash
cd backend
docker compose up -d --force-recreate api
```

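
The values to put into `backend/.env` can be read from the host with `id`. The sketch below is an illustration only: `print_host_ids` is a hypothetical helper, and it assumes that `DOCKER_GID` refers to the GID of the host's `docker` group, which `.env.example` should confirm. It prints the lines instead of editing `.env`, so the output can be reviewed first.

```bash
# Print .env-style lines for the current host user.
# HOST_UID / HOST_GID / DOCKER_GID are the variable names from .env.example.
print_host_ids() {
  local docker_gid
  printf 'HOST_UID=%s\n' "$(id -u)"
  printf 'HOST_GID=%s\n' "$(id -g)"
  # Assumption: DOCKER_GID is the GID of the host's "docker" group.
  docker_gid=$(getent group docker 2>/dev/null | cut -d: -f3)
  if [ -n "$docker_gid" ]; then
    printf 'DOCKER_GID=%s\n' "$docker_gid"
  fi
}

# Example: review the output, then copy the lines into backend/.env.
#   print_host_ids
```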
Segments that are **already** owned by root need a one-time ownership fix (optional):

```bash
# Run from the repository root
sudo chown -R "$(id -u):$(id -g)" backend/logs/rtsp_segments
```

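
To confirm that the ownership fix took effect, or that newly written segments now belong to the host user, the directory can be scanned for files owned by anyone else. This is a minimal sketch; `list_foreign_owned` is a hypothetical helper name.

```bash
# List entries under <dir> that are not owned by the given numeric UID
# (defaults to the current user). Empty output means nothing is left to fix.
list_foreign_owned() {
  local dir=$1 uid=${2:-$(id -u)}
  find "$dir" ! -user "$uid" -print
}

# Example (from the repository root):
#   list_foreign_owned backend/logs/rtsp_segments
```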

---

## 3. Start the Clients Manually

```bash
# Run each command from the repository root, in its own terminal.
(cd clients/demo-client && ./start.sh)
(cd clients/voice-confirmation && ./start.sh)
```

In the browser, set the Base URL to `http://<GPU_SERVER_IP>:38080`.

---

## 4. Related Documents

- [部署版使用指南.md](部署版使用指南.md) (deployment usage guide)
- [客户端手术通信接口说明.md](客户端手术通信接口说明.md) (client surgery communication interface specification)
- [clients/demo-client/README.md](../clients/demo-client/README.md)
- [clients/voice-confirmation/README.md](../clients/voice-confirmation/README.md)
- [离线镜像tarball部署.md](离线镜像tarball部署.md) (offline image tarball deployment)