Initial commit: FishServer monorepo (FishAction, FishMeasure, fish_api)

Made-with: Cursor
2026-04-08 19:32:23 +08:00
commit 9df21f80ef
180 changed files with 96298 additions and 0 deletions
--- a/FishAction/SEGMENT_LABEL_TRAINING.md
+++ b/FishAction/SEGMENT_LABEL_TRAINING.md
@@ -0,0 +1,60 @@
+# Training with Timestamp Labels (Segment Supervision)
+
+If “scared” behavior occurs only briefly inside a long video, training with a single label for the whole video often fails:
+most randomly sampled clips look **normal**, so the model learns to predict **normal**.
+
+This repo now supports **segment/timestamp supervision** via `train_pytorchvideo_x3d_segments.py`.
+
+## 1) Create `train_segments.csv`
+
+Format (one segment per line, whitespace-separated; the last token is the integer label):
+
+```
+relative/video.mp4  start_sec  end_sec  label
+```
+
+Examples:
+```
+scared_underwater/foo.mp4  12.4  16.2  3
+scared_underwater/foo.mp4  30.0  32.0  3
+normal_underwater/bar.mp4  0.0   60.0  1
+feeding/baz.mp4            5.0   8.0   0
+```
+
+Notes:
+- `start_sec` and `end_sec` are in **seconds** from the beginning of the video.
+- You can have **multiple segments per video** (recommended).
+- For “normal” videos, you can put one segment covering the whole video.
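+
+To sanity-check a segments file before training, the format above can be parsed with a few lines of Python. This is a minimal sketch, not the loader used by `train_pytorchvideo_x3d_segments.py`: the names `Segment` and `parse_segments` are hypothetical, and the `start_sec < end_sec` check is an assumption about what counts as a valid row.
+
+```python
+from typing import List, NamedTuple
+
+
+class Segment(NamedTuple):
+    video: str        # path relative to the dataset root
+    start_sec: float  # segment start, seconds from the beginning of the video
+    end_sec: float    # segment end, seconds from the beginning of the video
+    label: int        # integer class label (last token on the line)
+
+
+def parse_segments(text: str) -> List[Segment]:
+    """Parse whitespace-separated 'video start_sec end_sec label' lines."""
+    segments = []
+    for line_no, line in enumerate(text.splitlines(), start=1):
+        line = line.strip()
+        if not line:
+            continue  # skip blank lines
+        # Split from the right: the last three tokens are start, end, label.
+        parts = line.rsplit(None, 3)
+        if len(parts) != 4:
+            raise ValueError(f"line {line_no}: expected 4 fields, got {len(parts)}")
+        video, start, end, label = parts
+        seg = Segment(video, float(start), float(end), int(label))
+        # Assumed validity rule: a non-negative start strictly before the end.
+        if not 0.0 <= seg.start_sec < seg.end_sec:
+            raise ValueError(f"line {line_no}: need 0 <= start_sec < end_sec")
+        segments.append(seg)
+    return segments
+```
+
+Calling `parse_segments(open("train_segments.csv").read())` raises `ValueError` on the first malformed row, which is usually cheaper to find here than partway through an epoch.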
+
+## 2) Train
+
+```bash
+cd /home/ubuntu/projects/FishAction
+
+python train_pytorchvideo_x3d_segments.py \
+  --segments_csv /home/ubuntu/data/fish/fish_action_videos/train_segments.csv \
+  --val_csv /home/ubuntu/data/fish/fish_action_videos/val.csv \
+  --path_prefix /home/ubuntu/data/fish/fish_action_videos \
+  --model x3d_m \
+  --pretrained \
+  --num_frames 16 \
+  --sampling_rate 5 \
+  --target_fps 30 \
+  --batch_size 4 \
+  --epochs 30 \
+  --num_workers 4 \
+  --amp \
+  --output_dir /home/ubuntu/projects/FishAction/checkpoints/ptv_x3d_m_segments
+```
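+
+It can help to know how much video one training clip covers with these settings. The sketch below assumes the flags follow the usual PyTorchVideo convention, where `--sampling_rate` is the stride between sampled frames and `--target_fps` is the frame rate the video is read at; the script itself may interpret them differently. The helper name `clip_span_seconds` is hypothetical.
+
+```python
+def clip_span_seconds(num_frames: int, sampling_rate: int, target_fps: float) -> float:
+    """Seconds of video spanned by one clip when every `sampling_rate`-th frame is kept."""
+    return num_frames * sampling_rate / target_fps
+
+
+# Values taken from the command above.
+span = clip_span_seconds(num_frames=16, sampling_rate=5, target_fps=30)
+print(f"{span:.2f} s per clip")  # prints "2.67 s per clip"
+```
+
+Under that assumption a clip spans about 2.67 s, which is longer than a 2.0 s labeled segment such as `30.0  32.0`. If your segments are shorter than the clip span, check how the script samples them, or lower `--sampling_rate`.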
+
+## 3) Why this helps
+
+- The training clips are sampled **inside the labeled windows**, so “scared” clips are actually “scared”.
+- This is the most direct way to fix “scared videos predicted as normal”.
+
+## 4) Next improvements (optional)
+
+- Add `val_segments.csv` for segment-level validation (more accurate than video-level validation).
+- Oversample scarce classes (e.g., duplicate “scared” segments) to balance training.
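+
+The oversampling idea can be sketched as a preprocessing step on the segment rows, before they are written to `train_segments.csv`. This is an illustration under stated assumptions, not a feature of the training script: `oversample_to_balance` is a hypothetical helper, and it balances by duplicating rows of the smaller classes (drawn with replacement) until every class matches the largest one.
+
+```python
+import random
+from typing import Dict, List, Sequence, Tuple
+
+Row = Tuple[str, float, float, int]  # (video, start_sec, end_sec, label)
+
+
+def oversample_to_balance(rows: Sequence[Row], seed: int = 0) -> List[Row]:
+    """Duplicate rows of scarce classes so every label appears equally often."""
+    if not rows:
+        return []
+    rng = random.Random(seed)  # fixed seed keeps the output reproducible
+    by_label: Dict[int, List[Row]] = {}
+    for row in rows:
+        by_label.setdefault(row[3], []).append(row)
+    target = max(len(group) for group in by_label.values())
+    balanced: List[Row] = []
+    for group in by_label.values():
+        balanced.extend(group)
+        # Draw extra copies with replacement until the class reaches `target`.
+        balanced.extend(rng.choices(group, k=target - len(group)))
+    rng.shuffle(balanced)
+    return balanced
+```
+
+Duplicated segments still yield different clips if the script samples a random position inside each window, but they add no new footage, so labeling more “scared” segments remains the better fix when it is feasible.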
+
+