
zaiun xu 9df21f80ef Initial commit: FishServer monorepo (FishAction, FishMeasure, fish_api)

Made-with: Cursor

2026-04-08 19:32:23 +08:00

1.8 KiB

Executable File

Training with Timestamp Labels (Segment Supervision)

If “scared” behavior occurs only briefly inside a long video, training with a single label for the whole video often fails: most randomly sampled clips look normal, so the model learns to predict “normal”.

This repo now supports segment/timestamp supervision via `train_pytorchvideo_x3d_segments.py`.

1) Create `train_segments.csv`

Format (one segment per line, whitespace-separated; the last token is the integer class label):

relative/video.mp4  start_sec  end_sec  label

Examples:

scared_underwater/foo.mp4  12.4  16.2  3
scared_underwater/foo.mp4  30.0  32.0  3
normal_underwater/bar.mp4  0.0   60.0  1
feeding/baz.mp4            5.0   8.0   0

Notes:

start_sec and end_sec are in seconds from the beginning of the video.
You can have multiple segments per video (recommended).
For “normal” videos, you can put one segment covering the whole video.
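
Before training, it can be worth checking the file for malformed rows. The sketch below is a hypothetical standalone validator, not the loader used by `train_pytorchvideo_x3d_segments.py`; the names `Segment` and `parse_segments` are assumptions made for illustration. It splits each line from the right so that the last three tokens are read as `start_sec`, `end_sec`, and `label`, which is a choice of this sketch and may differ from how the training script parses the file.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    video: str        # path relative to --path_prefix
    start_sec: float  # window start, seconds from the beginning of the video
    end_sec: float    # window end, seconds from the beginning of the video
    label: int        # integer class label


def parse_segments(text: str) -> List[Segment]:
    """Parse whitespace-separated segment rows; raise ValueError on bad rows."""
    segments = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right: the last three tokens are start, end, label.
        parts = line.rsplit(None, 3)
        if len(parts) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(parts)}")
        video, start, end, label = parts
        # float() and int() raise ValueError themselves on non-numeric tokens.
        seg = Segment(video, float(start), float(end), int(label))
        if seg.start_sec < 0 or seg.end_sec <= seg.start_sec:
            raise ValueError(f"line {lineno}: invalid window {start}..{end}")
        segments.append(seg)
    return segments
```

Usage would be along the lines of `parse_segments(open("train_segments.csv").read())`, which fails fast on the first row whose window is empty or reversed.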

2) Train

cd /home/ubuntu/projects/FishAction

python train_pytorchvideo_x3d_segments.py \
  --segments_csv /home/ubuntu/data/fish/fish_action_videos/train_segments.csv \
  --val_csv /home/ubuntu/data/fish/fish_action_videos/val.csv \
  --path_prefix /home/ubuntu/data/fish/fish_action_videos \
  --model x3d_m \
  --pretrained \
  --num_frames 16 \
  --sampling_rate 5 \
  --target_fps 30 \
  --batch_size 4 \
  --epochs 30 \
  --num_workers 4 \
  --amp \
  --output_dir /home/ubuntu/projects/FishAction/checkpoints/ptv_x3d_m_segments

3) Why this helps

Training clips are sampled from inside the labeled windows, so a clip labeled “scared” actually shows scared behavior.
This is the most direct way to fix “scared” videos being predicted as “normal”.
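
To make the idea concrete, here is a minimal sketch of sampling a clip inside a labeled window. It is an illustration under stated assumptions, not the sampling code of the training script: the function names are hypothetical, the clip length uses the common `num_frames * sampling_rate / target_fps` convention, and the handling of segments shorter than one clip (centering the clip on the segment) is one possible choice among several.

```python
import random
from typing import Optional, Tuple


def clip_duration_sec(num_frames: int, sampling_rate: int, target_fps: float) -> float:
    """Approximate seconds of video spanned by one clip."""
    return num_frames * sampling_rate / target_fps


def sample_clip_window(
    start_sec: float,
    end_sec: float,
    clip_sec: float,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Pick a (clip_start, clip_end) pair for one labeled segment."""
    if rng is None:
        rng = random.Random()
    slack = (end_sec - start_sec) - clip_sec
    if slack <= 0:
        # Segment is shorter than one clip: center the clip on the segment
        # (assumption of this sketch), without going before the video start.
        center = (start_sec + end_sec) / 2.0
        clip_start = max(0.0, center - clip_sec / 2.0)
    else:
        # Segment is long enough: choose a start so the clip stays inside it.
        clip_start = start_sec + rng.uniform(0.0, slack)
    return clip_start, clip_start + clip_sec


# With the flags from the training command (16 frames, stride 5, 30 fps),
# one clip spans about 2.67 seconds.
clip_sec = clip_duration_sec(16, 5, 30)
window = sample_clip_window(12.4, 16.2, clip_sec, random.Random(0))
```

Under this convention, a 2-second segment such as `30.0 32.0` is shorter than one clip, so some frames from outside the labeled window would be included. Labeling windows at least as long as one clip avoids that.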

4) Next improvements (optional)

Add val_segments.csv for segment-level validation (more accurate than video-level val).
Oversample scarce classes (e.g., duplicate “scared” segments) to balance training.
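
The oversampling idea can be sketched in plain Python. This is an assumption-laden illustration, not a feature of the training script: `oversample_by_duplication` and `balanced_sample_weights` are hypothetical helper names. The first repeats rows of scarce classes until each class is roughly as frequent as the largest one; the second computes per-row weights that could instead be passed to a weighted sampler such as `torch.utils.data.WeightedRandomSampler`.

```python
from collections import Counter
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def oversample_by_duplication(rows: Sequence[T], labels: Sequence[int]) -> List[T]:
    """Repeat rows of scarce classes to roughly match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    balanced: List[T] = []
    for row, label in zip(rows, labels):
        # Rows of the largest class are kept once; scarcer classes repeat more.
        repeats = max(1, round(target / counts[label]))
        balanced.extend([row] * repeats)
    return balanced


def balanced_sample_weights(labels: Sequence[int]) -> List[float]:
    """Weight each row by 1 / (size of its class), so classes weigh equally."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Example: four "normal" (1) segments, one "scared" (3), two "feeding" (0).
labels = [1, 1, 1, 1, 3, 0, 0]
rows = [f"segment_{i}" for i in range(len(labels))]
balanced_rows = oversample_by_duplication(rows, labels)
weights = balanced_sample_weights(labels)
```

Duplication keeps the CSV self-describing, while sampler weights leave the file unchanged and move the balancing into the data loader; which fits better depends on how the script builds its dataset.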