FishServer/FishAction/SEGMENT_LABEL_TRAINING.md
2026-04-08 19:32:23 +08:00

Training with Timestamp Labels (Segment Supervision)

If the “scared” behavior occurs only briefly within a long video, training with a single label for the whole video often fails: most randomly sampled clips show normal behavior, so the model learns to predict “normal”.

This repo now supports segment/timestamp supervision via train_pytorchvideo_x3d_segments.py.

1) Create train_segments.csv

Format (whitespace-separated; the last token is the integer class label):

relative/video.mp4  start_sec  end_sec  label

Examples:

scared_underwater/foo.mp4  12.4  16.2  3
scared_underwater/foo.mp4  30.0  32.0  3
normal_underwater/bar.mp4  0.0   60.0  1
feeding/baz.mp4            5.0   8.0   0

Notes:

  • start_sec and end_sec are in seconds from the beginning of the video.
  • You can have multiple segments per video (recommended).
  • For “normal” videos, a single segment covering the whole video is enough.
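Before training, it can help to sanity-check the file. The sketch below is a hypothetical validator (`parse_segments` and `Segment` are illustrative names, not part of the repo). It assumes each line has exactly four whitespace-separated tokens, so paths must contain no spaces, and it rejects windows where end_sec is not after start_sec. The training script's own parser may be more or less strict.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    path: str    # relative to --path_prefix
    start: float # seconds from the beginning of the video
    end: float   # seconds from the beginning of the video
    label: int   # integer class id


def parse_segments(text: str) -> List[Segment]:
    """Parse train_segments.csv content (assumed: 4 tokens per line, no spaces in paths)."""
    segments = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()  # any run of whitespace separates tokens
        if len(parts) != 4:
            raise ValueError(f"line {lineno}: expected 'path start end label', got {raw!r}")
        path, start_s, end_s, label_s = parts
        start, end, label = float(start_s), float(end_s), int(label_s)
        if start < 0 or end <= start:
            raise ValueError(f"line {lineno}: invalid window {start}..{end}")
        segments.append(Segment(path, start, end, label))
    return segments


example = """\
scared_underwater/foo.mp4  12.4  16.2  3
scared_underwater/foo.mp4  30.0  32.0  3
normal_underwater/bar.mp4  0.0   60.0  1
feeding/baz.mp4            5.0   8.0   0
"""
for seg in parse_segments(example):
    print(seg)
```

The validator does not check that end_sec lies within the video's real duration. That check needs a video-reading library and is left out here.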

2) Train

cd /home/ubuntu/projects/FishAction

python train_pytorchvideo_x3d_segments.py \
  --segments_csv /home/ubuntu/data/fish/fish_action_videos/train_segments.csv \
  --val_csv /home/ubuntu/data/fish/fish_action_videos/val.csv \
  --path_prefix /home/ubuntu/data/fish/fish_action_videos \
  --model x3d_m \
  --pretrained \
  --num_frames 16 \
  --sampling_rate 5 \
  --target_fps 30 \
  --batch_size 4 \
  --epochs 30 \
  --num_workers 4 \
  --amp \
  --output_dir /home/ubuntu/projects/FishAction/checkpoints/ptv_x3d_m_segments
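With these flags, each training clip covers a fixed time span. Assume the script follows the common PyTorchVideo convention, clip duration = num_frames × sampling_rate / target_fps; this is not confirmed for this script. Then 16 × 5 / 30 ≈ 2.67 s. A labeled window shorter than that cannot hold one full clip. The second “scared” example (30.0 to 32.0, 2 s) falls into this case. How the script handles such windows is not described here. The helper names below are hypothetical.

```python
def clip_duration(num_frames: int, sampling_rate: int, target_fps: float) -> float:
    # Assumed PyTorchVideo-style convention: num_frames frames, one every
    # sampling_rate frames, at target_fps frames per second.
    return num_frames * sampling_rate / target_fps


def too_short(windows, clip_dur):
    # Return the (path, start, end) windows that cannot contain one full clip.
    return [(p, s, e) for p, s, e in windows if e - s < clip_dur]


dur = clip_duration(16, 5, 30)
print(f"clip duration: {dur:.2f} s")  # 16 * 5 / 30 = 2.67 s

windows = [
    ("scared_underwater/foo.mp4", 12.4, 16.2),
    ("scared_underwater/foo.mp4", 30.0, 32.0),
    ("feeding/baz.mp4", 5.0, 8.0),
]
for w in too_short(windows, dur):
    print("shorter than one clip:", w)
```

If many short windows show up, widening them slightly or reducing --num_frames / --sampling_rate are two ways to make them usable.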

3) Why this helps

  • Training clips are sampled from inside the labeled windows, so a clip labeled “scared” actually shows scared behavior.
  • This directly addresses the failure mode where “scared” videos are predicted as “normal”.
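To make the difference concrete, the sketch below compares two sampling schemes. The first draws clip starts uniformly over a whole video, as video-level labels would. The second draws them only inside a labeled window. It assumes a hypothetical 60 s length for foo.mp4; the real length is not given. It also assumes a 2.67 s clip and that a clip "counts" only if it lies entirely inside the scared window. `sample_clip_start` and `fraction_inside` are illustrative names, not the script's implementation. The centering fallback for windows shorter than a clip is also an assumption.

```python
import random


def sample_clip_start(seg_start: float, seg_end: float, clip_dur: float,
                      rng: random.Random) -> float:
    """Pick a clip start so the whole clip lies inside [seg_start, seg_end]."""
    if seg_end - seg_start < clip_dur:
        # Assumed fallback: center the clip on a too-short window.
        return max(0.0, (seg_start + seg_end - clip_dur) / 2)
    return rng.uniform(seg_start, seg_end - clip_dur)


def fraction_inside(video_len: float, win_start: float, win_end: float,
                    clip_dur: float) -> float:
    """Share of uniform whole-video clip starts whose clip lies fully in the window."""
    usable = video_len - clip_dur                      # valid start range is [0, usable]
    inside = max(0.0, (win_end - win_start) - clip_dur)  # starts that keep the clip inside
    return inside / usable


clip = 16 * 5 / 30
frac = fraction_inside(60.0, 12.4, 16.2, clip)
print(f"whole-video sampling: {frac:.1%} of clips are fully 'scared'")

rng = random.Random(0)
start = sample_clip_start(12.4, 16.2, clip, rng)
print(f"segment sampling: clip {start:.2f}-{start + clip:.2f} s lies inside 12.4-16.2 s")
```

Under these assumptions, only about 2% of whole-video clips fall fully inside the scared window. Segment sampling makes every clip for that label come from the window.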

4) Next improvements (optional)

  • Add val_segments.csv for segment-level validation (more accurate than video-level validation, since each validation clip's label matches what the clip shows).
  • Oversample scarce classes (e.g., duplicate “scared” segments) to balance training.
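Duplicating segments can be scripted when building train_segments.csv. The sketch below is a hypothetical helper (`oversample` is not part of the repo). It repeats each segment of a smaller class ceil(largest_count / class_count) times, so every class ends up with at least as many segments as the largest class. It counts segments, not seconds of footage, which is a simplification.

```python
import math
from collections import Counter


def oversample(segments, label_of=lambda seg: seg[-1]):
    """Duplicate segments of minority classes; returns a new list."""
    counts = Counter(label_of(s) for s in segments)
    target = max(counts.values())
    balanced = []
    for seg in segments:
        # Segments of the largest class are kept once; smaller classes are repeated.
        repeats = math.ceil(target / counts[label_of(seg)])
        balanced.extend([seg] * repeats)
    return balanced


# (path, start_sec, end_sec, label) tuples, taken from the example rows above.
segments = [
    ("scared_underwater/foo.mp4", 12.4, 16.2, 3),
    ("scared_underwater/foo.mp4", 30.0, 32.0, 3),
    ("normal_underwater/bar.mp4", 0.0, 60.0, 1),
    ("feeding/baz.mp4", 5.0, 8.0, 0),
]
for path, start, end, label in oversample(segments):
    print(f"{path}  {start}  {end}  {label}")
```

In PyTorch, an alternative that avoids duplicate rows is torch.utils.data.WeightedRandomSampler with per-sample weights of 1 / class_count. Whether the training script lets you pass a custom sampler is not stated here.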