Training with Timestamp Labels (Segment Supervision)
If “scared” behavior occurs only briefly inside a long video, training with a single label for the whole video often fails: most randomly sampled clips look normal, so the model learns to predict “normal”.
This repo now supports segment/timestamp supervision via train_pytorchvideo_x3d_segments.py.
1) Create train_segments.csv
Format (whitespace-separated; the last token is the integer label):
relative/video.mp4 start_sec end_sec label
Examples:
scared_underwater/foo.mp4 12.4 16.2 3
scared_underwater/foo.mp4 30.0 32.0 3
normal_underwater/bar.mp4 0.0 60.0 1
feeding/baz.mp4 5.0 8.0 0
Notes:
- start_sec and end_sec are in seconds from the beginning of the video.
- You can have multiple segments per video (recommended).
- For “normal” videos, you can put one segment covering the whole video.
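To make the format concrete, here is a minimal sketch of a parser for such a file. It is not the parser used by train_pytorchvideo_x3d_segments.py (that code is not shown here); the names `Segment` and `parse_segments` are hypothetical, and the validation rules are assumptions.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    path: str        # video path, relative to --path_prefix
    start_sec: float
    end_sec: float
    label: int


def parse_segments(lines) -> List[Segment]:
    """Parse whitespace-separated 'path start_sec end_sec label' lines."""
    segments = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so the last three tokens are always
        # start, end, label, even if the path itself contains spaces.
        parts = line.rsplit(None, 3)
        if len(parts) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(parts)}")
        path, start, end, label = parts
        seg = Segment(path, float(start), float(end), int(label))
        # Assumed sanity check: a segment must have positive length.
        if not 0.0 <= seg.start_sec < seg.end_sec:
            raise ValueError(f"line {line_no}: need 0 <= start_sec < end_sec")
        segments.append(seg)
    return segments


example = """\
scared_underwater/foo.mp4 12.4 16.2 3
scared_underwater/foo.mp4 30.0 32.0 3
normal_underwater/bar.mp4 0.0 60.0 1
feeding/baz.mp4 5.0 8.0 0
"""
segments = parse_segments(example.splitlines())
```

Running a check like this on train_segments.csv before training can catch swapped start/end values or a missing label early.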
2) Train
cd /home/ubuntu/projects/FishAction
python train_pytorchvideo_x3d_segments.py \
--segments_csv /home/ubuntu/data/fish/fish_action_videos/train_segments.csv \
--val_csv /home/ubuntu/data/fish/fish_action_videos/val.csv \
--path_prefix /home/ubuntu/data/fish/fish_action_videos \
--model x3d_m \
--pretrained \
--num_frames 16 \
--sampling_rate 5 \
--target_fps 30 \
--batch_size 4 \
--epochs 30 \
--num_workers 4 \
--amp \
--output_dir /home/ubuntu/projects/FishAction/checkpoints/ptv_x3d_m_segments
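As a rough guide to how much video one training clip covers with these settings, the arithmetic below assumes the common PyTorchVideo convention that --sampling_rate is the stride between sampled frames at --target_fps. Whether the script follows exactly this convention is an assumption.

```python
num_frames = 16     # --num_frames
sampling_rate = 5   # --sampling_rate, assumed to be the stride between sampled frames
target_fps = 30     # --target_fps

# Temporal span of one clip under the assumed convention.
clip_duration_sec = num_frames * sampling_rate / target_fps
print(f"{clip_duration_sec:.2f} s")  # prints "2.67 s"
```

If that convention holds, a labeled segment shorter than about 2.67 s (such as the 30.0 to 32.0 example above) cannot contain a full clip, so it is worth checking how the script treats short segments.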
3) Why this helps
- The training clips are sampled inside the labeled windows, so “scared” clips are actually “scared”.
- This is the most direct way to fix “scared videos predicted as normal”.
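The idea of sampling inside a labeled window can be sketched as follows. This is an illustration only, not the script's actual sampler: `sample_clip_window` is a hypothetical helper, and centering the clip when the segment is shorter than the clip is one assumed policy among several possible ones.

```python
import random


def sample_clip_window(start_sec, end_sec, clip_duration_sec, rng=random):
    """Return (clip_start, clip_end), placed inside [start_sec, end_sec] when possible."""
    slack = (end_sec - start_sec) - clip_duration_sec
    if slack > 0:
        # Segment is longer than the clip: pick a random offset inside it.
        clip_start = start_sec + rng.uniform(0.0, slack)
    else:
        # Segment is shorter than (or equal to) the clip: center the clip on
        # the segment, clamping at 0 so it does not start before the video.
        clip_start = max(0.0, start_sec + slack / 2.0)
    return clip_start, clip_start + clip_duration_sec


# A 2-second clip drawn from the 12.4-16.2 s "scared" segment.
clip_start, clip_end = sample_clip_window(12.4, 16.2, 2.0)
```

Every clip drawn this way overlaps the labeled window, which is what makes the clip label trustworthy.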
4) Next improvements (optional)
- Add val_segments.csv for segment-level validation (more accurate than video-level val).
- Oversample scarce classes (e.g., duplicate “scared” segments) to balance training.
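One simple way to oversample scarce classes is to repeat their segments in the training list until each class roughly matches the largest one. The sketch below is a hypothetical helper (`oversample_to_balance` is not part of the repo) and assumes each segment is a tuple whose last element is the integer label.

```python
from collections import Counter


def oversample_to_balance(segments, label_of=lambda seg: seg[-1]):
    """Repeat segments of rarer classes until each class roughly matches the largest."""
    if not segments:
        return []
    counts = Counter(label_of(seg) for seg in segments)
    target = max(counts.values())
    balanced = []
    for seg in segments:
        # Integer repeat factor, so the resulting balance is approximate.
        repeats = max(1, round(target / counts[label_of(seg)]))
        balanced.extend([seg] * repeats)
    return balanced


# Six "normal" (label 1) segments and two "scared" (label 3) segments.
train = [("normal_underwater/bar.mp4", 0.0, 60.0, 1)] * 6 \
      + [("scared_underwater/foo.mp4", 12.4, 16.2, 3)] * 2
balanced = oversample_to_balance(train)
print(Counter(seg[-1] for seg in balanced))
```

An alternative that avoids duplicating rows is to draw samples with per-class weights, for example with `torch.utils.data.WeightedRandomSampler`.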