# Training with Timestamp Labels (Segment Supervision)

If “scared” only happens briefly inside a long video, training with a single label per whole video often fails: most random clips look **normal**, so the model learns **normal**.

This repo now supports **segment/timestamp supervision** via `train_pytorchvideo_x3d_segments.py`.

## 1) Create `train_segments.csv`

Format (whitespace-separated; last token is label int):

```
relative/video.mp4 start_sec end_sec label
```

Examples:

```
scared_underwater/foo.mp4 12.4 16.2 3
scared_underwater/foo.mp4 30.0 32.0 3
normal_underwater/bar.mp4 0.0 60.0 1
feeding/baz.mp4 5.0 8.0 0
```

Notes:

- `start_sec` and `end_sec` are in **seconds** from the beginning of the video.
- You can have **multiple segments per video** (recommended).
- For “normal” videos, you can put one segment covering the whole video.

## 2) Train

```bash
cd /home/ubuntu/projects/FishAction
python train_pytorchvideo_x3d_segments.py \
  --segments_csv /home/ubuntu/data/fish/fish_action_videos/train_segments.csv \
  --val_csv /home/ubuntu/data/fish/fish_action_videos/val.csv \
  --path_prefix /home/ubuntu/data/fish/fish_action_videos \
  --model x3d_m \
  --pretrained \
  --num_frames 16 \
  --sampling_rate 5 \
  --target_fps 30 \
  --batch_size 4 \
  --epochs 30 \
  --num_workers 4 \
  --amp \
  --output_dir /home/ubuntu/projects/FishAction/checkpoints/ptv_x3d_m_segments
```

## 3) Why this helps

- The training clips are sampled **inside the labeled windows**, so “scared” clips are actually “scared”.
- This is the most direct way to fix “scared videos predicted as normal”.

## 4) Next improvements (optional)

- Add `val_segments.csv` for segment-level validation (more accurate than video-level val).
- Oversample scarce classes (e.g., duplicate “scared” segments) to balance training.
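
The oversampling idea in section 4 can be tried without touching the training script by rewriting `train_segments.csv` before training. Below is a minimal sketch, assuming the four-token line format from section 1. The helper names (`parse_segments`, `oversample`, `format_segments`) and the extra example files `qux.mp4` and `zap.mp4` are hypothetical and not part of this repo.

```python
import random
from collections import Counter, defaultdict


def parse_segments(lines):
    """Parse 'path start_sec end_sec label' lines; the label is the last token."""
    segments = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split from the right so the last three tokens are always start, end, label.
        path, start, end, label = line.rsplit(None, 3)
        start, end, label = float(start), float(end), int(label)
        if not 0.0 <= start < end:
            raise ValueError(f"bad time window in line: {line!r}")
        segments.append((path, start, end, label))
    return segments


def oversample(segments, seed=0):
    """Duplicate segments of scarce classes until each class matches the largest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for seg in segments:
        by_label[seg[3]].append(seg)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        balanced.extend(group)
        # Draw the missing number of duplicates with replacement.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def format_segments(segments):
    """Turn parsed segments back into whitespace-separated lines."""
    return [f"{path} {start} {end} {label}" for path, start, end, label in segments]


# Illustrative input: one "scared" segment (label 3) against three "normal" ones (label 1).
example = [
    "scared_underwater/foo.mp4 12.4 16.2 3",
    "normal_underwater/bar.mp4 0.0 60.0 1",
    "normal_underwater/qux.mp4 0.0 45.0 1",  # hypothetical file
    "normal_underwater/zap.mp4 0.0 30.0 1",  # hypothetical file
]
balanced = oversample(parse_segments(example))
print(Counter(seg[3] for seg in balanced))
print("\n".join(format_segments(balanced)))
```

Duplicating lines is the simplest form of balancing. Because clips are sampled inside each window, repeated segments should still yield different clips, though full balancing to the largest class may be more aggressive than needed.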
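
To make the "sampled inside the labeled windows" point from section 3 concrete, the sketch below shows one way a clip window could be drawn from a segment. It assumes the common convention that a clip spans `num_frames * sampling_rate / target_fps` seconds. Both function names and the handling of segments shorter than one clip are illustrative assumptions, not a description of what `train_pytorchvideo_x3d_segments.py` actually does.

```python
import random


def clip_duration_sec(num_frames, sampling_rate, target_fps):
    """Seconds spanned by one clip when every sampling_rate-th frame is kept."""
    return num_frames * sampling_rate / target_fps


def sample_clip_window(start_sec, end_sec, clip_sec, rng=random):
    """Pick a random (clip_start, clip_end) inside [start_sec, end_sec].

    If the segment is shorter than the clip, the clip is centered on the
    segment instead (an illustrative choice, not the script's confirmed behavior).
    """
    slack = (end_sec - start_sec) - clip_sec
    if slack <= 0:
        center = (start_sec + end_sec) / 2
        clip_start = max(0.0, center - clip_sec / 2)
    else:
        clip_start = start_sec + rng.uniform(0.0, slack)
    return clip_start, clip_start + clip_sec


# Values taken from the training command in section 2.
clip_sec = clip_duration_sec(num_frames=16, sampling_rate=5, target_fps=30)
print(round(clip_sec, 3))  # 2.667

# A window drawn from the 12.4-16.2 s "scared" segment in the example CSV.
print(sample_clip_window(12.4, 16.2, clip_sec, random.Random(0)))
```

Under this assumption, the settings in section 2 give clips of about 2.67 seconds, so a labeled window such as `30.0 32.0` is shorter than one clip. It may be worth checking how the script treats such segments, or padding very short windows slightly when labeling.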