# Training with Timestamp Labels (Segment Supervision)

If “scared” behavior occurs only briefly inside a long video, training with a single label for the whole video often fails: most randomly sampled clips look **normal**, so the model learns to predict **normal**.

This repo now supports **segment/timestamp supervision** via `train_pytorchvideo_x3d_segments.py`.

## 1) Create `train_segments.csv`

Format (one segment per line, whitespace-separated; the last token is the integer class label):

```
relative/video.mp4 start_sec end_sec label
```

Examples:

```
scared_underwater/foo.mp4 12.4 16.2 3
scared_underwater/foo.mp4 30.0 32.0 3
normal_underwater/bar.mp4 0.0 60.0 1
feeding/baz.mp4 5.0 8.0 0
```

Notes:

- `start_sec` and `end_sec` are in **seconds** from the beginning of the video.
- You can have **multiple segments per video** (recommended).
- For “normal” videos, you can put one segment covering the whole video.

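To sanity-check a segments file before training, the format above can be parsed in a few lines. This is only a sketch, not the loader used by `train_pytorchvideo_x3d_segments.py`: the `Segment` class and `parse_segments` function are hypothetical names, and both the right-to-left split (so a path containing spaces survives) and the `start_sec < end_sec` check are assumptions about what a reasonable reader would do.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    path: str     # video path, relative to the dataset root
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    label: int    # integer class id


def parse_segments(lines: Iterable[str]) -> List[Segment]:
    """Parse 'path start_sec end_sec label' rows, skipping blank lines."""
    segments = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        # Split from the right: the last three tokens are always
        # start, end, label, so the path may itself contain spaces.
        parts = line.rsplit(maxsplit=3)
        if len(parts) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(parts)}")
        path, start, end, label = parts
        seg = Segment(path, float(start), float(end), int(label))
        # Assumed sanity check: a segment must have positive length.
        if not 0.0 <= seg.start < seg.end:
            raise ValueError(f"line {lineno}: need 0 <= start_sec < end_sec")
        segments.append(seg)
    return segments
```

Passing an open file handle works directly, for example `parse_segments(open("train_segments.csv"))`, since the function only needs an iterable of lines.
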
## 2) Train

```bash
cd /home/ubuntu/projects/FishAction

python train_pytorchvideo_x3d_segments.py \
  --segments_csv /home/ubuntu/data/fish/fish_action_videos/train_segments.csv \
  --val_csv /home/ubuntu/data/fish/fish_action_videos/val.csv \
  --path_prefix /home/ubuntu/data/fish/fish_action_videos \
  --model x3d_m \
  --pretrained \
  --num_frames 16 \
  --sampling_rate 5 \
  --target_fps 30 \
  --batch_size 4 \
  --epochs 30 \
  --num_workers 4 \
  --amp \
  --output_dir /home/ubuntu/projects/FishAction/checkpoints/ptv_x3d_m_segments
```

## 3) Why this helps

- Training clips are sampled **inside the labeled windows**, so a clip labeled “scared” actually shows scared behavior.
- This is the most direct way to fix “scared videos predicted as normal”.

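The idea of sampling inside a labeled window can be sketched as follows. This is an illustration under stated assumptions, not the script's actual sampler: `clip_duration` and `sample_clip_window` are hypothetical helpers, the clip length is assumed to follow the common convention `num_frames * sampling_rate / target_fps`, and centering the clip on a segment that is shorter than the clip is one possible policy among several.

```python
import random
from typing import Optional, Tuple


def clip_duration(num_frames: int, sampling_rate: int, target_fps: float) -> float:
    """Seconds of video spanned by one clip (assumed convention)."""
    return num_frames * sampling_rate / target_fps


def sample_clip_window(
    start: float,
    end: float,
    duration: float,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Pick a (clip_start, clip_end) window for one labeled segment."""
    rng = rng if rng is not None else random.Random()
    slack = (end - start) - duration
    if slack <= 0:
        # Segment is shorter than the clip: center the clip on the
        # segment (assumed policy) and never start before t = 0.
        clip_start = max(0.0, start + slack / 2)
    else:
        # Segment is long enough: sample uniformly so the whole clip
        # stays inside the labeled window.
        clip_start = start + rng.uniform(0.0, slack)
    return clip_start, clip_start + duration
```

With the flags from the training command (`--num_frames 16 --sampling_rate 5 --target_fps 30`), this convention gives a clip of about 2.67 seconds. The example segment `30.0 32.0` is shorter than that, so under this sketch the clip would extend slightly past both ends of the label; a real sampler would also need to clamp at the end of the video, which is not handled here.
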
## 4) Next improvements (optional)

- Add `val_segments.csv` for segment-level validation (more accurate than video-level validation).
- Oversample scarce classes (e.g., duplicate “scared” segments) to balance training.
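
As an alternative to physically duplicating rows, oversampling can be expressed as per-segment sampling weights. The sketch below is not part of the repo: `balanced_sample_weights` is a hypothetical helper that weights each segment by the inverse of its class count, so every class carries the same total weight.

```python
from collections import Counter
from typing import List, Sequence


def balanced_sample_weights(labels: Sequence[int]) -> List[float]:
    """Return one weight per segment: 1 / (number of segments in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Labels of the four example rows in train_segments.csv: 3, 3, 1, 0.
weights = balanced_sample_weights([3, 3, 1, 0])
print(weights)  # [0.5, 0.5, 1.0, 1.0]
```

If the training script builds a standard PyTorch `DataLoader` (an assumption), weights like these could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)` so that rare classes are drawn about as often as common ones.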