
# Training with Timestamp Labels (Segment Supervision)

If “scared” behavior occurs only briefly within a long video, training with a single label per video often fails:
most randomly sampled clips look **normal**, so the model learns to predict **normal**.

This repo now supports **segment/timestamp supervision** via `train_pytorchvideo_x3d_segments.py`.

## 1) Create `train_segments.csv`

Format (whitespace-separated; the last token is the integer class label):

```
relative/video.mp4  start_sec  end_sec  label
```

Examples:
```
scared_underwater/foo.mp4  12.4  16.2  3
scared_underwater/foo.mp4  30.0  32.0  3
normal_underwater/bar.mp4  0.0   60.0  1
feeding/baz.mp4            5.0   8.0   0
```
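A minimal sketch of reading this format in Python, assuming one segment per line with exactly four whitespace-separated tokens and blank lines ignored. `Segment` and `read_segments` are hypothetical names for illustration; this is not the parser used by `train_pytorchvideo_x3d_segments.py`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    path: str      # video path (assumed to be relative to --path_prefix)
    start: float   # segment start, seconds from the beginning of the video
    end: float     # segment end, seconds from the beginning of the video
    label: int     # integer class id


def read_segments(lines: Iterable[str]) -> List[Segment]:
    """Parse 'path start_sec end_sec label' lines (whitespace-separated)."""
    segments = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        tokens = line.split()  # any run of spaces/tabs separates columns
        if len(tokens) != 4:
            raise ValueError(f"line {lineno}: expected 4 tokens, got {len(tokens)}")
        path, start, end, label = tokens
        seg = Segment(path, float(start), float(end), int(label))
        if not 0.0 <= seg.start < seg.end:
            raise ValueError(f"line {lineno}: need 0 <= start_sec < end_sec")
        segments.append(seg)
    return segments


example = """\
scared_underwater/foo.mp4  12.4  16.2  3
scared_underwater/foo.mp4  30.0  32.0  3
normal_underwater/bar.mp4  0.0   60.0  1
feeding/baz.mp4            5.0   8.0   0
"""
segs = read_segments(example.splitlines())
print(len(segs), segs[0])
```

With a real file, pass the open file object directly, e.g. `with open("train_segments.csv") as f: segs = read_segments(f)`.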

Notes:
- `start_sec` and `end_sec` are in **seconds** from the beginning of the video.
- You can have **multiple segments per video** (recommended).
- For “normal” videos, a single segment covering the whole video is enough.
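When a video has several segments, it can help to check that windows with *different* labels do not overlap, since a clip sampled from the overlap would match two labels. A minimal sketch over `(path, start_sec, end_sec, label)` tuples; `find_label_conflicts` is a hypothetical helper, and whether the training script itself rejects overlapping segments is not stated in this document.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, int]  # (path, start_sec, end_sec, label)


def find_label_conflicts(rows: List[Row]) -> List[Tuple[Row, Row]]:
    """Return pairs of segments in the same video that overlap but disagree on label."""
    by_video: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_video[row[0]].append(row)

    conflicts = []
    for segs in by_video.values():
        segs.sort(key=lambda r: r[1])  # sort this video's segments by start_sec
        for i, a in enumerate(segs):
            for b in segs[i + 1:]:
                if b[1] >= a[2]:
                    break  # b (and every later segment) starts after a ends
                if a[3] != b[3]:
                    conflicts.append((a, b))
    return conflicts


rows = [
    ("scared_underwater/foo.mp4", 12.4, 16.2, 3),
    ("scared_underwater/foo.mp4", 30.0, 32.0, 3),
    ("normal_underwater/bar.mp4", 0.0, 60.0, 1),
]
print(find_label_conflicts(rows))  # []

# A made-up "normal" segment overlapping the first "scared" window is flagged.
print(find_label_conflicts(rows + [("scared_underwater/foo.mp4", 15.0, 20.0, 1)]))
```

Segments that merely touch (one ends exactly where the next starts) are not reported.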

## 2) Train

```bash
cd /home/ubuntu/projects/FishAction

python train_pytorchvideo_x3d_segments.py \
  --segments_csv /home/ubuntu/data/fish/fish_action_videos/train_segments.csv \
  --val_csv /home/ubuntu/data/fish/fish_action_videos/val.csv \
  --path_prefix /home/ubuntu/data/fish/fish_action_videos \
  --model x3d_m \
  --pretrained \
  --num_frames 16 \
  --sampling_rate 5 \
  --target_fps 30 \
  --batch_size 4 \
  --epochs 30 \
  --num_workers 4 \
  --amp \
  --output_dir /home/ubuntu/projects/FishAction/checkpoints/ptv_x3d_m_segments
```
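With these flags, one training clip spans roughly `16 × 5 / 30 ≈ 2.67` seconds, assuming the script follows the clip-duration convention used in PyTorchVideo's X3D examples (`num_frames * sampling_rate / frames_per_second`). Segments shorter than this cannot contain a full clip; the `30.0` to `32.0` example above (2.0 s) is one. How the script handles such segments (padding, clamping, or skipping) is not documented here, so a quick check before training may be worthwhile. `clip_duration` and `short_segments` are hypothetical helpers.

```python
from typing import List, Tuple

Row = Tuple[str, float, float, int]  # (path, start_sec, end_sec, label)


def clip_duration(num_frames: int, sampling_rate: int, target_fps: float) -> float:
    """Seconds covered by one clip: num_frames frames, taken every sampling_rate frames."""
    return num_frames * sampling_rate / target_fps


def short_segments(rows: List[Row], min_len: float) -> List[Row]:
    """Segments too short to contain one full clip of min_len seconds."""
    return [r for r in rows if r[2] - r[1] < min_len]


# Values from the training command above.
dur = clip_duration(num_frames=16, sampling_rate=5, target_fps=30)
rows = [
    ("scared_underwater/foo.mp4", 12.4, 16.2, 3),
    ("scared_underwater/foo.mp4", 30.0, 32.0, 3),
    ("normal_underwater/bar.mp4", 0.0, 60.0, 1),
    ("feeding/baz.mp4", 5.0, 8.0, 0),
]
print(f"clip duration: {dur:.2f} s")  # clip duration: 2.67 s
print(short_segments(rows, dur))
```

If many labeled windows come out short, lowering `--sampling_rate` or `--num_frames` shortens each clip.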

## 3) Why this helps

- Training clips are sampled **inside the labeled windows**, so a clip labeled “scared” actually shows scared behavior.
- This directly targets the failure mode where “scared” videos are predicted as normal.
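To illustrate what sampling “inside the labeled windows” means, the sketch below picks a random clip start so that the whole clip fits within `[start_sec, end_sec]`, and falls back to the segment start when the segment is shorter than a clip. This sampling rule is an assumption for illustration; the actual sampler in `train_pytorchvideo_x3d_segments.py` may differ. `sample_clip_window` is a hypothetical name.

```python
import random
from typing import Tuple


def sample_clip_window(
    start_sec: float, end_sec: float, clip_dur: float, rng: random.Random
) -> Tuple[float, float]:
    """Return (clip_start, clip_end) for one random clip inside a labeled segment."""
    latest_start = end_sec - clip_dur
    if latest_start <= start_sec:
        # Segment no longer than one clip: start at the segment start.
        # The clip then runs past end_sec; a real loader must pad, clamp, or skip.
        clip_start = start_sec
    else:
        # Uniform start so the full clip stays within the labeled window.
        clip_start = rng.uniform(start_sec, latest_start)
    return clip_start, clip_start + clip_dur


rng = random.Random(0)
clip_dur = 16 * 5 / 30  # num_frames * sampling_rate / target_fps (assumed convention)
for _ in range(3):
    s, e = sample_clip_window(12.4, 16.2, clip_dur, rng)
    print(f"clip {s:.2f}s to {e:.2f}s")
```

Because each epoch draws a fresh start time, one long “scared” window yields many distinct training clips.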

## 4) Next improvements (optional)

- Add `val_segments.csv` for segment-level validation, which scores the model on the labeled windows rather than on whole videos and so better reflects accuracy on brief behaviors.
- Oversample scarce classes (e.g., duplicate “scared” segments) to balance training.
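One simple form of the oversampling above is to repeat each smaller class's segments until every class has as many rows as the largest class. A minimal sketch over `(path, start_sec, end_sec, label)` tuples; `oversample_by_duplication` is a hypothetical helper, and the balanced rows would then be written back out as a segments CSV. Duplicates are only useful if clip positions are re-sampled within each window, which is an assumption about the script. If you control the `DataLoader`, PyTorch's `torch.utils.data.WeightedRandomSampler` is an alternative that reweights instead of copying rows.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, int]  # (path, start_sec, end_sec, label)


def oversample_by_duplication(rows: List[Row]) -> List[Row]:
    """Repeat segments of smaller classes so every class has as many rows as the largest."""
    if not rows:
        return []
    by_label: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[3]].append(row)
    target = max(len(group) for group in by_label.values())

    balanced: List[Row] = []
    for _label, group in sorted(by_label.items()):
        # Cycle through this class's segments until it reaches the target count.
        for i in range(target):
            balanced.append(group[i % len(group)])
    return balanced


rows = [
    ("scared_underwater/foo.mp4", 12.4, 16.2, 3),
    ("normal_underwater/bar.mp4", 0.0, 60.0, 1),
    ("normal_underwater/qux.mp4", 0.0, 45.0, 1),   # hypothetical file
    ("normal_underwater/quux.mp4", 0.0, 30.0, 1),  # hypothetical file
]
balanced = oversample_by_duplication(rows)
print(Counter(r[3] for r in balanced))
```

Exact balancing can overfit a class with very few segments; capping the repeat factor is a common compromise.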