# Training Guide: Fine-tuning SlowFast for Fish Action Classification

This guide walks you through fine-tuning a pretrained SlowFast model on your fish action dataset.

## Prerequisites

1. ✅ You have generated CSV files using `prepare_fish_dataset.py`
2. ✅ Your videos are organized in `~/data/fish/fish_action_videos/`
3. ✅ CSV files are in `./data/fish/fish_action_training_dataset/`
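
These prerequisites can be checked before starting a long run. The sketch below is a minimal example, not part of SlowFast: `missing_prerequisites` is a hypothetical helper, and it assumes the split files are named `train.csv`, `val.csv`, and `test.csv` (only `train.csv` is named in this guide; the other two follow the usual PySlowFast convention and may differ in your setup).

```python
from pathlib import Path


def missing_prerequisites(csv_dir, video_dir, splits=("train", "val", "test")):
    """Return the paths that are expected to exist but do not."""
    csv_dir = Path(csv_dir).expanduser()
    video_dir = Path(video_dir).expanduser()
    missing = []
    # The video directory must exist before any CSV entry can resolve.
    if not video_dir.is_dir():
        missing.append(video_dir)
    # Assumption: one "<split>.csv" file per split.
    for split in splits:
        csv_path = csv_dir / f"{split}.csv"
        if not csv_path.is_file():
            missing.append(csv_path)
    return missing
```

Called from the project root as `missing_prerequisites("./data/fish/fish_action_training_dataset", "~/data/fish/fish_action_videos")`, it should return an empty list when everything is in place.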

## Step 1: Download Pretrained Model

Download a pretrained SlowFast model from the [Model Zoo](https://github.com/facebookresearch/SlowFast/blob/main/MODEL_ZOO.md).

### Recommended Model

**SlowFast 8x8 R50 (Kinetics 400)** - Good balance of accuracy and speed:
```bash
# Create checkpoints directory
mkdir -p checkpoints

# Download pretrained model (Caffe2 format)
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
    -O checkpoints/SLOWFAST_8x8_R50.pkl
```

If the Model Zoo offers a PyTorch-format (`.pyth`) checkpoint for the model, you can download that instead; substitute the real URL for the placeholder:
```bash
wget <pytorch_model_url> -O checkpoints/SLOWFAST_8x8_R50.pyth
```
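
A failed or partial download of the `.pkl` file usually shows up only when training starts, so it can be worth inspecting the file first. The sketch below is a hedged example: `count_caffe2_blobs` is a hypothetical helper, and it assumes the weights sit under a top-level `"blobs"` dictionary, which is what the upstream PySlowFast Caffe2 loader reads. Unpickling the real checkpoint needs NumPy installed, and you should only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def count_caffe2_blobs(checkpoint_path):
    """Load a Caffe2-style .pkl checkpoint and return how many weight blobs it holds."""
    path = Path(checkpoint_path)
    if not path.is_file() or path.stat().st_size == 0:
        raise FileNotFoundError(f"missing or empty checkpoint: {path}")
    with path.open("rb") as f:
        # latin1 decoding is needed for pickles written by Python 2.
        checkpoint = pickle.load(f, encoding="latin1")
    # Assumption: weights live under a top-level "blobs" dict.
    if "blobs" not in checkpoint:
        raise KeyError("no 'blobs' entry; this may not be a Caffe2 checkpoint")
    return len(checkpoint["blobs"])
```

If the download returned an error page instead of the model, `pickle.load` raises an `UnpicklingError`, which is a clear sign to download the file again.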

## Step 2: Configure Your Training

Edit the config file `configs/fish_action_SLOWFAST_8x8_R50.yaml`:

1. **Set pretrained model path:**
   ```yaml
   TRAIN:
     CHECKPOINT_FILE_PATH: "checkpoints/SLOWFAST_8x8_R50.pkl"
     CHECKPOINT_TYPE: caffe2  # or "pytorch" if using .pyth file
   ```

2. **Verify dataset paths:**
   ```yaml
   DATA:
     PATH_TO_DATA_DIR: "/home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset"
     PATH_PREFIX: "/home/ubuntu/data/fish/fish_action_videos"
   ```

3. **Adjust batch size** based on your GPU memory:
   ```yaml
   TRAIN:
     BATCH_SIZE: 8  # Reduce if you get OOM errors
   ```

4. **Confirm the number of classes** (already set to 5):
   ```yaml
   MODEL:
     NUM_CLASSES: 5
   ```

## Step 3: Start Training

### Basic Training Command

```bash
cd /home/ubuntu/projects/FishAction

python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    NUM_GPUS 1
```

### Training with Command-Line Overrides

You can override config values from the command line:

```bash
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.BATCH_SIZE 4 \
    NUM_GPUS 1 \
    SOLVER.MAX_EPOCH 30 \
    SOLVER.BASE_LR 0.005
```
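
Overrides are passed as alternating `KEY VALUE` pairs after `--cfg`. If you launch runs from a script, the same command can be assembled programmatically. This is a minimal sketch; `build_run_command` is a hypothetical helper, not part of SlowFast.

```python
import shlex


def build_run_command(cfg_path, overrides, script="slowfast/tools/run_net.py"):
    """Return the argv list for run_net.py with KEY VALUE override pairs appended."""
    argv = ["python", script, "--cfg", cfg_path]
    for key, value in overrides.items():
        # Each override is two separate arguments: the dotted key, then its value.
        argv += [key, str(value)]
    return argv


cmd = build_run_command(
    "configs/fish_action_SLOWFAST_8x8_R50.yaml",
    {
        "TRAIN.BATCH_SIZE": 4,
        "NUM_GPUS": 1,
        "SOLVER.MAX_EPOCH": 30,
        "SOLVER.BASE_LR": 0.005,
    },
)
# Print a copy-pasteable shell command for inspection.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the run; printing it first makes it easy to confirm the overrides before committing GPU time.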

### Multi-GPU Training

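If you have multiple GPUs, raise `NUM_GPUS`. In upstream PySlowFast, `TRAIN.BATCH_SIZE` is the total across all GPUs, so the command below trains with 8 clips per GPU: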

```bash
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    NUM_GPUS 2 \
    TRAIN.BATCH_SIZE 16
```
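
When the total batch size changes, the learning rate usually needs to change with it. The sketch below assumes the total batch is divided evenly across GPUs and applies the common linear scaling heuristic; the heuristic is an assumption here, not something this guide's config prescribes. It takes the batch size 4 and learning rate 0.005 pair from the override example as the reference point, and `scale_for_gpus` is a hypothetical helper.

```python
def scale_for_gpus(total_batch, num_gpus, ref_batch, ref_lr):
    """Return (clips per GPU, learning rate scaled linearly with total batch size)."""
    if total_batch % num_gpus != 0:
        raise ValueError("total batch size should divide evenly across GPUs")
    per_gpu = total_batch // num_gpus
    # Linear scaling heuristic: learning rate grows in proportion to batch size.
    scaled_lr = ref_lr * total_batch / ref_batch
    return per_gpu, scaled_lr


per_gpu, lr = scale_for_gpus(total_batch=16, num_gpus=2, ref_batch=4, ref_lr=0.005)
print(f"clips per GPU: {per_gpu}, suggested SOLVER.BASE_LR: {lr:.4f}")
```

Treat the result as a starting point and confirm it against validation accuracy; small fine-tuning datasets often prefer a lower rate than the heuristic suggests.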

## Step 4: Monitor Training

Training progress is written to:
- The console (standard output)
- `checkpoints/fish_action/logs/` (if TensorBoard is enabled)

### Check Training Progress

```bash
# View latest checkpoint
ls -lh checkpoints/fish_action/

# View logs
tail -f checkpoints/fish_action/logs/*.log
```
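
To find the most recent checkpoint without reading the directory listing by eye, the epoch number can be parsed from the file name. This sketch assumes checkpoints follow the `checkpoint_epoch_NNNNN.pyth` pattern used in the evaluation commands of this guide; `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"checkpoint_epoch_(\d+)\.pyth$")


def latest_checkpoint(checkpoint_dir):
    """Return (epoch, path) for the highest-numbered checkpoint, or None if there is none."""
    found = []
    for path in Path(checkpoint_dir).glob("checkpoint_epoch_*.pyth"):
        match = CHECKPOINT_RE.search(path.name)
        if match:
            # int() strips the zero padding, so 00050 becomes 50.
            found.append((int(match.group(1)), path))
    return max(found, key=lambda item: item[0]) if found else None
```

For example, `latest_checkpoint("checkpoints/fish_action/checkpoints")` returns the epoch and path to pass as `TEST.CHECKPOINT_FILE_PATH`.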

## Step 5: Evaluate Model

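After training, evaluate the trained checkpoint. Replace `checkpoint_epoch_00050.pyth` with a checkpoint your run actually produced; for example, a run started with `SOLVER.MAX_EPOCH 30` has no epoch-50 file. In upstream PySlowFast, `TEST.ENABLE True` reads the test split, while validation accuracy is logged during training, so check which split your dataset loader uses for this command: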

```bash
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.ENABLE False \
    TEST.ENABLE True \
    TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
    NUM_GPUS 1
```

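## Step 6: Test on Test Set

Run the test split with the data directory passed explicitly, so the split files are read from the directory you intend: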

```bash
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.ENABLE False \
    TEST.ENABLE True \
    TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
    DATA.PATH_TO_DATA_DIR /home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset \
    NUM_GPUS 1
```

## Troubleshooting

### Out of Memory (OOM) Errors

Reduce the batch size:
```yaml
TRAIN:
  BATCH_SIZE: 4  # or even 2
```

Or reduce the number of frames sampled per clip:
```yaml
DATA:
  NUM_FRAMES: 16  # instead of 32
```
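
Reducing `NUM_FRAMES` also shortens the stretch of video each clip covers, which matters for slower fish behaviors. The arithmetic below is a sketch under stated assumptions: it uses `DATA.SAMPLING_RATE` 2 and `SLOWFAST.ALPHA` 4, the values in the upstream SlowFast 8x8 R50 config, which may differ from your fish config. `clip_footprint` is a hypothetical helper.

```python
def clip_footprint(num_frames, sampling_rate=2, alpha=4):
    """Return (slow-pathway frames, fast-pathway frames, raw frames spanned by one clip)."""
    if num_frames % alpha != 0:
        raise ValueError("NUM_FRAMES should be divisible by ALPHA")
    # The slow pathway keeps every alpha-th frame of the fast pathway's input.
    slow_frames = num_frames // alpha
    fast_frames = num_frames
    # Frames are sampled every `sampling_rate` raw frames.
    raw_span = num_frames * sampling_rate
    return slow_frames, fast_frames, raw_span


for frames in (32, 16):
    slow, fast, span = clip_footprint(frames)
    print(f"NUM_FRAMES={frames}: slow={slow}, fast={fast}, raw frames covered={span}")
```

Under these assumptions, going from 32 to 16 frames halves the raw span from 64 to 32 frames. Doubling `DATA.SAMPLING_RATE` at the same time would keep the span unchanged at the cost of coarser temporal sampling.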

### Model Not Loading

1. **Check checkpoint path:**
   ```bash
   ls -lh checkpoints/SLOWFAST_8x8_R50.pkl
   ```

2. **Verify checkpoint type:**
   - `.pkl` files are usually in Caffe2 format
   - `.pyth` files are in PyTorch format
   - Set `TRAIN.CHECKPOINT_TYPE` to match (`caffe2` or `pytorch`)

3. **If using Caffe2 checkpoint:**
   ```yaml
   TRAIN:
     CHECKPOINT_TYPE: caffe2
   ```

### Dataset Not Found

1. **Verify CSV files exist:**
   ```bash
   ls -lh data/fish/fish_action_training_dataset/*.csv
   ```

2. **Check video paths in CSV:**
   ```bash
   head data/fish/fish_action_training_dataset/train.csv
   ```

3. **Verify PATH_PREFIX:**
   - It should be the absolute path to the video directory
   - Each video must be reachable at `PATH_PREFIX` joined with the path listed in the CSV
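
The path check above can be automated for a whole split file. This sketch assumes each CSV line has the form `<relative_path> <label>` separated by a single space, which is the upstream PySlowFast default; inspect the output of `head` on your own CSV and adjust `separator` if yours differs. `find_missing_videos` is a hypothetical helper.

```python
import os


def find_missing_videos(csv_path, path_prefix, separator=" "):
    """Return (line number, full path) for CSV entries whose video file does not exist."""
    missing = []
    with open(csv_path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            # Split from the right so the label is always the last field.
            parts = line.rsplit(separator, 1)
            if len(parts) != 2:
                raise ValueError(
                    f"{csv_path}:{line_number}: expected '<path>{separator}<label>'"
                )
            rel_path, _label = parts
            full_path = os.path.join(path_prefix, rel_path)
            if not os.path.isfile(full_path):
                missing.append((line_number, full_path))
    return missing
```

An empty list means every entry resolved; otherwise the returned line numbers point at the rows to fix.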

### Slow Training

1. **Increase NUM_WORKERS** (if you have spare CPU cores):
   ```yaml
   DATA_LOADER:
     NUM_WORKERS: 8
   ```

2. **Use mixed precision training** (if your GPU supports it):
   ```yaml
   TRAIN:
     MIXED_PRECISION: True
   ```

## Tips for Better Results

1. **Learning Rate:** Start with 0.01 for fine-tuning; drop to 0.001 if the model overfits (training accuracy keeps rising while validation accuracy falls).
2. **Epochs:** Monitor validation accuracy and stop once it plateaus.
3. **Data Augmentation:** Already enabled in the config (random crop, flip, etc.).
4. **Early Stopping:** Stop the run manually if validation accuracy has not improved over several evaluations.
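
The manual early-stopping rule can be made explicit so the decision does not depend on eyeballing the logs. This is a minimal sketch: the `patience` of 3 evaluations is an illustrative assumption, not a value from the config, and `should_stop` is a hypothetical helper fed with validation accuracies you copy from the training log.

```python
def should_stop(val_accuracies, patience=3, min_delta=0.0):
    """Return True if the last `patience` evaluations did not beat the earlier best."""
    if len(val_accuracies) <= patience:
        # Not enough history to compare against yet.
        return False
    best_before = max(val_accuracies[:-patience])
    recent_best = max(val_accuracies[-patience:])
    return recent_best <= best_before + min_delta


history = [0.50, 0.60, 0.70, 0.69, 0.70, 0.68]
print(should_stop(history))
```

Raising `min_delta` makes the rule stricter, so that tiny fluctuations in accuracy do not count as improvement.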

## Next Steps

After training:
1. Evaluate on the test set
2. Use the best checkpoint for inference
3. Create an inference script for new videos

## Example: Complete Training Session

```bash
# 1. Download pretrained model
mkdir -p checkpoints
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
    -O checkpoints/SLOWFAST_8x8_R50.pkl

# 2. Edit config file (set checkpoint path and verify dataset paths)

# 3. Start training
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    NUM_GPUS 1

# 4. After training, evaluate
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.ENABLE False \
    TEST.ENABLE True \
    TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
    NUM_GPUS 1
```

## Your Classes

Based on your dataset preparation, your 5 classes are:
- 0: feeding
- 1: normal_underwater
- 2: normal_upperwater
- 3: scared_underwater
- 4: scared_upperwater

Check `data/fish/fish_action_training_dataset/label_map.txt` for the exact mapping.
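
When you later turn model outputs into readable labels, the index order has to match this list. The sketch below hard-codes the five classes as listed above, which is an assumption until you have confirmed the order against `label_map.txt`; `predicted_class` is a hypothetical helper that takes one score per class.

```python
# Assumption: index order matches the list in this guide; verify against label_map.txt.
CLASS_NAMES = [
    "feeding",
    "normal_underwater",
    "normal_upperwater",
    "scared_underwater",
    "scared_upperwater",
]


def predicted_class(scores):
    """Return (index, name) of the highest-scoring class."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} scores, got {len(scores)}")
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, CLASS_NAMES[index]
```

The length check catches the common mistake of pairing a 5-class label list with a checkpoint that still has the 400-class Kinetics head.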