FishServer/FishAction/TRAINING_GUIDE.md
2026-05-06 15:59:38 +08:00

Training Guide: Fine-tuning SlowFast for Fish Action Classification

This guide walks through fine-tuning a pretrained SlowFast model on your fish action dataset.

Prerequisites

  1. You have generated CSV files using prepare_fish_dataset.py
  2. Your videos are organized in ~/data/fish/fish_action_videos/
  3. CSV files are in ./data/fish/fish_action_training_dataset/ (relative to the project root)
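
The prerequisites above can be checked before any training time is spent. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the split names val.csv and test.csv are assumptions (only train.csv is named elsewhere in this guide).

```python
from pathlib import Path

# Assumed split names: train.csv appears later in this guide; val.csv and
# test.csv are assumptions about what prepare_fish_dataset.py writes.
EXPECTED_SPLITS = ("train", "val", "test")


def missing_split_files(dataset_dir, splits=EXPECTED_SPLITS):
    """Return the expected CSV file names that are absent from dataset_dir."""
    root = Path(dataset_dir).expanduser()
    return [f"{name}.csv" for name in splits if not (root / f"{name}.csv").is_file()]


def has_videos_dir(videos_dir):
    """Return True if the video directory exists (after expanding ~)."""
    return Path(videos_dir).expanduser().is_dir()


if __name__ == "__main__":
    print("missing CSVs:", missing_split_files("./data/fish/fish_action_training_dataset"))
    print("videos dir present:", has_videos_dir("~/data/fish/fish_action_videos"))
```

An empty list of missing CSVs and a present video directory mean the prerequisites are in place. Run it from the project root so the relative path resolves.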

Step 1: Download Pretrained Model

Download a pretrained SlowFast model from the PySlowFast Model Zoo.

SlowFast 8x8 R50 (Kinetics 400) - Good balance of accuracy and speed:

# Create checkpoints directory
mkdir -p checkpoints

# Download pretrained model (Caffe2 format)
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
    -O checkpoints/SLOWFAST_8x8_R50.pkl

Alternatively, download a PyTorch-format checkpoint if one is available (replace <pytorch_model_url> with its URL):

# PyTorch format (if available)
wget <pytorch_model_url> -O checkpoints/SLOWFAST_8x8_R50.pyth
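
A truncated or failed download usually only surfaces later as a confusing load error, so it can help to open the .pkl file once right after downloading. This is a sketch under two assumptions: the Caffe2-format file is a pickled dict with a top-level "blobs" entry, and it needs latin1 decoding. The function name is hypothetical. Only unpickle files from a source you trust.

```python
import pickle


def caffe2_blob_names(pkl_path):
    """Load a Caffe2-style .pkl checkpoint and return its sorted blob names.

    Assumption: the file is a pickled dict with a top-level "blobs" entry
    (falling back to the dict itself), read with latin1 encoding because
    such files were typically written by Python 2.
    """
    with open(pkl_path, "rb") as f:
        data = pickle.load(f, encoding="latin1")
    blobs = data.get("blobs", data) if isinstance(data, dict) else {}
    return sorted(blobs)


# Usage (after the wget command above has finished):
# names = caffe2_blob_names("checkpoints/SLOWFAST_8x8_R50.pkl")
# print(len(names), "blobs, first few:", names[:5])
```

If the file is an HTML error page or is cut short, pickle.load raises an exception instead of returning names.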

Step 2: Configure Your Training

Edit the config file configs/fish_action_SLOWFAST_8x8_R50.yaml:

  1. Set pretrained model path:

    TRAIN:
      CHECKPOINT_FILE_PATH: "checkpoints/SLOWFAST_8x8_R50.pkl"
      CHECKPOINT_TYPE: caffe2  # or "pytorch" if using .pyth file
    
  2. Verify dataset paths:

    DATA:
      PATH_TO_DATA_DIR: "/home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset"
      PATH_PREFIX: "/home/ubuntu/data/fish/fish_action_videos"
    
  3. Adjust batch size based on your GPU memory:

    TRAIN:
      BATCH_SIZE: 8  # Reduce if you get OOM errors
    
  4. Number of classes is already set to 5:

    MODEL:
      NUM_CLASSES: 5
    

Step 3: Start Training

Basic Training Command

cd /home/ubuntu/projects/FishAction

python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    NUM_GPUS 1

Training with Command-Line Overrides

You can override config values from the command line:

python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.BATCH_SIZE 4 \
    NUM_GPUS 1 \
    SOLVER.MAX_EPOCH 30 \
    SOLVER.BASE_LR 0.005
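
When launching several runs with different overrides, the same command can be assembled programmatically. The sketch below only builds the argument list shown above as alternating KEY VALUE pairs; the function name is hypothetical and nothing is executed until the list is passed to subprocess.

```python
import shlex


def build_run_command(cfg_path, overrides, script="slowfast/tools/run_net.py"):
    """Build the argv list for run_net.py with KEY VALUE override pairs."""
    cmd = ["python", script, "--cfg", cfg_path]
    for key, value in overrides.items():
        cmd += [key, str(value)]
    return cmd


cmd = build_run_command(
    "configs/fish_action_SLOWFAST_8x8_R50.yaml",
    {
        "TRAIN.BATCH_SIZE": 4,
        "NUM_GPUS": 1,
        "SOLVER.MAX_EPOCH": 30,
        "SOLVER.BASE_LR": 0.005,
    },
)
# Print the shell-quoted command; use subprocess.run(cmd, check=True) to launch it.
print(shlex.join(cmd))
```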

Multi-GPU Training

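If you have multiple GPUs, raise NUM_GPUS. In PySlowFast, TRAIN.BATCH_SIZE is the total across all GPUs, so 16 on 2 GPUs keeps 8 clips per GPU: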

python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    NUM_GPUS 2 \
    TRAIN.BATCH_SIZE 16
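
The arithmetic behind these numbers can be sketched as follows, assuming TRAIN.BATCH_SIZE is split evenly across GPUs. The learning-rate helper applies the common linear scaling heuristic (learning rate proportional to batch size); it is a rule of thumb, not something this guide's config enforces. The reference values (learning rate 0.01 at batch size 8) are taken from elsewhere in this guide as an assumption about the starting point.

```python
def per_gpu_batch_size(total_batch_size, num_gpus):
    """Clips processed per GPU when the total batch is split evenly."""
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch_size // num_gpus


def linearly_scaled_lr(reference_lr, reference_batch_size, new_batch_size):
    """Scale the learning rate in proportion to the batch size (heuristic)."""
    return reference_lr * new_batch_size / reference_batch_size


print(per_gpu_batch_size(16, 2))        # clips per GPU for the command above
print(linearly_scaled_lr(0.01, 8, 16))  # candidate SOLVER.BASE_LR for batch 16
print(linearly_scaled_lr(0.01, 8, 4))   # candidate SOLVER.BASE_LR for batch 4
```

Treat the scaled value as a starting point and confirm it against validation accuracy.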

Step 4: Monitor Training

Training progress is written to:

  • The console (standard output)
  • checkpoints/fish_action/logs/ (if TensorBoard is enabled)

Check Training Progress

# List saved checkpoints (same directory as used in the evaluation commands)
ls -lh checkpoints/fish_action/checkpoints/

# View logs
tail -f checkpoints/fish_action/logs/*.log
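
To pick up the most recent checkpoint without reading the directory listing by eye, a small helper can parse the epoch number from the file name. This sketch assumes the naming pattern checkpoint_epoch_NNNNN.pyth used in the evaluation commands of this guide; the function name is hypothetical.

```python
import re
from pathlib import Path

# File name pattern taken from the evaluation commands in this guide.
_EPOCH_RE = re.compile(r"checkpoint_epoch_(\d+)\.pyth$")


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint path with the highest epoch number, or None."""
    best_path, best_epoch = None, -1
    for path in Path(checkpoint_dir).glob("checkpoint_epoch_*.pyth"):
        match = _EPOCH_RE.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path


# Usage:
# print(latest_checkpoint("checkpoints/fish_action/checkpoints"))
```

The returned path can be passed as TEST.CHECKPOINT_FILE_PATH. Note that the latest checkpoint is not necessarily the one with the best validation accuracy.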

Step 5: Evaluate Model

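After training, evaluate a saved checkpoint. With TRAIN.ENABLE False and TEST.ENABLE True, run_net.py skips training and runs evaluation only. In PySlowFast this test mode reads test.csv from DATA.PATH_TO_DATA_DIR; accuracy on the validation split is reported during training instead. Replace checkpoint_epoch_00050.pyth with the checkpoint you want to evaluate: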

python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.ENABLE False \
    TEST.ENABLE True \
    TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
    NUM_GPUS 1

Step 6: Test on Test Set

This is the evaluation command from Step 5 with DATA.PATH_TO_DATA_DIR set explicitly, so the test.csv in that directory is the split that gets evaluated:

python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.ENABLE False \
    TEST.ENABLE True \
    TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
    DATA.PATH_TO_DATA_DIR /home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset \
    NUM_GPUS 1

Troubleshooting

Out of Memory (OOM) Errors

Reduce batch size:

TRAIN:
  BATCH_SIZE: 4  # or even 2

Or reduce the number of frames per clip:

DATA:
  NUM_FRAMES: 16  # instead of 32
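
Both settings shrink the input roughly linearly, which a quick calculation illustrates. This sketch only sizes a single float32 input tensor of shape batch x channels x frames x height x width. The 224-pixel crop is an assumption, and real GPU usage is dominated by activations and optimizer state, so read the numbers as relative, not absolute.

```python
def clip_tensor_mib(batch_size, num_frames, crop_size=224, channels=3, bytes_per_value=4):
    """Size in MiB of one float32 input batch of video clips.

    Assumptions: square crops of crop_size pixels and 4 bytes per value.
    """
    values = batch_size * channels * num_frames * crop_size * crop_size
    return values * bytes_per_value / 2**20


for batch, frames in [(8, 32), (4, 32), (4, 16)]:
    print(f"batch={batch} frames={frames}: {clip_tensor_mib(batch, frames):.2f} MiB")
```

Halving either the batch size or the frame count halves the result, which is why the two options above can be combined when one alone is not enough.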

Model Not Loading

  1. Check checkpoint path:

    ls -lh checkpoints/SLOWFAST_8x8_R50.pkl
    
  2. Verify checkpoint type:

    • .pkl files are usually Caffe2 format
    • .pyth files are PyTorch format
    • Set CHECKPOINT_TYPE accordingly
  3. If using Caffe2 checkpoint:

    TRAIN:
      CHECKPOINT_TYPE: caffe2
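
The extension rule above can be written down as a small lookup. This is a sketch with a hypothetical function name; it encodes only the convention stated in this guide (.pkl for Caffe2, .pyth for PyTorch) and does not inspect the file contents.

```python
from pathlib import Path

# Convention from this guide; a file's real format may differ from its suffix.
_TYPE_BY_SUFFIX = {".pkl": "caffe2", ".pyth": "pytorch"}


def guess_checkpoint_type(checkpoint_path):
    """Return the CHECKPOINT_TYPE value suggested by the file extension."""
    suffix = Path(checkpoint_path).suffix.lower()
    if suffix not in _TYPE_BY_SUFFIX:
        raise ValueError(f"cannot guess checkpoint type from suffix {suffix!r}")
    return _TYPE_BY_SUFFIX[suffix]


print(guess_checkpoint_type("checkpoints/SLOWFAST_8x8_R50.pkl"))
```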
    

Dataset Not Found

  1. Verify CSV files exist:

    ls -lh data/fish/fish_action_training_dataset/*.csv
    
  2. Check video paths in CSV:

    head data/fish/fish_action_training_dataset/train.csv
    
  3. Verify PATH_PREFIX:

    • It should be an absolute path to the video directory
    • Each video must exist at PATH_PREFIX joined with the path listed in the CSV
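
The path check above can be automated for a whole split. The sketch below assumes each CSV line has the form "<relative_path> <label>" with a single space as separator and no header row; adjust the separator argument if prepare_fish_dataset.py writes something else. The function name is hypothetical.

```python
import os


def missing_videos(csv_path, path_prefix, separator=" "):
    """Return full paths listed in csv_path that do not exist under path_prefix.

    Assumption: each non-empty line is "<relative_path><separator><label>".
    """
    missing = []
    with open(csv_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split off the trailing label; everything before it is the path.
            rel_path = line.rsplit(separator, 1)[0]
            full_path = os.path.join(path_prefix, rel_path)
            if not os.path.isfile(full_path):
                missing.append(full_path)
    return missing


# Usage:
# print(missing_videos("data/fish/fish_action_training_dataset/train.csv",
#                      "/home/ubuntu/data/fish/fish_action_videos"))
```

An empty result means every listed video resolves under PATH_PREFIX. If every line is reported missing, the separator or the prefix is the likely culprit.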

Slow Training

  1. Increase NUM_WORKERS (if you have spare CPU cores):

    DATA_LOADER:
      NUM_WORKERS: 8
    
  2. Use mixed precision training (if supported):

    TRAIN:
      MIXED_PRECISION: True
    

Tips for Better Results

  1. Learning Rate: Start with 0.01 for fine-tuning; try a lower value (for example 0.001) if the model overfits
  2. Epochs: Monitor validation accuracy and stop once it plateaus
  3. Data Augmentation: Already enabled in the config (random crop, flip, etc.)
  4. Early Stopping: Stop the run manually if validation accuracy has not improved for several epochs
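
Since stopping is manual, it helps to fix the stopping rule in advance. The following is one possible rule, sketched with hypothetical names: stop when the last few epochs have not beaten the best accuracy seen before them. The per-epoch validation accuracies are assumed to be copied from the training log by hand.

```python
def should_stop(val_accuracies, patience=5, min_delta=0.0):
    """Return True if the last `patience` epochs did not beat the earlier best.

    val_accuracies: per-epoch validation accuracies, oldest first.
    min_delta: minimum improvement that counts as progress.
    """
    if len(val_accuracies) <= patience:
        return False  # not enough history to judge a plateau
    best_before = max(val_accuracies[:-patience])
    recent_best = max(val_accuracies[-patience:])
    return recent_best <= best_before + min_delta


history = [0.50, 0.60, 0.70, 0.70, 0.69, 0.70]
print(should_stop(history, patience=3))
```

A larger patience tolerates noisier validation curves at the cost of extra epochs.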

Next Steps

After training:

  1. Evaluate on the test set
  2. Use the best checkpoint for inference
  3. Create an inference script for new videos

Example: Complete Training Session

# 1. Download pretrained model
mkdir -p checkpoints
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
    -O checkpoints/SLOWFAST_8x8_R50.pkl

# 2. Edit config file (set checkpoint path and verify dataset paths)

# 3. Start training
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    NUM_GPUS 1

# 4. After training, evaluate
python slowfast/tools/run_net.py \
    --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
    TRAIN.ENABLE False \
    TEST.ENABLE True \
    TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
    NUM_GPUS 1

Your Classes

Based on your dataset preparation, your 5 classes are:

  • 0: feeding
  • 1: normal_underwater
  • 2: normal_upperwater
  • 3: scared_underwater
  • 4: scared_upperwater

Check data/fish/fish_action_training_dataset/label_map.txt for the exact mapping.
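
For inference, the index-to-name mapping above can be kept in one place. This sketch hard-codes the five classes exactly as listed in this guide; if label_map.txt disagrees, the file is authoritative. The function name and the example scores are illustrative only.

```python
# Mapping copied from the list above; verify it against label_map.txt.
CLASS_NAMES = {
    0: "feeding",
    1: "normal_underwater",
    2: "normal_upperwater",
    3: "scared_underwater",
    4: "scared_upperwater",
}


def predicted_class(scores):
    """Return (index, name) of the highest-scoring class."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} scores, got {len(scores)}")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index, CLASS_NAMES[best_index]


# Made-up scores for illustration, one per class.
print(predicted_class([0.10, 0.05, 0.70, 0.10, 0.05]))
```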