FishServer/FishAction/TRAINING_GUIDE.md
2026-04-08 19:32:23 +08:00


# Training Guide: Fine-tuning SlowFast for Fish Action Classification
This guide will help you fine-tune a pretrained SlowFast model on your fish action dataset.
## Prerequisites
1. ✅ You have generated CSV files using `prepare_fish_dataset.py`
2. ✅ Your videos are organized in `~/data/fish/fish_action_videos/`
3. ✅ CSV files are in `./data/fish/fish_action_training_dataset/`
## Step 1: Download Pretrained Model
Download a pretrained SlowFast model from the [Model Zoo](https://github.com/facebookresearch/SlowFast/blob/main/MODEL_ZOO.md).
### Recommended Model
**SlowFast 8x8 R50 (Kinetics 400)** - Good balance of accuracy and speed:
```bash
# Create checkpoints directory
mkdir -p checkpoints
# Download pretrained model (Caffe2 format)
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
-O checkpoints/SLOWFAST_8x8_R50.pkl
```
Or download PyTorch format if available:
```bash
# PyTorch format (if available)
wget <pytorch_model_url> -O checkpoints/SLOWFAST_8x8_R50.pyth
```
## Step 2: Configure Your Training
Edit the config file `configs/fish_action_SLOWFAST_8x8_R50.yaml`:
1. **Set pretrained model path:**
```yaml
TRAIN:
  CHECKPOINT_FILE_PATH: "checkpoints/SLOWFAST_8x8_R50.pkl"
  CHECKPOINT_TYPE: caffe2 # or "pytorch" if using .pyth file
```
2. **Verify dataset paths:**
```yaml
DATA:
  PATH_TO_DATA_DIR: "/home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset"
  PATH_PREFIX: "/home/ubuntu/data/fish/fish_action_videos"
```
3. **Adjust batch size** based on your GPU memory:
```yaml
TRAIN:
  BATCH_SIZE: 8 # Reduce if you get OOM errors
```
4. **Number of classes** is already set to 5:
```yaml
MODEL:
  NUM_CLASSES: 5
```
## Step 3: Start Training
### Basic Training Command
```bash
cd /home/ubuntu/projects/FishAction
python slowfast/tools/run_net.py \
--cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
NUM_GPUS 1
```
### Training with Command-Line Overrides
You can override config values from the command line:
```bash
python slowfast/tools/run_net.py \
--cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
TRAIN.BATCH_SIZE 4 \
NUM_GPUS 1 \
SOLVER.MAX_EPOCH 30 \
SOLVER.BASE_LR 0.005
```
### Multi-GPU Training
If you have multiple GPUs:
```bash
python slowfast/tools/run_net.py \
--cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
NUM_GPUS 2 \
TRAIN.BATCH_SIZE 16
```
## Step 4: Monitor Training
Training logs are written to:
- Console output
- `checkpoints/fish_action/logs/` (if TensorBoard is enabled)
### Check Training Progress
```bash
# View latest checkpoint
ls -lh checkpoints/fish_action/
# View logs
tail -f checkpoints/fish_action/logs/*.log
```
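To pick up the most recent checkpoint programmatically, the following sketch scans a directory for files named like `checkpoint_epoch_00050.pyth`, the pattern used in the evaluation commands in this guide. The helper name `latest_checkpoint` is hypothetical, and the directory you pass in depends on your output settings.

```python
import re
from pathlib import Path
from typing import Optional

# Matches names such as checkpoint_epoch_00050.pyth and captures the epoch.
_CKPT_RE = re.compile(r"^checkpoint_epoch_(\d+)\.pyth$")


def latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    best_path: Optional[Path] = None
    best_epoch = -1
    for path in directory.iterdir():
        match = _CKPT_RE.match(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = path
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("checkpoints/fish_action/checkpoints"))
```

Comparing parsed epoch numbers rather than sorting file names keeps the result correct even if the zero padding ever changes.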
## Step 5: Evaluate Model
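After training, evaluate the saved checkpoint (replace `checkpoint_epoch_00050.pyth` with the checkpoint your run actually produced). Note that in upstream PySlowFast, `TEST.ENABLE True` evaluates the test split, while validation accuracy is reported during training; this repository's copy may differ.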
```bash
python slowfast/tools/run_net.py \
--cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
NUM_GPUS 1
```
## Step 6: Test on Test Set
```bash
python slowfast/tools/run_net.py \
--cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
DATA.PATH_TO_DATA_DIR /home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset \
NUM_GPUS 1
```
## Troubleshooting
### Out of Memory (OOM) Errors
Reduce batch size:
```yaml
TRAIN:
  BATCH_SIZE: 4 # or even 2
```
Or reduce number of frames:
```yaml
DATA:
  NUM_FRAMES: 16 # instead of 32
```
### Model Not Loading
1. **Check checkpoint path:**
```bash
ls -lh checkpoints/SLOWFAST_8x8_R50.pkl
```
2. **Verify checkpoint type:**
- `.pkl` files are usually Caffe2 format
- `.pyth` files are PyTorch format
- Set `CHECKPOINT_TYPE` accordingly
3. **If using Caffe2 checkpoint:**
```yaml
TRAIN:
  CHECKPOINT_TYPE: caffe2
```
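The extension rule above can be turned into a small check. This is only a heuristic sketch: the extension does not guarantee the format, and `guess_checkpoint_type` is a hypothetical helper, not part of SlowFast.

```python
from pathlib import Path


def guess_checkpoint_type(checkpoint_path: str) -> str:
    """Suggest a CHECKPOINT_TYPE value from the file extension."""
    suffix = Path(checkpoint_path).suffix.lower()
    if suffix == ".pkl":
        return "caffe2"   # .pkl files are usually Caffe2 format
    if suffix == ".pyth":
        return "pytorch"  # .pyth files are PyTorch format
    raise ValueError(f"cannot infer checkpoint type from {checkpoint_path!r}")


if __name__ == "__main__":
    print(guess_checkpoint_type("checkpoints/SLOWFAST_8x8_R50.pkl"))
```

If the suggested value does not match `CHECKPOINT_TYPE` in your config, that mismatch is a likely cause of the loading failure.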
### Dataset Not Found
1. **Verify CSV files exist:**
```bash
ls -lh data/fish/fish_action_training_dataset/*.csv
```
2. **Check video paths in CSV:**
```bash
head data/fish/fish_action_training_dataset/train.csv
```
3. **Verify PATH_PREFIX:**
- Should be the absolute path to the video directory
- Each video must be reachable at `PATH_PREFIX` joined with the path listed in the CSV
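The path check can also be scripted. The sketch below assumes each CSV row has the form `relative/path.mp4 label` with a single space as separator, which is the upstream PySlowFast convention; adjust `separator` if `prepare_fish_dataset.py` writes something else. `find_missing_videos` is a hypothetical helper name.

```python
import os
from typing import List


def find_missing_videos(
    csv_path: str, path_prefix: str, separator: str = " "
) -> List[str]:
    """Return the full paths listed in the CSV that do not exist on disk."""
    missing: List[str] = []
    with open(csv_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split from the right so spaces inside the path are preserved.
            rel_path = line.rsplit(separator, 1)[0]
            # Note: os.path.join ignores path_prefix if rel_path is absolute.
            full_path = os.path.join(path_prefix, rel_path)
            if not os.path.isfile(full_path):
                missing.append(full_path)
    return missing


if __name__ == "__main__":
    for path in find_missing_videos(
        "data/fish/fish_action_training_dataset/train.csv",
        "/home/ubuntu/data/fish/fish_action_videos",
    ):
        print("missing:", path)
```

An empty result means every row in the CSV resolves to a file under `PATH_PREFIX`.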
### Slow Training
1. **Increase NUM_WORKERS** (if you have spare CPU cores):
```yaml
DATA_LOADER:
  NUM_WORKERS: 8
```
2. **Use mixed precision training** (if supported):
```yaml
TRAIN:
  MIXED_PRECISION: True
```
## Tips for Better Results
1. **Learning Rate:** Start with 0.01 for fine-tuning; try a lower value (0.001) if the model overfits.
2. **Epochs:** Monitor validation accuracy and stop once it plateaus.
3. **Data Augmentation:** Already enabled in the config (random crop, flip, etc.).
4. **Early Stopping:** Stop the run manually if validation accuracy stops improving.
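One way to make "plateaus" concrete is a patience rule: stop when the best accuracy of the last few evaluations is no better than the best seen before them. The sketch below is one possible rule, not something SlowFast provides; `has_plateaued`, `patience`, and `min_delta` are illustrative names, and the accuracy values must be copied from your own logs.

```python
from typing import Sequence


def has_plateaued(
    val_acc: Sequence[float], patience: int = 5, min_delta: float = 0.0
) -> bool:
    """True if the last `patience` evaluations did not beat the earlier best."""
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(val_acc) <= patience:
        return False  # not enough history to judge yet
    best_before = max(val_acc[:-patience])
    best_recent = max(val_acc[-patience:])
    return best_recent <= best_before + min_delta


if __name__ == "__main__":
    history = [0.50, 0.60, 0.70, 0.70, 0.69, 0.70]  # made-up example values
    print(has_plateaued(history, patience=3))
```

A larger `patience` tolerates noisy validation curves at the cost of training longer before stopping.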
## Next Steps
After training:
1. Evaluate on test set
2. Use the best checkpoint for inference
3. Create inference script for new videos
## Example: Complete Training Session
```bash
# 1. Download pretrained model
mkdir -p checkpoints
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
-O checkpoints/SLOWFAST_8x8_R50.pkl
# 2. Edit config file (set checkpoint path and verify dataset paths)
# 3. Start training
python slowfast/tools/run_net.py \
--cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
NUM_GPUS 1
# 4. After training, evaluate
python slowfast/tools/run_net.py \
--cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
NUM_GPUS 1
```
## Your Classes
Based on your dataset preparation, your 5 classes are:
- 0: feeding
- 1: normal_underwater
- 2: normal_upperwater
- 3: scared_underwater
- 4: scared_upperwater
Check `data/fish/fish_action_training_dataset/label_map.txt` for the exact mapping.
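To confirm that the labels in a split actually fall in the range 0 to 4 and to see how balanced the classes are, a short script can count them. This sketch assumes each CSV row ends with an integer label separated from the path by a single space; `count_labels` is a hypothetical helper name.

```python
from collections import Counter


def count_labels(csv_path: str, num_classes: int = 5, separator: str = " ") -> Counter:
    """Count rows per label and reject labels outside [0, num_classes)."""
    counts: Counter = Counter()
    with open(csv_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            parts = line.rsplit(separator, 1)
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected '<path>{separator}<label>'")
            label = int(parts[1])
            if not 0 <= label < num_classes:
                raise ValueError(f"line {line_no}: label {label} out of range")
            counts[label] += 1
    return counts


if __name__ == "__main__":
    counts = count_labels("data/fish/fish_action_training_dataset/train.csv")
    for label in range(5):
        print(label, counts[label])
```

A class with a count of zero or far fewer samples than the others is worth checking before training.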