# Training Guide: Fine-tuning SlowFast for Fish Action Classification

This guide will help you fine-tune a pretrained SlowFast model on your fish action dataset.

## Prerequisites

1. ✅ You have generated CSV files using `prepare_fish_dataset.py`
2. ✅ Your videos are organized in `~/data/fish/fish_action_videos/`
3. ✅ CSV files are in `./data/fish/fish_action_training_dataset/`

## Step 1: Download Pretrained Model

Download a pretrained SlowFast model from the [Model Zoo](https://github.com/facebookresearch/SlowFast/blob/main/MODEL_ZOO.md).

### Recommended Models:

**SlowFast 8x8 R50 (Kinetics 400)** - Good balance of accuracy and speed:

```bash
# Create checkpoints directory
mkdir -p checkpoints

# Download pretrained model (Caffe2 format)
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
  -O checkpoints/SLOWFAST_8x8_R50.pkl
```

Or download the PyTorch-format checkpoint, if one is available (replace `<pytorch_model_url>` with its URL):

```bash
# PyTorch format (if available)
wget <pytorch_model_url> -O checkpoints/SLOWFAST_8x8_R50.pyth
```

## Step 2: Configure Your Training

Edit the config file `configs/fish_action_SLOWFAST_8x8_R50.yaml`:

1. **Set pretrained model path:**

   ```yaml
   TRAIN:
     CHECKPOINT_FILE_PATH: "checkpoints/SLOWFAST_8x8_R50.pkl"
     CHECKPOINT_TYPE: caffe2  # or "pytorch" if using .pyth file
   ```

2. **Verify dataset paths:**

   ```yaml
   DATA:
     PATH_TO_DATA_DIR: "/home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset"
     PATH_PREFIX: "/home/ubuntu/data/fish/fish_action_videos"
   ```

3. **Adjust batch size** based on your GPU memory:

   ```yaml
   TRAIN:
     BATCH_SIZE: 8  # Reduce if you get OOM errors
   ```

4. **Number of classes** is already set to 5:

   ```yaml
   MODEL:
     NUM_CLASSES: 5
   ```

## Step 3: Start Training

### Basic Training Command

```bash
cd /home/ubuntu/projects/FishAction

python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  NUM_GPUS 1
```

### Training with Command-Line Overrides

You can override any config value from the command line by appending `KEY VALUE` pairs after `--cfg`:

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.BATCH_SIZE 4 \
  NUM_GPUS 1 \
  SOLVER.MAX_EPOCH 30 \
  SOLVER.BASE_LR 0.005
```

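If you launch runs from a script, the override pattern above can be assembled programmatically. The following is a minimal sketch, not part of the project: `build_train_command` is a hypothetical helper, and it assumes the `run_net.py` path and the `KEY VALUE` argument layout shown in this guide.

```python
import shlex


def build_train_command(cfg_path, overrides):
    """Return the run_net.py argument list for a config file plus overrides.

    `build_train_command` is a hypothetical helper for illustration only.
    """
    cmd = ["python", "slowfast/tools/run_net.py", "--cfg", cfg_path]
    for key, value in overrides.items():
        # Overrides are passed as alternating KEY VALUE tokens after --cfg.
        cmd.extend([key, str(value)])
    return cmd


overrides = {
    "TRAIN.BATCH_SIZE": 4,
    "NUM_GPUS": 1,
    "SOLVER.MAX_EPOCH": 30,
    "SOLVER.BASE_LR": 0.005,
}
cmd = build_train_command("configs/fish_action_SLOWFAST_8x8_R50.yaml", overrides)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

To actually start training, the list could be passed to `subprocess.run(cmd, check=True)` from the project root.
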
### Multi-GPU Training

If you have multiple GPUs, raise `NUM_GPUS` and the batch size together (in PySlowFast, `TRAIN.BATCH_SIZE` is the total batch size across all GPUs):

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  NUM_GPUS 2 \
  TRAIN.BATCH_SIZE 16
```

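When the total batch size changes, the learning rate usually needs adjusting too. A common heuristic, which this guide does not prescribe, is the linear scaling rule: scale the learning rate in proportion to the batch size. The sketch below assumes a reference point of batch size 8 at learning rate 0.01 (the values suggested elsewhere in this guide); `scale_learning_rate` is a hypothetical helper.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Scale the learning rate in proportion to the total batch size.

    Implements the linear scaling heuristic; treat the result as a
    starting point to tune, not a guaranteed optimum.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Assumed reference point: batch size 8 at learning rate 0.01.
print(scale_learning_rate(0.01, 8, 16))  # two GPUs, batch size 16
print(scale_learning_rate(0.01, 8, 4))   # reduced batch size 4
```

Under that assumption, the batch size 4 and learning rate 0.005 used in the command-line override example are consistent with this rule.
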
## Step 4: Monitor Training

Training logs are written to:

- Console output
- `checkpoints/fish_action/logs/` (if TensorBoard is enabled)

### Check Training Progress

```bash
# List the training output directory
ls -lh checkpoints/fish_action/

# Follow the logs
tail -f checkpoints/fish_action/logs/*.log
```

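The evaluation commands in this guide refer to a checkpoint by epoch number, so it helps to find the most recent one automatically. The sketch below assumes checkpoints follow the `checkpoint_epoch_NNNNN.pyth` naming used in this guide; `latest_checkpoint` is a hypothetical helper, not a PySlowFast function.

```python
import re
from pathlib import Path

# Naming pattern taken from the checkpoint paths used in this guide.
CHECKPOINT_PATTERN = re.compile(r"checkpoint_epoch_(\d+)\.pyth$")


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint path with the highest epoch number, or None."""
    best_epoch, best_path = -1, None
    for path in Path(checkpoint_dir).glob("checkpoint_epoch_*.pyth"):
        match = CHECKPOINT_PATTERN.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_checkpoint("checkpoints/fish_action/checkpoints")` returns the path to pass as `TEST.CHECKPOINT_FILE_PATH`, or `None` if no checkpoint has been saved yet.
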
## Step 5: Evaluate Model

After training, evaluate the saved checkpoint on the validation set. Replace `checkpoint_epoch_00050.pyth` with the checkpoint you want to score; the epoch number in the filename depends on how long you trained. Note that stock PySlowFast reads `test.csv` when `TEST.ENABLE` is `True`, so check which split this command actually scores in your setup.

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
  NUM_GPUS 1
```

## Step 6: Test on Test Set

Run the same command with `DATA.PATH_TO_DATA_DIR` set explicitly to the directory that holds the test split:

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
  DATA.PATH_TO_DATA_DIR /home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset \
  NUM_GPUS 1
```

## Troubleshooting

### Out of Memory (OOM) Errors

Reduce the batch size:

```yaml
TRAIN:
  BATCH_SIZE: 4  # or even 2
```

Or reduce the number of frames per clip:

```yaml
DATA:
  NUM_FRAMES: 16  # instead of 32
```

### Model Not Loading

1. **Check checkpoint path:**

   ```bash
   ls -lh checkpoints/SLOWFAST_8x8_R50.pkl
   ```

2. **Verify checkpoint type:**
   - `.pkl` files are usually Caffe2 format
   - `.pyth` files are PyTorch format
   - Set `CHECKPOINT_TYPE` accordingly

3. **If using Caffe2 checkpoint:**

   ```yaml
   TRAIN:
     CHECKPOINT_TYPE: caffe2
   ```

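Because the file extension is only a hint, it can help to inspect the file itself before choosing `CHECKPOINT_TYPE`. The sketch below is a heuristic under two assumptions that are not stated in this guide: checkpoints written by recent versions of `torch.save` are zip archives, and Caffe2-exported weights are a pickled dict with a `blobs` entry. `guess_checkpoint_type` is a hypothetical helper, and unpickling should only be done on files from a source you trust.

```python
import pickle
import zipfile


def guess_checkpoint_type(path):
    """Heuristically guess the CHECKPOINT_TYPE value for a checkpoint file.

    Returns "pytorch", "caffe2", or "unknown". This is a best-effort
    guess, not the check PySlowFast itself performs.
    """
    # Assumption: recent torch.save() output is a zip archive.
    if zipfile.is_zipfile(path):
        return "pytorch"
    try:
        with open(path, "rb") as f:
            # Only unpickle files from a source you trust.
            data = pickle.load(f, encoding="latin1")
    except Exception:
        return "unknown"
    # Assumption: Caffe2-exported weights are a pickled dict with "blobs".
    if isinstance(data, dict) and "blobs" in data:
        return "caffe2"
    return "unknown"
```

If the result is `unknown`, or loading still fails, trying the other `CHECKPOINT_TYPE` value is a reasonable next step.
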
### Dataset Not Found

1. **Verify CSV files exist:**

   ```bash
   ls -lh data/fish/fish_action_training_dataset/*.csv
   ```

2. **Check video paths in CSV:**

   ```bash
   head data/fish/fish_action_training_dataset/train.csv
   ```

3. **Verify PATH_PREFIX:**
   - It should be the absolute path to the video directory
   - Each video should be accessible at `PATH_PREFIX + path_in_csv`

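The `PATH_PREFIX + path_in_csv` check can be automated for every row rather than spot-checked with `head`. The sketch below assumes each CSV row is a video path and an integer label separated by a single space; if `prepare_fish_dataset.py` writes a different layout, adjust `separator`. `find_missing_videos` is a hypothetical helper.

```python
import os


def find_missing_videos(csv_path, path_prefix, separator=" "):
    """Return the full paths of CSV entries that do not exist on disk."""
    missing = []
    with open(csv_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Assumed row layout: "<video path><separator><integer label>".
            video_path = line.rsplit(separator, 1)[0]
            full_path = os.path.join(path_prefix, video_path)
            if not os.path.isfile(full_path):
                missing.append(full_path)
    return missing
```

An empty list for `train.csv`, `val.csv`, and `test.csv` means every listed video resolves under `PATH_PREFIX`. Note that `os.path.join` ignores the prefix when the CSV path is already absolute.
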
### Slow Training

1. **Increase NUM_WORKERS** (if you have spare CPU cores):

   ```yaml
   DATA_LOADER:
     NUM_WORKERS: 8
   ```

2. **Use mixed precision training** (if supported):

   ```yaml
   TRAIN:
     MIXED_PRECISION: True
   ```

## Tips for Better Results

1. **Learning Rate:** Start with 0.01 for fine-tuning; drop to 0.001 if the model overfits
2. **Epochs:** Monitor validation accuracy and stop once it plateaus
3. **Data Augmentation:** Already enabled in the config (random crop, flip, etc.)
4. **Early Stopping:** Stop the run manually if validation accuracy no longer improves

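The plateau check in tips 2 and 4 can be made concrete with a patience rule: stop when the most recent epochs fail to beat the best accuracy seen before them. This is a minimal sketch of one such rule, not something the training script does for you. `should_stop`, `patience`, and `min_delta` are hypothetical names, and the accuracy values are made up for illustration.

```python
def should_stop(val_accuracies, patience=5, min_delta=0.0):
    """Return True if the last `patience` epochs did not beat the earlier best.

    `val_accuracies` is a list of per-epoch validation accuracies, oldest
    first. An improvement must exceed the earlier best by more than
    `min_delta` to count.
    """
    if len(val_accuracies) <= patience:
        return False  # not enough history to judge a plateau
    best_before = max(val_accuracies[:-patience])
    recent_best = max(val_accuracies[-patience:])
    return recent_best <= best_before + min_delta


# Illustrative (made-up) validation accuracies, one per evaluated epoch.
history = [0.52, 0.61, 0.68, 0.71, 0.70, 0.71, 0.69, 0.70]
print(should_stop(history, patience=3))
```

A larger `patience` tolerates noisier validation curves at the cost of training longer before stopping.
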
## Next Steps

After training:

1. Evaluate on the test set
2. Use the best checkpoint for inference
3. Create an inference script for new videos

## Example: Complete Training Session

```bash
# 1. Download pretrained model
mkdir -p checkpoints
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
  -O checkpoints/SLOWFAST_8x8_R50.pkl

# 2. Edit config file (set checkpoint path and verify dataset paths)

# 3. Start training
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  NUM_GPUS 1

# 4. After training, evaluate
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
  NUM_GPUS 1
```

## Your Classes

Based on your dataset preparation, your 5 classes are:

- 0: feeding
- 1: normal_underwater
- 2: normal_upperwater
- 3: scared_underwater
- 4: scared_upperwater

Check `data/fish/fish_action_training_dataset/label_map.txt` for the exact mapping.

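Before training, it is worth checking how many clips each class has, since a heavily skewed split can distort accuracy. The sketch below hard-codes the class list above and assumes each CSV row ends with a space-separated integer label. Confirm both against `label_map.txt` and your CSV files. `count_labels` is a hypothetical helper.

```python
from collections import Counter

# Class indices as listed in this guide; verify against label_map.txt.
CLASS_NAMES = {
    0: "feeding",
    1: "normal_underwater",
    2: "normal_upperwater",
    3: "scared_underwater",
    4: "scared_upperwater",
}


def count_labels(csv_lines, separator=" "):
    """Count clips per class name from an iterable of CSV rows."""
    counts = Counter()
    for line in csv_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed row layout: "<video path><separator><integer label>".
        label = int(line.rsplit(separator, 1)[1])
        counts[CLASS_NAMES[label]] += 1
    return counts
```

Since an open file is an iterable of lines, you can pass it directly: `with open("data/fish/fish_action_training_dataset/train.csv") as f: print(count_labels(f))`.
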