Initial commit: FishServer monorepo (FishAction, FishMeasure, fish_api)
Made-with: Cursor
252
FishAction/TRAINING_GUIDE.md
Executable file
@@ -0,0 +1,252 @@
# Training Guide: Fine-tuning SlowFast for Fish Action Classification

This guide will help you fine-tune a pretrained SlowFast model on your fish action dataset.

## Prerequisites

1. ✅ You have generated CSV files using `prepare_fish_dataset.py`
2. ✅ Your videos are organized in `~/data/fish/fish_action_videos/`
3. ✅ CSV files are in `./data/fish/fish_action_training_dataset/`

## Step 1: Download Pretrained Model

Download a pretrained SlowFast model from the [Model Zoo](https://github.com/facebookresearch/SlowFast/blob/main/MODEL_ZOO.md).

### Recommended Models:

**SlowFast 8x8 R50 (Kinetics 400)** - Good balance of accuracy and speed:
```bash
# Create checkpoints directory
mkdir -p checkpoints

# Download pretrained model (Caffe2 format)
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
  -O checkpoints/SLOWFAST_8x8_R50.pkl
```

Alternatively, download a PyTorch-format checkpoint if one is available:
```bash
# PyTorch format (if available)
wget <pytorch_model_url> -O checkpoints/SLOWFAST_8x8_R50.pyth
```

## Step 2: Configure Your Training

Edit the config file `configs/fish_action_SLOWFAST_8x8_R50.yaml`:

1. **Set pretrained model path:**
   ```yaml
   TRAIN:
     CHECKPOINT_FILE_PATH: "checkpoints/SLOWFAST_8x8_R50.pkl"
     CHECKPOINT_TYPE: caffe2  # or "pytorch" if using .pyth file
   ```

2. **Verify dataset paths:**
   ```yaml
   DATA:
     PATH_TO_DATA_DIR: "/home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset"
     PATH_PREFIX: "/home/ubuntu/data/fish/fish_action_videos"
   ```

3. **Adjust batch size** based on your GPU memory:
   ```yaml
   TRAIN:
     BATCH_SIZE: 8  # Reduce if you get OOM errors
   ```

4. **Number of classes** is already set to 5:
   ```yaml
   MODEL:
     NUM_CLASSES: 5
   ```

## Step 3: Start Training

### Basic Training Command

```bash
cd /home/ubuntu/projects/FishAction

python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  NUM_GPUS 1
```

### Training with Command-Line Overrides

You can override config values from the command line:

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.BATCH_SIZE 4 \
  NUM_GPUS 1 \
  SOLVER.MAX_EPOCH 30 \
  SOLVER.BASE_LR 0.005
```
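
The override example halves the batch size from the configured 8 to 4 and uses a learning rate of 0.005, half of the 0.01 starting point this guide suggests. That is consistent with the common linear scaling heuristic, where the learning rate changes in proportion to the batch size. The guide does not state this rule, so treat it as an assumption; `scaled_lr` is a hypothetical helper, not part of PySlowFast. A minimal sketch:

```python
def scaled_lr(reference_lr: float, reference_batch: int, new_batch: int) -> float:
    """Scale a learning rate linearly with the batch size (a heuristic, not a rule)."""
    if reference_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return reference_lr * new_batch / reference_batch


# Reference point taken from this guide: batch size 8 with a learning rate of 0.01.
for batch in (2, 4, 8, 16):
    lr = scaled_lr(0.01, 8, batch)
    print(f"TRAIN.BATCH_SIZE {batch} SOLVER.BASE_LR {lr:g}")
```

The printed pairs can be pasted as command-line overrides. Small batches may still need a lower rate than the heuristic suggests, so watch the training loss for the first few epochs.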

### Multi-GPU Training

If you have multiple GPUs, raise `NUM_GPUS` and scale `TRAIN.BATCH_SIZE` with it (the batch size is the total across all GPUs):

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  NUM_GPUS 2 \
  TRAIN.BATCH_SIZE 16
```
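
When launching runs from a script, it can help to assemble the command in one place and reject batch sizes that do not divide evenly across GPUs. The sketch below assumes PySlowFast splits `TRAIN.BATCH_SIZE` evenly over `NUM_GPUS`. `build_train_command` and its double-underscore convention for dotted keys are hypothetical, introduced here only for illustration:

```python
import shlex
import sys


def build_train_command(cfg_path: str, num_gpus: int, batch_size: int, **overrides) -> list:
    """Assemble the run_net.py argument list used in this guide.

    Extra overrides use double underscores for dots, e.g. SOLVER__MAX_EPOCH=30.
    """
    if num_gpus > 0 and batch_size % num_gpus != 0:
        raise ValueError(
            f"TRAIN.BATCH_SIZE ({batch_size}) should be divisible by NUM_GPUS ({num_gpus})"
        )
    cmd = [
        sys.executable, "slowfast/tools/run_net.py",
        "--cfg", cfg_path,
        "NUM_GPUS", str(num_gpus),
        "TRAIN.BATCH_SIZE", str(batch_size),
    ]
    for key, value in overrides.items():
        cmd += [key.replace("__", "."), str(value)]
    return cmd


cmd = build_train_command(
    "configs/fish_action_SLOWFAST_8x8_R50.yaml",
    num_gpus=2,
    batch_size=16,
    SOLVER__MAX_EPOCH=30,
)
print(shlex.join(cmd))  # inspect the command before running it
```

To launch training, pass the list to `subprocess.run(cmd, check=True)` from the `FishAction` project root.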

## Step 4: Monitor Training

Training logs are written to:
- Console output
- `checkpoints/fish_action/logs/` (if TensorBoard is enabled)

### Check Training Progress

```bash
# View latest checkpoint
ls -lh checkpoints/fish_action/

# View logs
tail -f checkpoints/fish_action/logs/*.log
```

## Step 5: Evaluate Model

After training, evaluate the saved checkpoint. With the stock PySlowFast loaders, `TEST.ENABLE True` runs on the `test` split, while the validation set is evaluated periodically during training. Replace `checkpoint_epoch_00050.pyth` with the checkpoint from your final epoch:

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
  NUM_GPUS 1
```
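
The checkpoint path in the command hard-codes epoch 50, which will not exist if training ran for a different number of epochs (for example with `SOLVER.MAX_EPOCH 30`). A minimal sketch for picking the most recent checkpoint, assuming the `checkpoint_epoch_XXXXX.pyth` naming shown in this guide; `latest_checkpoint` is a hypothetical helper:

```python
import re
from pathlib import Path
from typing import Optional

CHECKPOINT_PATTERN = re.compile(r"checkpoint_epoch_(\d+)\.pyth$")


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    best_epoch, best_path = -1, None
    for path in directory.glob("checkpoint_epoch_*.pyth"):
        match = CHECKPOINT_PATTERN.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


# Prints None when the directory is missing or holds no checkpoints.
print(latest_checkpoint("checkpoints/fish_action/checkpoints"))
```

The returned path can be passed as `TEST.CHECKPOINT_FILE_PATH`. The latest checkpoint is not necessarily the best one, so compare it against the validation accuracy in the logs.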

## Step 6: Test on Test Set

```bash
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
  DATA.PATH_TO_DATA_DIR /home/ubuntu/projects/FishAction/data/fish/fish_action_training_dataset \
  NUM_GPUS 1
```

## Troubleshooting

### Out of Memory (OOM) Errors

Reduce the batch size:
```yaml
TRAIN:
  BATCH_SIZE: 4  # or even 2
```

Or reduce the number of frames per clip:
```yaml
DATA:
  NUM_FRAMES: 16  # instead of 32
```

### Model Not Loading

1. **Check checkpoint path:**
   ```bash
   ls -lh checkpoints/SLOWFAST_8x8_R50.pkl
   ```

2. **Verify checkpoint type:**
   - `.pkl` files are usually Caffe2 format
   - `.pyth` files are PyTorch format
   - Set `CHECKPOINT_TYPE` accordingly

3. **If using Caffe2 checkpoint:**
   ```yaml
   TRAIN:
     CHECKPOINT_TYPE: caffe2
   ```
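
File extensions are only a convention, so it can be worth checking the file contents when a checkpoint refuses to load. The sketch below relies on one fact: `torch.save` in PyTorch 1.6 and later writes a zip archive by default. Everything else is a guess. A non-zip file may be a Caffe2 pickle, but it could also be a legacy PyTorch file, so the result is a hint rather than a verdict. `guess_checkpoint_type` is a hypothetical helper:

```python
import os
import zipfile


def guess_checkpoint_type(path: str) -> str:
    """Guess a CHECKPOINT_TYPE value from the file contents rather than its extension.

    Heuristic only: zip archives are reported as "pytorch"; anything else is
    reported as "caffe2", although it may also be a pre-1.6 PyTorch file.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return "pytorch" if zipfile.is_zipfile(path) else "caffe2"
```

For example, `guess_checkpoint_type("checkpoints/SLOWFAST_8x8_R50.pkl")` returns the value to try first for `TRAIN.CHECKPOINT_TYPE`. If loading still fails, try the other value.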

### Dataset Not Found

1. **Verify CSV files exist:**
   ```bash
   ls -lh data/fish/fish_action_training_dataset/*.csv
   ```

2. **Check video paths in CSV:**
   ```bash
   head data/fish/fish_action_training_dataset/train.csv
   ```

3. **Verify PATH_PREFIX:**
   - It should be the absolute path to the video directory
   - Videos should be accessible at: `PATH_PREFIX + path_in_csv`
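
The path check in the last item can be automated. The sketch below assumes each CSV line has the form `<relative_path> <label>` with a single space as separator; adjust `separator` if `prepare_fish_dataset.py` writes something else. `find_missing_videos` is a hypothetical helper:

```python
import os
from typing import List


def find_missing_videos(csv_path: str, path_prefix: str, separator: str = " ") -> List[str]:
    """Return the resolved video paths listed in csv_path that do not exist on disk.

    Assumes each non-empty line is "<relative_path><separator><label>".
    """
    missing = []
    with open(csv_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the last separator so paths containing spaces survive.
            relative_path = line.rsplit(separator, 1)[0]
            full_path = os.path.join(path_prefix, relative_path)
            if not os.path.isfile(full_path):
                missing.append(full_path)
    return missing
```

Calling `find_missing_videos("data/fish/fish_action_training_dataset/train.csv", "/home/ubuntu/data/fish/fish_action_videos")` should return an empty list when `PATH_PREFIX` is correct. Note that `os.path.join` ignores the prefix if a CSV path is already absolute.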

### Slow Training

1. **Increase NUM_WORKERS** (if you have spare CPU cores):
   ```yaml
   DATA_LOADER:
     NUM_WORKERS: 8
   ```

2. **Use mixed precision training** (if supported):
   ```yaml
   TRAIN:
     MIXED_PRECISION: True
   ```

## Tips for Better Results

1. **Learning Rate:** Start with 0.01 for fine-tuning; lower it (e.g. to 0.001) if the model overfits
2. **Epochs:** Monitor validation accuracy and stop once it plateaus
3. **Data Augmentation:** Already enabled in config (random crop, flip, etc.)
4. **Early Stopping:** Stop training manually if validation accuracy stops improving
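
Deciding when accuracy has "plateaued" is easier with an explicit rule. The sketch below is one possible rule, not something PySlowFast provides. It reports a plateau when the best accuracy of the last `patience` epochs does not exceed the best accuracy seen before them. `has_plateaued`, `patience`, and `min_delta` are hypothetical names, and the history values are made up for illustration:

```python
from typing import Sequence


def has_plateaued(
    val_accuracies: Sequence[float], patience: int = 5, min_delta: float = 0.0
) -> bool:
    """True if the last `patience` epochs did not beat the earlier best by more than min_delta."""
    if patience <= 0:
        raise ValueError("patience must be positive")
    if len(val_accuracies) <= patience:
        return False  # not enough history yet
    best_before = max(val_accuracies[:-patience])
    best_recent = max(val_accuracies[-patience:])
    return best_recent <= best_before + min_delta


# Made-up per-epoch validation accuracies, copied by hand from training logs.
history = [0.52, 0.61, 0.66, 0.70, 0.69, 0.70, 0.68]
print(has_plateaued(history, patience=3))
```

A larger `patience` tolerates noisy validation curves, which is likely with a small validation set.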

## Next Steps

After training:
1. Evaluate on the test set
2. Use the best checkpoint for inference
3. Create an inference script for new videos

## Example: Complete Training Session

```bash
# 1. Download pretrained model
mkdir -p checkpoints
wget https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/kinetics400/SLOWFAST_8x8_R50.pkl \
  -O checkpoints/SLOWFAST_8x8_R50.pkl

# 2. Edit config file (set checkpoint path and verify dataset paths)

# 3. Start training
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  NUM_GPUS 1

# 4. After training, evaluate
python slowfast/tools/run_net.py \
  --cfg configs/fish_action_SLOWFAST_8x8_R50.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH checkpoints/fish_action/checkpoints/checkpoint_epoch_00050.pyth \
  NUM_GPUS 1
```

## Your Classes

Based on your dataset preparation, your 5 classes are:
- 0: feeding
- 1: normal_underwater
- 2: normal_upperwater
- 3: scared_underwater
- 4: scared_upperwater

Check `data/fish/fish_action_training_dataset/label_map.txt` for the exact mapping.
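
Before training, it can be useful to see how many clips each class has, since a strong imbalance affects how accuracy should be read. The sketch below assumes CSV lines of the form `<relative_path> <label>` with integer labels. `count_labels` and `describe` are hypothetical helpers. The class names are copied from the list above and should be checked against `label_map.txt`:

```python
from collections import Counter
from typing import List

# Class ids as listed in this guide; label_map.txt remains the authoritative mapping.
CLASS_NAMES = {
    0: "feeding",
    1: "normal_underwater",
    2: "normal_upperwater",
    3: "scared_underwater",
    4: "scared_upperwater",
}


def count_labels(csv_path: str, separator: str = " ") -> Counter:
    """Count clips per class id, assuming "<path><separator><label>" lines."""
    counts = Counter()
    with open(csv_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                counts[int(line.rsplit(separator, 1)[1])] += 1
    return counts


def describe(counts: Counter) -> List[str]:
    """Format one "<id> <name>: <count>" line per known class, including empty ones."""
    return [f"{class_id} {name}: {counts.get(class_id, 0)}" for class_id, name in CLASS_NAMES.items()]
```

For example, `print("\n".join(describe(count_labels("data/fish/fish_action_training_dataset/train.csv"))))` lists the clip count per class, including classes with zero clips.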