Training Custom RL Models for Mortal Kombat II
This guide provides an overview of training reinforcement learning models for the Mortal Kombat II environment. For more detailed information on specific topics, see the dedicated guides:
- Curriculum Learning: Progressive training through increasing difficulty levels
- Model Evaluation: Testing and analyzing model performance
- Fine-tuning Models: Improving pre-trained models for specific scenarios
Training a Model from Scratch
The `train.py` script provides a comprehensive framework for training RL models with various advanced features, including optional curriculum learning and learning rate schedules.
Basic Usage
```bash
python train.py
```
Training Process Overview
- Environment Setup: The script creates vectorized environments using `SubprocVecEnv` for parallel training (see the sketch after this list)
- Model Initialization: Models are configured with hyperparameters optimized for fighting games
- Training Loop: The model learns for a defined number of timesteps with periodic evaluation
- Model Saving: The final model and the best models found during training are saved for later use
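The exact setup lives in `train.py`, but the vectorization step typically looks something like the sketch below. This is a minimal sketch assuming a Stable-Baselines3-style vectorized environment API; `make_mk2_env` is a hypothetical placeholder for whatever environment factory the project actually uses.

```python
from stable_baselines3.common.vec_env import SubprocVecEnv, VecFrameStack

def make_env(rank: int):
    """Return a thunk that builds one environment instance."""
    def _init():
        # make_mk2_env is a hypothetical stand-in for the project's
        # Mortal Kombat II environment constructor.
        return make_mk2_env(seed=rank)
    return _init

# Run several environments in parallel processes, then stack frames so the
# model observes a short motion history rather than a single frame.
vec_env = SubprocVecEnv([make_env(i) for i in range(8)])
stacked_env = VecFrameStack(vec_env, n_stack=4)
```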
Customizing Training
To customize your training, edit `train.py` and modify:
Model Type
Choose from:
- `DQN`: Standard Deep Q-Network
- `DoubleDQN`: Double DQN for more stable training
- `DuelingDoubleDQN`: Dueling architecture with Double DQN updates
```python
# Example: Change model type
model = DuelingDoubleDQN(  # or DoubleDQN or DQN
    env=stacked_env,
    verbose=1,
    device="cuda",
    # other parameters...
)
```
Learning Rate Schedules
Choose from multiple learning rate schedules in `mk_ai.utils.schedulers.Schedules`:
```python
# Linear decay
lr_schedule = Schedules.linear_decay(3.16e-4, 1e-5)

# Exponential decay
exp_decay_lr = Schedules.exponential_decay(3.16e-4, 0.295)

# Cyclical learning rates
cyclical_lr = Schedules.cyclical_lr(1e-4, 3e-4, 0.5)
```
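The exact behavior of these schedules is defined in `mk_ai.utils.schedulers`. As an assumption about how they plug into the model, the sketch below follows the common Stable-Baselines3 convention, where a schedule is a callable that maps the remaining training progress (1.0 at the start, 0.0 at the end) to a learning rate:

```python
def linear_decay(initial_lr: float, final_lr: float):
    """Assumed SB3-style schedule: progress_remaining runs from 1.0 to 0.0."""
    def schedule(progress_remaining: float) -> float:
        return final_lr + (initial_lr - final_lr) * progress_remaining
    return schedule

# The resulting callable can be passed directly as the model's learning_rate.
lr_schedule = linear_decay(3.16e-4, 1e-5)
```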
Hyperparameters
Tune key hyperparameters for your specific training needs:
```python
model = DuelingDoubleDQN(
    env=stacked_env,
    buffer_size=200000,          # Replay buffer size
    batch_size=32,               # Batch size for updates
    gamma=0.95,                  # Discount factor
    learning_rate=lr_schedule,
    exploration_fraction=0.3,    # Fraction of training for exploration
    exploration_initial_eps=0.9,
    exploration_final_eps=0.07,
    tensorboard_log="./logs/my_custom_model/"
)
```
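With the model configured, training and saving usually follow the standard learn/save pattern. A minimal sketch, assuming the project's DQN variants expose a Stable-Baselines3-style `learn()`/`save()` interface; the timestep budget and output path are placeholders:

```python
# Train for a fixed budget of environment steps, logging to TensorBoard.
model.learn(total_timesteps=2_000_000, log_interval=100)

# Persist the final weights for later evaluation or fine-tuning.
model.save("./models/my_custom_model_final")
```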
Monitoring Training Progress
View training progress with TensorBoard:
```bash
tensorboard --logdir=./experiments_finals
```
This will show:
- Learning curves (reward, loss)
- Exploration rate decay
- Evaluation performance
Best Practices
Training
- Start with a simpler environment and gradually increase difficulty
- Use curriculum learning for complex environments (see Curriculum Learning)
- Monitor training statistics via TensorBoard logs
- Save regular checkpoints during long training sessions (see the callback sketch after this list)
- Consider using parallel environments (`SubprocVecEnv`) to speed up training
- Experiment with different model architectures and learning rate schedules
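One way to handle checkpointing and periodic evaluation is with Stable-Baselines3-style callbacks. This is a sketch under the assumption that the project's models accept SB3 callbacks in `learn()`; the frequencies and paths are placeholders, and `stacked_env` stands in for a (preferably separate) evaluation environment:

```python
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

# Save a snapshot of the model every 100k steps per environment.
checkpoint_cb = CheckpointCallback(
    save_freq=100_000,
    save_path="./checkpoints/",
    name_prefix="mk2_dqn",
)

# Evaluate periodically and keep the best-performing weights.
eval_cb = EvalCallback(
    stacked_env,  # ideally a dedicated evaluation environment
    best_model_save_path="./models/best/",
    eval_freq=50_000,
    n_eval_episodes=5,
)

model.learn(total_timesteps=2_000_000, callback=[checkpoint_cb, eval_cb])
```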
Troubleshooting
Common Issues
- GPU Out of Memory: Reduce batch size or number of parallel environments
- Slow Learning: Adjust learning rate schedule or exploration parameters
- Overfitting: Increase environment variety or add regularization
Debugging Tips
- Add verbose logging to track specific behaviors
- Use `render_mode="human"` for visual debugging of agent behaviors
- Implement custom callbacks for detailed monitoring (a minimal sketch follows below)
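As a starting point for the custom-callback tip, here is a minimal sketch assuming the Stable-Baselines3 `BaseCallback` interface and environments wrapped in `Monitor`; the metric name is illustrative:

```python
from stable_baselines3.common.callbacks import BaseCallback

class EpisodeRewardLogger(BaseCallback):
    """Log finished-episode rewards to TensorBoard for closer inspection."""

    def _on_step(self) -> bool:
        # Monitor-wrapped envs attach an "episode" dict to info when an
        # episode ends; record its total reward under a custom tag.
        for info in self.locals.get("infos", []):
            episode = info.get("episode")
            if episode is not None:
                self.logger.record("custom/episode_reward", episode["r"])
        return True  # returning False would stop training early
```

Pass an instance via `model.learn(..., callback=EpisodeRewardLogger())` to have the metric appear alongside the built-in TensorBoard logs.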