Training Custom RL Models for Mortal Kombat II

This guide provides an overview of training reinforcement learning models for the Mortal Kombat II environment. For more detailed information on specific topics, see the dedicated guides referenced throughout (for example, Curriculum Learning).

Training a Model from Scratch

The train.py script provides a framework for training RL models, with support for optional curriculum learning and configurable learning rate schedules.

Basic Usage

python train.py

Training Process Overview

  1. Environment Setup: The script creates vectorized environments using SubprocVecEnv for parallel training (see the sketch after this list)
  2. Model Initialization: Models are configured with hyperparameters optimized for fighting games
  3. Training Loop: The model learns for a defined number of timesteps with periodic evaluation
  4. Model Saving: The final model and best models during training are saved for later use
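
A minimal sketch of the environment setup step, assuming Stable-Baselines3's vectorized-environment utilities and a stable-retro integration for the game (the game id MortalKombatII-Genesis and the factory make_mk2_env are illustrative assumptions, not the project's actual code):

import retro  # stable-retro
from stable_baselines3.common.vec_env import SubprocVecEnv, VecFrameStack

def make_mk2_env():
    # SubprocVecEnv expects a list of thunks; each one builds its
    # environment inside its own worker process.
    def _init():
        return retro.make(game="MortalKombatII-Genesis")  # game id is an assumption
    return _init

# Run 8 environments in parallel subprocesses, then stack 4 consecutive
# frames so the network can perceive motion.
env = SubprocVecEnv([make_mk2_env() for _ in range(8)])
stacked_env = VecFrameStack(env, n_stack=4)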

Customizing Training

To customize your training, edit train.py and modify:

Model Type

Choose from:

  • DQN: Standard Deep Q-Network
  • DoubleDQN: Double DQN for more stable training
  • DuelingDoubleDQN: Dueling architecture with Double DQN updates

# Example: Change model type
model = DuelingDoubleDQN(  # or DoubleDQN or DQN
    env=stacked_env,
    verbose=1,
    device="cuda",
    # other parameters...
)
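
As background: Double DQN decouples action selection from action evaluation to reduce Q-value overestimation, and the dueling architecture estimates state value and action advantages separately, which tends to help when many actions have similar value in a given state, a common situation in fighting games.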

Learning Rate Schedules

Choose from multiple learning rate schedules in mk_ai.utils.schedulers.Schedules:

# Linear decay
lr_schedule = Schedules.linear_decay(3.16e-4, 1e-5)

# Exponential decay
exp_decay_lr = Schedules.exponential_decay(3.16e-4, 0.295)

# Cyclical learning rates
cyclical_lr = Schedules.cyclical_lr(1e-4, 3e-4, 0.5)
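
If none of the built-in schedules fit, Stable-Baselines3-style APIs accept any callable mapping progress_remaining (1.0 at the start of training, falling to 0.0 at the end) to a learning rate; whether mk_ai's Schedules follows this convention is an assumption. A minimal custom linear decay:

def custom_linear_decay(initial_lr: float, final_lr: float):
    def schedule(progress_remaining: float) -> float:
        # progress_remaining falls from 1.0 to 0.0 over training
        return final_lr + (initial_lr - final_lr) * progress_remaining
    return schedule

lr_schedule = custom_linear_decay(3.16e-4, 1e-5)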

Hyperparameters

Tune key hyperparameters for your specific training needs:

model = DuelingDoubleDQN(
    env=stacked_env,
    buffer_size=200000,       # Replay buffer size
    batch_size=32,            # Batch size for updates
    gamma=0.95,               # Discount factor
    learning_rate=lr_schedule,
    exploration_fraction=0.3, # Fraction of training over which epsilon is annealed
    exploration_initial_eps=0.9,
    exploration_final_eps=0.07,
    tensorboard_log="./logs/my_custom_model/"
)
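
With these settings and Stable-Baselines3-style semantics, exploration_fraction=0.3 means epsilon anneals linearly from 0.9 to 0.07 over the first 30% of training, e.g. the first 300,000 of 1,000,000 timesteps, and then stays at 0.07. Training and saving then follow the usual pattern (timestep count and path below are illustrative):

# Train, logging to the tensorboard_log directory configured above,
# then save the final weights.
model.learn(total_timesteps=1_000_000, tb_log_name="my_custom_model")
model.save("models/my_custom_model")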

Monitoring Training Progress

View training progress with TensorBoard; point --logdir at your log directory (./experiments_finals here, or ./logs/my_custom_model/ if you used the tensorboard_log setting from the example above):

tensorboard --logdir=./experiments_finals

This will show:

  • Learning curves (reward, loss)
  • Exploration rate decay
  • Evaluation performance

Best Practices

Training

  • Start with a simpler environment and gradually increase difficulty
  • Use curriculum learning for complex environments (see Curriculum Learning)
  • Monitor training statistics via TensorBoard logs
  • Save regular checkpoints during long training sessions (see the callback sketch after this list)
  • Consider using parallel environments (SubprocVecEnv) to speed up training
  • Experiment with different model architectures and learning rate schedules
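
A minimal sketch of checkpointing and periodic evaluation using Stable-Baselines3's built-in callbacks (frequencies and paths are illustrative; make_mk2_env is the hypothetical factory from the earlier sketch):

from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback
from stable_baselines3.common.vec_env import SubprocVecEnv, VecFrameStack

# Note: save_freq and eval_freq count vec-env steps, so multiply by the
# number of parallel environments to get the interval in total timesteps.
checkpoint_cb = CheckpointCallback(save_freq=50_000, save_path="./checkpoints/",
                                   name_prefix="mk2_dqn")

# Evaluate on a separate environment built the same way as the training env,
# keeping the best-scoring model.
eval_env = VecFrameStack(SubprocVecEnv([make_mk2_env()]), n_stack=4)
eval_cb = EvalCallback(eval_env, best_model_save_path="./best_model/",
                       log_path="./eval_logs/", eval_freq=25_000)

model.learn(total_timesteps=1_000_000, callback=[checkpoint_cb, eval_cb])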

Troubleshooting

Common Issues

  • GPU Out of Memory: Reduce batch size or number of parallel environments
  • Slow Learning: Adjust learning rate schedule or exploration parameters
  • Overfitting: Increase environment variety or add regularization

Debugging Tips

  • Add verbose logging to track specific behaviors
  • Use render_mode="human" for visual debugging of agent behaviors
  • Implement custom callbacks for detailed monitoring (a minimal sketch follows)
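
A minimal custom callback, assuming the Stable-Baselines3 BaseCallback interface (the metric name is arbitrary, and the "episode" entry in info requires a Monitor-wrapped environment):

from stable_baselines3.common.callbacks import BaseCallback

class EpisodeRewardLogger(BaseCallback):
    # Logs each completed episode's reward to TensorBoard.
    def _on_step(self) -> bool:
        for info in self.locals.get("infos", []):
            ep = info.get("episode")  # present when a Monitor-wrapped episode ends
            if ep is not None:
                self.logger.record("custom/episode_reward", ep["r"])
        return True  # returning False would stop training early

model.learn(total_timesteps=1_000_000, callback=EpisodeRewardLogger())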