Fine-tuning RL Models for Mortal Kombat II

This guide explains how to fine-tune pre-trained reinforcement learning models to improve their performance in specific scenarios or against challenging opponents.

What is Fine-tuning?

Fine-tuning is the process of continuing training on a pre-trained model, focusing on specific areas that need improvement. It's a form of transfer learning that leverages knowledge gained from prior training to efficiently adapt to new challenges.

When to Fine-tune

Consider fine-tuning your models when:

  • They perform well overall but struggle against specific opponents
  • You want to improve performance in advanced scenarios without starting from scratch
  • You need to adapt a general model to a specialized task
  • You've reached a performance plateau with regular training
  • You have limited computational resources for full retraining

Using the Fine-tuning Script

The Kane vs Abel framework includes a dedicated fine-tuning script (finetune.py) that handles loading pre-trained models and continuing their training on challenging scenarios.

Basic Usage

python finetune.py

By default, this will:

  1. Load the model specified in the script
  2. Create environments with challenging states
  3. Continue training for a defined number of timesteps
  4. Save the fine-tuned model

Fine-tuning Process

1. Select a Pre-trained Model

Choose a well-performing base model to fine-tune:

# Load the checkpoint and attach it to the training environment
pretrained_model_path = os.path.join(MODEL_DIR, "kane", "DuellingDDQN_curriculum_16M_VeryEasy_3_Tiers")
model = DuelingDoubleDQN.load(pretrained_model_path, env=env, device="cuda")
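To confirm that training will continue from where the base model left off, inspect the loaded step counter. This assumes the agent follows the Stable-Baselines3 convention of exposing num_timesteps:

# Should report the steps accumulated during the original run (~16M here)
print(f"Loaded model at {model.num_timesteps:,} timesteps")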

2. Define Challenging States

Select specific challenging states where the model needs improvement:

challenging_states = [
    "VeryEasy.LiuKang-04",
    "VeryEasy.LiuKang-05",
    "VeryEasy.LiuKang-06",
    "VeryEasy.LiuKang-07",
    "VeryEasy.LiuKang-08"
]

3. Adjust Learning Rate for Fine-tuning

Use a more conservative learning rate to avoid catastrophic forgetting:

# Lower starting rate and gentler decay for fine-tuning
fine_tune_lr = Schedules.linear_decay(1e-4, 1e-5)  # Original might be 3e-4 to 1e-5
model.learning_rate = fine_tune_lr
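If you need a schedule beyond what Schedules provides, a linear decay is straightforward to write yourself. The sketch below assumes the Stable-Baselines3 convention, where a schedule receives progress_remaining (1.0 at the start of training, 0.0 at the end); check the Schedules implementation in the framework before relying on this.

def linear_decay(start: float, end: float):
    """Return a schedule that decays linearly from start to end."""
    def schedule(progress_remaining: float) -> float:
        # progress_remaining runs 1.0 -> 0.0, so the rate runs start -> end
        return end + (start - end) * progress_remaining
    return schedule

fine_tune_lr = linear_decay(1e-4, 1e-5)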

4. Optionally Freeze Early Layers

To preserve learned features while allowing adaptation in higher layers:

# Uncomment to freeze feature extractor
# for param in model.policy.features_extractor.parameters():
#     param.requires_grad = False
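Whichever layers you freeze, verify the result before training. A quick sanity check, assuming a standard PyTorch policy module:

# List the parameter tensors that will still receive gradient updates
trainable = [name for name, p in model.policy.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors remain:")
for name in trainable:
    print(" ", name)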

5. Configure Evaluation During Fine-tuning

Set up evaluation callbacks to monitor progress:

eval_callback = CustomEvalCallback(
    eval_env=eval_env,
    best_model_save_path=os.path.join(LOG_DIR, "fine_tuned_best"),
    log_path=os.path.join(LOG_DIR, "fine_tune_eval"),
    eval_freq=62_500,  # How often to evaluate
    n_eval_episodes=10,
    deterministic=False,
    render=False,
    verbose=1
)

6. Continue Training

Train for additional timesteps, typically fewer than in the original training run:

model.learn(
    total_timesteps=2_000_000,  # Less than original training
    reset_num_timesteps=False,  # Continue from previous steps
    callback=eval_callback,
)
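To save intermediate checkpoints during the run (one of the best practices listed at the end of this guide), pass a list of callbacks. This sketch assumes the agent accepts standard Stable-Baselines3 callbacks:

from stable_baselines3.common.callbacks import CheckpointCallback

# Save a snapshot every 250k steps alongside the periodic evaluation
checkpoint_callback = CheckpointCallback(
    save_freq=250_000,
    save_path=os.path.join(LOG_DIR, "fine_tune_checkpoints"),
    name_prefix="fine_tune",
)

model.learn(
    total_timesteps=2_000_000,
    reset_num_timesteps=False,
    callback=[eval_callback, checkpoint_callback],
)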

Customizing Fine-tuning

Target Specific Weaknesses

Identify and focus on specific weaknesses through careful state selection:

# Example: Focus on specific opponents
challenging_states = [
    "Medium.LiuKangVsBaraka",  # If struggling against Baraka
    "Hard.LiuKangVsReptile"     # If struggling against Reptile
]

Learning Rate Schedule Strategies

Different schedules for different fine-tuning goals:

# For minor adjustments (conservative)
conservative_lr = Schedules.linear_decay(5e-5, 1e-6)

# For significant adaptation (more aggressive)
adaptive_lr = Schedules.linear_decay(1e-4, 1e-5)

# For focused, short fine-tuning
cyclical_lr = Schedules.cyclical_lr(5e-5, 1e-4, 0.5)
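The exact behavior of Schedules.cyclical_lr depends on the framework's implementation. As a rough illustration only, a triangular cyclical schedule under the same progress_remaining convention could look like this (the real signature and semantics may differ):

def cyclical_lr(base_lr: float, max_lr: float, cycle_fraction: float):
    """Triangular wave between base_lr and max_lr; each cycle spans
    cycle_fraction of the total run."""
    def schedule(progress_remaining: float) -> float:
        progress = 1.0 - progress_remaining            # 0.0 -> 1.0 over training
        cycle_pos = (progress / cycle_fraction) % 1.0  # position within cycle
        return base_lr + (max_lr - base_lr) * (1.0 - abs(2.0 * cycle_pos - 1.0))
    return schedule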

Selective Layer Freezing

For more control over what parts of the model adapt:

# Option 1: Freeze just the convolutional base
for name, param in model.policy.features_extractor.cnn.named_parameters():
    param.requires_grad = False

# Option 2: Freeze specific layers
for name, param in model.policy.named_parameters():
    if "features_extractor" in name:
        param.requires_grad = False
    if "q_net.0" in name:  # First layer after feature extraction
        param.requires_grad = False
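If you freeze layers for the early part of fine-tuning, unfreeze them before any final full-network adaptation phase:

# Re-enable gradient updates for every parameter
for param in model.policy.parameters():
    param.requires_grad = True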

Exploration Settings

Adjust exploration parameters for fine-tuning:

# Reduce exploration for fine-tuning
model.exploration_schedule = Schedules.linear_decay(0.1, 0.01)  # Lower than original

Advanced Fine-tuning Techniques

Regularization for Fine-tuning

Add regularization to prevent overfitting to the new states:

# Add L2 regularization (weight decay) to the existing optimizer by
# updating its parameter groups in place; unlike recreating the optimizer,
# this preserves internal state such as Adam's running moment estimates
for param_group in model.policy.optimizer.param_groups:
    param_group["weight_decay"] = 1e-4

Elastic Weight Consolidation

For complex models, implement Elastic Weight Consolidation (EWC) to preserve important weights:

# Pseudo-code for an EWC penalty; estimate_fisher_information is a
# placeholder for a routine that estimates per-parameter importance
# from experience gathered on the original task
original_params = {name: param.clone().detach() for name, param in model.policy.named_parameters()}
fisher_information = estimate_fisher_information(model, old_env)

def ewc_loss(model, original_params, fisher_information, lambda_ewc=5000):
    loss = 0
    for name, param in model.policy.named_parameters():
        # Weight each parameter's squared drift by its estimated importance
        loss += (fisher_information[name] * (param - original_params[name]).pow(2)).sum()
    return lambda_ewc * loss

# During fine-tuning, add the penalty to the usual TD loss:
# total_loss = td_loss + ewc_loss(model, original_params, fisher_information)

Rehearsal with Mixed Experiences

Fine-tune with a mix of old and new experiences to prevent forgetting:

# Create environments with mixed states
mixed_states = original_training_states + challenging_states
mixed_env = SubprocVecEnv([make_env(mixed_states) for _ in range(NUM_ENVS)])
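You can also bias the mix toward original experiences. Assuming make_env samples uniformly from the list it is given, repeating the original states weights them more heavily:

# Roughly 3:1 sampling in favor of original states
mixed_states = original_training_states * 3 + challenging_states
mixed_env = SubprocVecEnv([make_env(mixed_states) for _ in range(NUM_ENVS)])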

Progressive Fine-tuning

Gradually introduce challenging scenarios:

# Start with a mix favoring original states
phase1_states = original_states + [challenging_states[0]]
# Then introduce more challenging states
phase2_states = original_states + challenging_states[:3]
# Finally use all challenging states
phase3_states = challenging_states
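The phases can then be run back to back, swapping environments between them. This sketch assumes the agent exposes the Stable-Baselines3 set_env API:

for phase_states in (phase1_states, phase2_states, phase3_states):
    phase_env = SubprocVecEnv([make_env(phase_states) for _ in range(NUM_ENVS)])
    phase_env = VecFrameStack(phase_env, n_stack=4)
    model.set_env(phase_env)
    # reset_num_timesteps=False keeps the global step counter running
    model.learn(total_timesteps=500_000, reset_num_timesteps=False)
    phase_env.close()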

Evaluating Fine-tuning Success

After fine-tuning, evaluate comprehensively:

# Evaluate on original states
python test.py --model_path models/pre_finetuned.zip --model_type DUELINGDDQN --states "original_state1,original_state2" --individual_eval

python test.py --model_path models/post_finetuned.zip --model_type DUELINGDDQN --states "original_state1,original_state2" --individual_eval

# Evaluate on challenging states
python test.py --model_path models/pre_finetuned.zip --model_type DUELINGDDQN --states "challenging_state1,challenging_state2" --individual_eval

python test.py --model_path models/post_finetuned.zip --model_type DUELINGDDQN --states "challenging_state1,challenging_state2" --individual_eval

Look for:

  1. Improvement on challenging states
  2. Maintenance of performance on original states
  3. Overall win rate changes
  4. Changes in behavior patterns

Example: Complete Fine-tuning Workflow

import os
from mk_ai.agents import DuelingDoubleDQN
from mk_ai.utils import Schedules
from mk_ai.callbacks import CustomEvalCallback
from stable_baselines3.common.vec_env import SubprocVecEnv, DummyVecEnv, VecFrameStack

# 1. Identify where the model is struggling (through evaluation)
# python test.py --model_path models/base_model.zip --model_type DUELINGDDQN --states "all_states" --individual_eval

# 2. Select challenging states based on evaluation
challenging_states = [
    "VeryEasy.LiuKang-06",  # Low win rate identified here
    "VeryEasy.LiuKang-07",  # Low win rate identified here
    "VeryEasy.LiuKang-08"   # Low win rate identified here
]

# 3. Create environments for fine-tuning
env = SubprocVecEnv([make_env(challenging_states) for _ in range(8)])
env = VecFrameStack(env, n_stack=4)

# 4. Load pre-trained model
model = DuelingDoubleDQN.load("models/base_model.zip", env=env, device="cuda")

# 5. Configure for fine-tuning
model.learning_rate = Schedules.linear_decay(1e-4, 1e-5)

# 6. Setup evaluation
eval_env = DummyVecEnv([make_env(challenging_states)])
eval_env = VecFrameStack(eval_env, n_stack=4)
eval_callback = CustomEvalCallback(
    eval_env=eval_env,
    best_model_save_path="./logs/fine_tuned_best",
    log_path="./logs/fine_tune_eval",
    eval_freq=60000,
    n_eval_episodes=10
)

# 7. Fine-tune the model
model.learn(
    total_timesteps=2_000_000,
    reset_num_timesteps=False,
    callback=eval_callback
)

# 8. Save fine-tuned model
model.save("models/fine_tuned_model")

# 9. Evaluate the fine-tuned model
# python test.py --model_path models/fine_tuned_model.zip --model_type DUELINGDDQN --states "all_states" --individual_eval

Best Practices for Fine-tuning

  1. Always evaluate before and after to quantify improvements
  2. Start with a lower learning rate than the original training
  3. Be selective with states - target specific weaknesses
  4. Monitor for catastrophic forgetting on original scenarios
  5. Keep fine-tuning sessions shorter than original training
  6. Save intermediate checkpoints during fine-tuning
  7. Consider layer freezing for specialized adaptations
  8. Use TensorBoard to monitor the fine-tuning process (see the command below)
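For the last item, point TensorBoard at the log directory used during fine-tuning:

tensorboard --logdir ./logs

Comparing reward curves from before and after fine-tuning begins makes regressions on the original task easy to spot.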