# PPO BipedalWalker-v3 [A100 SOTA / Score: 314.47]
This model is a state-of-the-art (SOTA) agent for BipedalWalker-v3, trained using Stable Baselines3 on an NVIDIA A100 GPU.
Achieving a mean reward of 314.47 with an extremely low standard deviation (+/- 1.41), this agent demonstrates a highly optimized running gait that comfortably exceeds the 300-point "solved" threshold.
## Performance Highlights
| Metric | Score | Description |
|---|---|---|
| Mean Reward | 314.47 | Far exceeds the "solved" threshold of 300. |
| Reward Std. | +/- 1.41 | Extremely consistent behavior across evaluation episodes. |
| Training Speed | ~4,000 FPS | Trained on NVIDIA A100 using 16 parallel environments. |
## Agent Behavior
(Please check the Files tab for the replay video if not displayed automatically)
Observation: The agent has converged on a low-center-of-mass "gliding" gait. This strategy minimizes head oscillation penalties and torque usage, allowing it to complete episodes rapidly (approx. 1000 steps) without falling. This is a common optimal policy for high-scoring agents in this specific environment.
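If the embedded video is unavailable, a replay can be regenerated locally. The snippet below is a sketch and not part of this repository: it assumes the model archive has been downloaded from the Hub, and it uses gymnasium's `RecordVideo` wrapper to write an MP4 of one deterministic episode.

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo
from stable_baselines3 import PPO

# Assumes the model .zip has been downloaded locally from the Hub (path is illustrative)
model = PPO.load("ppo-BipedalWalker-v3-A100-SOTA")

# Wrap an rgb_array environment so frames can be captured to disk
env = RecordVideo(
    gym.make("BipedalWalker-v3", render_mode="rgb_array"),
    video_folder="replays",           # output folder for the .mp4
    episode_trigger=lambda ep: True,  # record every episode
)

obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, _, terminated, truncated, _ = env.step(action)
    done = terminated or truncated

env.close()  # flushes the recorded video to disk
```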
## Training Configuration (A100 Optimized)
The hyperparameters were tuned to saturate the A100's compute capacity with a large batch size, while keeping updates conservative enough for precise motor control.
```python
# Key hyperparameters for the A100 run
n_envs = 16            # Parallel environments to feed the GPU
n_steps = 2048         # Rollout length per environment
batch_size = 16384     # 1024 * 16: large minibatch for stable updates
learning_rate = 3e-4
ent_coef = 0.001       # Low entropy bonus for precision control
clip_range = 0.18
gamma = 0.99
gae_lambda = 0.95
policy = "MlpPolicy"   # net_arch = [256, 256]
device = "cuda"        # NVIDIA A100-SXM4-80GB
```
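For context, here is a minimal sketch of how these settings could be plugged into a Stable Baselines3 training run. It is an assumption about the setup, not the exact script used for this model: the vectorization helper (`make_vec_env`), the timestep budget, and the save path are illustrative.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# 16 parallel copies of the environment to keep the GPU fed
vec_env = make_vec_env("BipedalWalker-v3", n_envs=16)

model = PPO(
    "MlpPolicy",
    vec_env,
    n_steps=2048,
    batch_size=16384,
    learning_rate=3e-4,
    ent_coef=0.001,
    clip_range=0.18,
    gamma=0.99,
    gae_lambda=0.95,
    policy_kwargs=dict(net_arch=[256, 256]),
    device="cuda",
    verbose=1,
)

model.learn(total_timesteps=5_000_000)  # illustrative budget, not the exact one used
model.save("ppo-BipedalWalker-v3-A100-SOTA")
```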
## Usage
```python
import gymnasium as gym
from stable_baselines3 import PPO

# Load the SOTA model
# (download the .zip from the Hub first, e.g. via huggingface_sb3.load_from_hub)
model = PPO.load("beachcities/ppo-BipedalWalker-v3-A100-SOTA")

# Create environment
env = gym.make("BipedalWalker-v3", render_mode="human")

# Enjoy the perfect run
obs, _ = env.reset()
done = False
while not done:
    # Use deterministic=True for best performance
    action, _ = model.predict(obs, deterministic=True)
    obs, _, terminated, truncated, _ = env.step(action)
    done = terminated or truncated

env.close()
```
## Evaluation Results
- mean_reward on BipedalWalker-v3 (self-reported): 314.47 +/- 1.41
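These figures should be reproducible with Stable Baselines3's `evaluate_policy` helper. The snippet below is a sketch under the same loading assumption as the Usage section; the 10-episode count is an assumption, and results may vary slightly with environment seeding.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Assumes the model .zip has been downloaded locally from the Hub
model = PPO.load("ppo-BipedalWalker-v3-A100-SOTA")
eval_env = gym.make("BipedalWalker-v3")

# Average deterministic return over several episodes
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```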