https://github.com/lucasdino/rl-mujoco-benchmarking
<aside> 💡
What did I do? I built a reinforcement learning engine from scratch to be clean and hackable. The goal? Implement lots of RL algorithms from baseline to SoTA, ablate results, and get cool demo videos.
BTW, you can literally start training your own agents in minutes (see the sketch right after this aside).
</aside>
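
To give a flavor of how little code a basic training loop needs, here's a minimal DQN sketch in plain PyTorch + Gymnasium. This is an illustrative assumption, not the repo's actual API (check the README for the real entry points), and CartPole stands in for Atari so the example stays small and self-contained.

```python
# A minimal DQN sketch, assuming plain PyTorch + Gymnasium -- this is NOT the
# repo's actual API (check the README for the real entry points). CartPole
# stands in for Atari so the example stays small and self-contained.
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

def make_net():
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)  # simple uniform replay buffer
gamma, eps, batch_size = 0.99, 0.1, 64

obs, _ = env.reset()
for step in range(20_000):
    # Epsilon-greedy action selection.
    if random.random() < eps:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = q_net(torch.as_tensor(obs)).argmax().item()

    next_obs, reward, terminated, truncated, _ = env.step(action)
    buffer.append((obs, action, reward, next_obs, float(terminated)))
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()

    if len(buffer) >= batch_size:
        # Sample a minibatch and take one TD-learning step.
        batch = random.sample(buffer, batch_size)
        s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                          for x in zip(*batch))
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * (1.0 - d) * target_net(s2).max(1).values
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Periodically sync the target network.
    if step % 500 == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Rainbow adds several components on top of this baseline (double Q-learning, prioritized replay, dueling heads, noisy nets, distributional value estimates, n-step returns), but the skeleton above is the common core.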

Final policy trained using Rainbow DQN on Breakout for 10M steps. Original speed.

Training score for Rainbow DQN on Breakout over 10M steps.
The first project I ever did when I started studying ML involved building my own racing game and training an agent to play it using DDQN. See here.
I’ve come a long way since then! But while I have a decent understanding of the underlying theory, I hadn’t actually implemented many of the SoTA RL algorithms from scratch.
I get what you may be asking. Atari environments? How useful can these actually be?
I actually learned a lot from doing this. Spending most of my available hours over the last 9 months applying RL to LLMs (paper) left me wanting to rebuild some intuition for RL in simpler environments.
This project definitely helped build that intuition. And it was fun.

Rainbow DQN on Breakout. Original speed. Frames are grayed out once all lives are depleted.
This is an ongoing side project; I’ll update the repo (and this page) with major additions.
Current Features