🧠 RL Journal Club

Actor-Critic vs. Value-Based: Empirical Trade-offs

Price-based control for constrained contextual bandits.

Learning Safely on a Shoestring: Small-Budget Contextual Bandits with Knapsacks

Three Dogmas of Reinforcement Learning

ExpGen: Explore to Generalize in Zero-Shot RL