Motivation

Robots act in complex, uncertain worlds. Traditional control relies on hand-designed models. Deep RL offers:

  • End-to-end policies from perception → action
  • Trial-and-error learning
  • Potential for general-purpose autonomy

But transferring policies from simulation to physical robots exposes the Sim-to-Real gap.

Sim-to-Real Gap

Definition: For a simulator-trained policy $\pi^s$ evaluated by a performance metric $\psi$ in simulation ($\psi_s$) and in reality ($\psi_r$):

$$G(\pi^s) := \psi_s(\pi^s) - \psi_r(\pi^s)$$

This gap captures how much performance drops when moving from simulation (s) to reality (r).
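
A minimal sketch of how this gap can be estimated in practice, assuming the policy can be rolled out both in the simulator and (carefully) on the real robot; `policy`, the two environments, and their `reset`/`step` interface are placeholders, and mean episodic return stands in for $\psi$:

```python
import numpy as np

def estimate_gap(policy, sim_env, real_env, episodes=20):
    """Monte-Carlo estimate of G(pi^s) = psi_s(pi^s) - psi_r(pi^s),
    using mean episodic return as the evaluation metric psi."""
    def mean_return(env):
        returns = []
        for _ in range(episodes):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done = env.step(policy(obs))  # assumed interface
                total += reward
            returns.append(total)
        return float(np.mean(returns))

    return mean_return(sim_env) - mean_return(real_env)
```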

Causes:

  • Observation gap: noise, partial observations, sensor resolution.
  • Action gap: actuation latency, discretized vs. continuous control.
  • Transition gap: differences in dynamics ($P_s \neq P_r$).
  • Reward gap: reward functions may not reflect reality.

➡️ Example: A quadruped robot trained in sim to walk forward may overfit to the simulator’s friction values. In reality, even a small mismatch in surface friction can cause slipping and falls.
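
A toy numerical illustration of that friction mismatch (all numbers assumed): a foot sticks only while the commanded tangential force stays inside the friction cone $\|f_t\| \le \mu N$, so the same controller that is safe in sim can slip in reality:

```python
normal_force = 60.0      # N: per-foot load of a small quadruped (assumed)
tangential_force = 28.0  # N: push-off force the policy commands (assumed)

for name, mu in [("sim", 0.9), ("real", 0.4)]:   # friction coefficients (assumed)
    limit = mu * normal_force
    status = "slips" if tangential_force > limit else "sticks"
    print(f"{name}: friction-cone limit {limit:.1f} N -> foot {status}")
```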

Simulation vs Reality

Simulation advantages:

  • Safe, low-cost, scalable data collection
  • Unlimited training samples
  • Parallel and fast experiments

Real-world challenges:

  • Safety during trials
  • High experimental cost
  • Slow/limited data collection
  • Unexpected or dangerous behaviors

Techniques Overview

  • Observation: Domain Randomization (DR), Domain Adaptation (DA), Sensor Fusion, Foundation Models
  • Action: Action scaling, delay modeling, uncertainty injection, FM-based planners (see the sketch after this list)
  • Transition: DR, DA, grounding methods, distributionally robust RL, LLM-augmented approaches
  • Reward: Reward shaping, LLM-based design
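
As an example of the action-side techniques referenced above, here is a minimal sketch of a training-time wrapper that scales actions, models a fixed actuation delay, and injects actuator noise; the base environment, its `step` method, and `action_dim` are assumed placeholders:

```python
import collections
import numpy as np

class ActionGapWrapper:
    """Scale commands, delay them by a fixed number of control steps,
    and add actuator noise before passing them to the simulator."""
    def __init__(self, env, scale=0.8, delay_steps=2, noise_std=0.05, rng=None):
        self.env, self.scale, self.noise_std = env, scale, noise_std
        self.rng = rng or np.random.default_rng()
        self.queue = collections.deque([np.zeros(env.action_dim)] * delay_steps)

    def step(self, action):
        self.queue.append(self.scale * np.asarray(action))
        delayed = self.queue.popleft()                      # latency model
        noisy = delayed + self.rng.normal(0.0, self.noise_std, delayed.shape)
        return self.env.step(noisy)
```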

System Identification

  • Build precise models of real system dynamics
  • Calibrate simulator with physical measurements
  • Improves realism but residual gaps persist

Challenge: even well-calibrated systems drift due to wear, temperature, or sensor misalignment.

➡️ Formula: Suppose the simulator uses mass $m_s$ while the real system has mass $m_r$. Even a small mismatch $(m_s - m_r)$ propagates into the torque computation:

$$\tau = I\ddot{\theta} + m g l \sin(\theta)$$

An incorrect $m$ leads to wrong torques, destabilizing control.
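
A minimal system-identification sketch along these lines, with synthetic data standing in for real torque logs (all constants assumed): because the pendulum model is linear in the unknown mass, $m$ can be recovered by least squares and written back into the simulator:

```python
import numpy as np

g, l, I = 9.81, 0.5, 0.05            # known geometry and inertia (assumed)
m_real = 1.15                        # "true" mass we pretend not to know

rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, 200)        # logged joint angles
theta_ddot = rng.uniform(-5.0, 5.0, 200)       # logged joint accelerations
tau = I * theta_ddot + m_real * g * l * np.sin(theta) \
    + rng.normal(0.0, 0.01, 200)               # measured torques + sensor noise

# tau - I*theta_ddot = m * (g*l*sin(theta))  ->  linear least squares in m
A = (g * l * np.sin(theta)).reshape(-1, 1)
b = tau - I * theta_ddot
m_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(f"calibrated mass: {m_hat[0]:.3f} kg (true: {m_real} kg)")
```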

Domain Randomization

  • Randomize visuals (textures, lighting) and physics (masses, friction, damping)
  • Train policies robust across many worlds

Case study: OpenAI’s dexterous hand manipulation → randomizing object mass, surface friction, and textures enabled real-world cube rotation.
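
A minimal sketch of episode-level randomization in that spirit; `base_env` and its `set_physics`/`set_visuals` setters are placeholders for whatever simulator API is in use, and the sampling ranges are illustrative:

```python
import numpy as np

class DomainRandomizedEnv:
    """Resample physics and visual parameters at every episode reset."""
    def __init__(self, base_env, rng=None):
        self.env = base_env
        self.rng = rng or np.random.default_rng()

    def reset(self):
        # Physics randomization: mass, surface friction, joint damping
        self.env.set_physics(
            mass=self.rng.uniform(0.8, 1.2),
            friction=self.rng.uniform(0.4, 1.0),
            damping=self.rng.uniform(0.01, 0.1),
        )
        # Visual randomization: lighting intensity and texture choice
        self.env.set_visuals(
            light_intensity=self.rng.uniform(0.5, 1.5),
            texture_id=self.rng.integers(0, 100),
        )
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
```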

[Figure: Domain Randomization concept]

⚠️ Excessive randomization can destabilize RL training → use curriculum-based Automatic Domain Randomization (ADR).
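
A simplified, ADR-style curriculum sketch (thresholds and step sizes assumed): the randomization range for one parameter widens only while the policy keeps succeeding and backs off when training degrades:

```python
class AdaptiveRange:
    """Curriculum over one randomization interval, e.g. surface friction."""
    def __init__(self, low=0.7, high=0.9, step=0.05, bounds=(0.2, 1.2)):
        self.low, self.high, self.step, self.bounds = low, high, step, bounds

    def update(self, success_rate, expand_at=0.8, shrink_at=0.4):
        if success_rate > expand_at:        # policy copes -> harder worlds
            self.low = max(self.bounds[0], self.low - self.step)
            self.high = min(self.bounds[1], self.high + self.step)
        elif success_rate < shrink_at:      # training destabilizing -> back off
            self.low = min(self.low + self.step, self.high)
            self.high = max(self.high - self.step, self.low)

    def sample(self, rng):
        return rng.uniform(self.low, self.high)
```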

Domain Adaptation

  • Align features from sim (source) and real (target)
  • Approaches:
    • Discrepancy-based: align feature distributions (MMD, CORAL)
    • Adversarial-based: domain-invariant encoders
    • Reconstruction-based: shared latent representations

➡️ Example: Latent Unified State Representation (LUSR) disentangles domain-general and domain-specific features. Policies trained on domain-general embeddings generalize better.
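
A minimal discrepancy-based example: squared MMD with an RBF kernel between batches of sim and real encoder features (shapes and data here are random stand-ins); in practice such a term is minimized alongside the RL loss to pull the two feature distributions together:

```python
import numpy as np

def rbf_mmd2(x_sim, x_real, sigma=1.0):
    """Squared maximum mean discrepancy between two feature batches."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return (kernel(x_sim, x_sim).mean()
            + kernel(x_real, x_real).mean()
            - 2.0 * kernel(x_sim, x_real).mean())

rng = np.random.default_rng(0)
sim_feats = rng.normal(0.0, 1.0, (64, 16))   # stand-in encoder features (sim)
real_feats = rng.normal(0.5, 1.0, (64, 16))  # stand-in encoder features (real)
print(rbf_mmd2(sim_feats, real_feats))
```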

[Figure: Domain Adaptation example]

Foundation Models in Sim-to-Real

  • Observation: Vision-Language Models extract semantic scene graphs, robust to sim/real mismatch
  • Action: LLMs chain low-level skills (grasp, push, open) into long-horizon plans
  • Transition: FM-based predictors reduce dynamics mismatch
  • Reward: Text-to-reward shaping from LLM prompts

➡️ Formula (reward shaping with LLM):

$$r'(s,a) = r(s,a) + \lambda f_{LLM}(s,a)$$

where $f_{LLM}$ is an auxiliary reward term derived from a natural-language task description.
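
A minimal sketch of that shaping term; `llm_auxiliary_reward` is a hypothetical stand-in for whatever dense bonus the language model proposes (e.g., as generated reward code), and the state key used here is assumed:

```python
def llm_auxiliary_reward(state, action):
    # Hypothetical LLM-proposed dense term: "keep the gripper near the cube"
    return -abs(state["gripper_to_cube_dist"])

def shaped_reward(state, action, base_reward, lam=0.1):
    # r'(s, a) = r(s, a) + lambda * f_LLM(s, a)
    return base_reward + lam * llm_auxiliary_reward(state, action)
```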

[Figure: Foundation Model pipeline]

Future Directions

  • Hybrid transfer: DR + DA, adversarial curricula
  • Continual and on-robot adaptation
  • Distributionally robust RL and uncertainty-aware control
  • FM-centric perception, planning, reward pipelines
  • New evaluation metrics and safer testbeds
  • Formal explanations and safety guarantees

Open challenge: Bridge formal guarantees with scalable practice for robust, safe, FM-augmented RL on real robots.
