Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning

Jeong Woon Lee and Hyoseok Hwang
{everyman, hyoseok}@khu.ac.uk

Abstract

Reinforcement learning (RL) has proven its potential in complex decision-making tasks. Yet, many RL systems rely on manually crafted state representations, requiring effort in feature engineering. Visual Reinforcement Learning (VRL) offers a way to address this challenge by enabling agents to learn directly from raw visual input. Nonetheless, VRL continues to face generalization issues, as models often overfit to specific domain features. To tackle this issue, we propose Diffusion Guided Adaptive Augmentation (DGA2), an augmentation method that utilizes Stable Diffusion to enhance domain diversity. We introduce an Adaptive Domain Shift strategy that dynamically adjusts the degree of domain shift according to the agent's learning progress for effective augmentation with Stable Diffusion. Additionally, we employ a saliency mask to preserve the semantics of the data. Our experiments on the DMControl-GB, Adroit, and Procgen environments demonstrate that DGA2 improves generalization performance compared to existing data augmentation and generalization methods.

Method

Quantitative Evaluation

Main Results

The evaluation was conducted in DMControl-GB, Procgen, and Adroit to assess the effectiveness of DGA2. The results demonstrated that our method boosts the generalization capability of the baseline across various domains and environments.

Analysis on Adaptive Domain Shift

We evaluated the effectiveness of the Adaptive Domain Shift (ADS) strategy by comparing four methods: training within an identical domain (ID), domain switching at fixed steps (DS(S)), domain switching upon episode termination (DS(E)), and progressively changing domains at the end of each episode (CD). As shown in Table 4, the results suggested that adaptively adjusting the domain based on episode return was more effective than the other strategies.
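The return-driven adjustment above can be sketched as a simple schedule that maps the agent's recent episode returns to a domain-shift strength for the diffusion augmentation. This is a hypothetical illustration, not the paper's implementation: the function name, the moving-average window, the linear schedule, and the strength bounds are all assumptions.

```python
import numpy as np

def adaptive_shift_strength(episode_returns, max_return, window=10,
                            min_strength=0.1, max_strength=0.9):
    """Map recent learning progress to a domain-shift strength.

    Hypothetical sketch: as the agent's normalized return rises, the
    diffusion-driven domain shift is allowed to grow stronger, so early
    training sees mild shifts and a competent agent sees harder ones.
    """
    if len(episode_returns) == 0:
        return min_strength  # no progress signal yet: mildest shift
    # Moving-average return, normalized to [0, 1] as a progress proxy.
    recent = episode_returns[-window:]
    progress = float(np.clip(np.mean(recent) / max_return, 0.0, 1.0))
    # Linearly interpolate between the mildest and strongest shift.
    return min_strength + progress * (max_strength - min_strength)
```

The strength value would then be fed to the diffusion model (e.g. as the image-to-image noise strength), so that the degree of domain shift tracks the agent's competence rather than a fixed schedule.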

Table 4. Experimental results on different domain shift strategies.
Analysis on Sample Efficiency

We compared test performance across environment steps, as shown in Figure 3. Although DGA2 is activated only after 100K steps, it rapidly outperforms all baseline methods and achieves the highest performance by 200K steps. This indicates that our method requires significantly fewer environment interactions to reach a given performance level.

Figure 3. Test performance over environment steps.

Qualitative Evaluation

Visualization of Saliency Map

We assessed generalization capability using saliency maps in unseen domains. As shown in Figure 4, NoAug often attended to irrelevant backgrounds, failing to identify task-relevant areas in Ball in cup catch (Video Hard). Overlay improved attention but exhibited similar tendencies to NoAug, especially in Walker walk. In contrast, our method consistently focused on task-relevant areas across both Color Hard and Video Hard.

Figure 4. Visualization of saliency map of Walker walk and Ball in cup catch in Color Hard and Video Hard. (a) Original Image, Saliency Map of (b) NoAug, (c) Overlay, and (d) DGA2.
Visualization of Augmented Samples

We visualized the augmented images and selected SRM and Overlay as comparisons to observe pixel-level changes alongside our method. As depicted in Figure 5, our method, similar to Overlay, was capable of generating a diverse range of domains. Additionally, by integrating saliency, our approach highlighted and preserved task-relevant regions.
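The saliency-preserving behavior described above can be illustrated as a masked blend: task-relevant pixels keep their original appearance while the rest of the frame is replaced by the diffusion-augmented version. This is a minimal sketch under assumed conventions (float images in [0, 1], a precomputed per-pixel saliency map, and a simple hard threshold); DGA2's actual masking procedure may differ.

```python
import numpy as np

def saliency_masked_augment(obs, aug, saliency, threshold=0.5):
    """Blend an augmented frame with the original via a saliency mask.

    obs, aug: (H, W, 3) float arrays in [0, 1] (original and
        diffusion-augmented observations).
    saliency: (H, W) float array in [0, 1] scoring task relevance.

    Pixels above the threshold are treated as task-relevant and kept
    from the original; the rest come from the augmented image.
    """
    mask = (saliency > threshold).astype(obs.dtype)[..., None]  # (H, W, 1)
    # Broadcast the binary mask over the channel axis and blend.
    return mask * obs + (1.0 - mask) * aug
```

A soft blend (using the raw saliency values instead of a thresholded mask) is an equally plausible design; the hard mask is used here only to keep the example readable.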

Figure 5. Visualization of augmented images. (a) Original images, and augmented images from (b) SRM, (c) Overlay, and (d) DGA2.