Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning

Jeong Woon Lee and Hyoseok Hwang
{everyman, hyoseok}@khu.ac.uk

Abstract

Reinforcement learning (RL) has proven its potential in complex decision-making tasks. Yet, many RL systems rely on manually crafted state representations, requiring effort in feature engineering. Visual Reinforcement Learning (VRL) offers a way to address this challenge by enabling agents to learn directly from raw visual input. Nonetheless, VRL continues to face generalization issues, as models often overfit to specific domain features. To tackle this issue, we propose Diffusion Guided Adaptive Augmentation (DGA2), an augmentation method that utilizes Stable Diffusion to enhance domain diversity. We introduce an Adaptive Domain Shift strategy that dynamically adjusts the degree of domain shift according to the agent's learning progress for effective augmentation with Stable Diffusion. Additionally, we employ a saliency mask to preserve the semantics of the data. Our experiments on the DMControl-GB, Adroit, and Procgen environments demonstrate that DGA2 improves generalization performance compared to existing data augmentation and generalization methods.

Method

Quantitative Evaluation

Main Results

The evaluation was conducted in DMControl-GB, Procgen, and Adroit to assess the effectiveness of DGA2. The results demonstrated that our method boosts the generalization capability of the baseline across various domains and environments.

Analysis on Adaptive Domain Shift

We evaluated the effectiveness of the Adaptive Domain Shift (ADS) strategy by comparing four methods: training within an identical domain (ID), domain switching at fixed steps (DS(S)), domain switching upon episode termination (DS(E)), and progressively changing domains at the end of each episode (CD). As shown in Table 4, the results suggested that adaptively adjusting the domain based on episode return was more effective than the other strategies.
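The return-driven adjustment above can be sketched as a simple schedule that maps the agent's recent episode returns to a domain-shift strength for the diffusion augmentation. This is a hypothetical illustration, not the paper's implementation: the function name, the moving-average window, the linear schedule, and the strength bounds are all assumptions.

```python
import numpy as np

def adaptive_shift_strength(episode_returns, max_return, window=10,
                            min_strength=0.1, max_strength=0.9):
    """Map recent learning progress to a domain-shift strength.

    Hypothetical sketch: as the agent's normalized return rises, the
    diffusion-driven domain shift is allowed to grow stronger, so early
    training sees mild shifts and a competent agent sees harder ones.
    """
    if len(episode_returns) == 0:
        return min_strength  # no progress signal yet: mildest shift
    # Moving-average return, normalized to [0, 1] as a progress proxy.
    recent = episode_returns[-window:]
    progress = float(np.clip(np.mean(recent) / max_return, 0.0, 1.0))
    # Linearly interpolate between the mildest and strongest shift.
    return min_strength + progress * (max_strength - min_strength)
```

The strength value would then be fed to the diffusion model (e.g. as the image-to-image noise strength), so that the degree of domain shift tracks the agent's competence rather than a fixed schedule.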

Table 4. Experimental results on different domain shift strategies.
Analysis on Sample Efficiency

We compared test performance across environment steps, as shown in Figure 3. Although DGA2 is activated only after 100K steps, it rapidly outperforms all baseline methods and achieves the highest performance by 200K steps. This indicates that our method requires significantly fewer environment interactions to reach a given performance level.

Figure 3. Test performance over environment steps.

Qualitative Evaluation

Visualization of Saliency Map

We assessed generalization capability using saliency maps in unseen domains. As shown in Figure 4, NoAug often attended to irrelevant backgrounds, failing to identify task-relevant areas in Ball in cup catch (Video Hard). Overlay improved attention but exhibited similar tendencies to NoAug, especially in Walker walk. In contrast, our method consistently focused on task-relevant areas across both Color Hard and Video Hard.

Figure 4. Visualization of saliency map of Walker walk and Ball in cup catch in Color Hard and Video Hard. (a) Original Image, Saliency Map of (b) NoAug, (c) Overlay, and (d) DGA2.
Visualization of Augmented Samples

We visualized the augmented images and selected SRM and Overlay as comparisons to observe pixel-level changes alongside our method. As depicted in Figure 5, our method, similar to Overlay, was capable of generating a diverse range of domains. Additionally, by integrating saliency, our approach highlighted and preserved task-relevant regions.
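The saliency-preserving behavior described above can be illustrated as a masked blend: task-relevant pixels keep their original appearance while the rest of the frame is replaced by the diffusion-augmented version. This is a minimal sketch under assumed conventions (float images in [0, 1], a precomputed per-pixel saliency map, and a simple hard threshold); DGA2's actual masking procedure may differ.

```python
import numpy as np

def saliency_masked_augment(obs, aug, saliency, threshold=0.5):
    """Blend an augmented frame with the original via a saliency mask.

    obs, aug: (H, W, 3) float arrays in [0, 1] (original and
        diffusion-augmented observations).
    saliency: (H, W) float array in [0, 1] scoring task relevance.

    Pixels above the threshold are treated as task-relevant and kept
    from the original; the rest come from the augmented image.
    """
    mask = (saliency > threshold).astype(obs.dtype)[..., None]  # (H, W, 1)
    # Broadcast the binary mask over the channel axis and blend.
    return mask * obs + (1.0 - mask) * aug
```

A soft blend (using the raw saliency values instead of a thresholded mask) is an equally plausible design; the hard mask is used here only to keep the example readable.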

Figure 5. Visualization of augmented images. (a) Original images, and augmented images from (b) SRM, (c) Overlay, and (d) DGA2.