Deep Neural Networks (DNNs) have become pivotal in various fields, especially computer vision, where they outperform previous methodologies. A critical challenge in their deployment is the bias inherent in data across domains, such as differences in image style and environmental conditions, which creates domain gaps. This necessitates techniques for learning generalizable representations from biased training data, known as domain generalization. This paper presents Attend to eXpert Prompts (A2XP), a novel approach to domain generalization that preserves the privacy and integrity of the network architecture. A2XP consists of two phases: Expert Adaptation and Domain Generalization. In the first phase, a prompt for each source domain is optimized to guide the model in that domain's optimal direction. In the second phase, two embedder networks are trained to effectively combine these expert prompts into an optimal output for unseen targets. Extensive experiments demonstrate that A2XP achieves state-of-the-art results over existing non-private domain generalization methods. The experimental results validate that the proposed approach not only tackles the domain generalization challenge in DNNs but also offers a privacy-preserving, efficient solution for the broader field of computer vision.
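The second phase's combination of expert prompts can be illustrated with a minimal attention-style sketch: the input image is embedded as a query, each expert prompt as a key, and the mixed prompt is a softmax-weighted sum of the experts. This is a simplified numpy illustration under assumed shapes and function names (`mix_expert_prompts` and its arguments are hypothetical), not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mix_expert_prompts(image_emb, prompt_embs, expert_prompts):
    """Attention-style combination of per-domain expert prompts.

    image_emb:      (d,)   query embedding of the input image
    prompt_embs:    (n, d) key embeddings of the n expert prompts
    expert_prompts: (n, p) the expert prompts themselves (values)
    Returns one (p,) prompt: a convex combination of the experts.
    """
    scores = prompt_embs @ image_emb   # similarity of input to each expert
    weights = softmax(scores)          # normalized mixing ratios, sum to 1
    return weights @ expert_prompts    # weighted sum of expert prompts

# toy usage with random vectors
rng = np.random.default_rng(0)
q = rng.normal(size=4)                 # query embedding of one image
keys = rng.normal(size=(3, 4))         # key embeddings of 3 expert prompts
prompts = rng.normal(size=(3, 8))      # the 3 expert prompts
mixed = mix_expert_prompts(q, keys, prompts)
```

Because the weights are input-dependent, a different image yields a different mixture of experts, which matches the intuition that the combination should adapt to the target sample.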
Leave-one-domain-out evaluation and source-domain evaluation were conducted to assess the generalizability of A2XP; the results are detailed in Table 1. In Table 1a, the five baselines were augmented with DART, an ensemble learning-based method for domain generalization. A2XP outperformed all other methods in every target domain on both the PACS and VLCS datasets. It is important to note that DART does not ensure the privacy of the objective network. As shown in Table 1b, evaluation on the source domains performed as well as evaluation on the target domain.
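The leave-one-domain-out protocol trains on all domains but one and evaluates on the held-out domain, rotating through every domain as the target. A minimal sketch of the split generation, using the PACS domain names for illustration (the helper function is hypothetical, not from the authors' code):

```python
# PACS consists of four domains; each takes a turn as the held-out target.
PACS_DOMAINS = ["art_painting", "cartoon", "photo", "sketch"]

def leave_one_domain_out(domains):
    """Yield (source_domains, target_domain) splits: train on all but one
    domain, then evaluate generalization on the held-out target domain."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target

splits = list(leave_one_domain_out(PACS_DOMAINS))
# first split trains on cartoon/photo/sketch and tests on art_painting
```

Reported per-target accuracies in such tables correspond to one split each, so the target domain is never seen during training.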
An ablation study was performed to demonstrate the efficacy of the A2XP module by quantifying its impact on accuracy when combined with commonly used fine-tuning approaches such as linear probing and full tuning. As shown in Table 2, tuning the hidden layers appeared to negatively impact the tuning of the output layer. With A2XP integrated into linear probing, accuracy increased significantly across all tested domains. In the case of full tuning, however, including A2XP was counterproductive. Our analysis is that full tuning is inherently unstable; thus, the A2XP module, positioned before the hidden layers, was adversely affected.
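The distinction between the two fine-tuning regimes can be made concrete by marking which parameter groups receive gradient updates: linear probing freezes the backbone, while full tuning updates everything. This is a schematic sketch with hypothetical group names, not the experimental code.

```python
# Parameter groups of the model, schematically. With A2XP, the prompt
# module sits before the hidden layers of the frozen backbone.
params = {"prompt_module": True, "hidden_layers": True, "output_head": True}

def linear_probe(p):
    # Freeze everything except the output head (and, when A2XP is
    # attached, its prompt module); hidden layers stay fixed.
    return {name: name in ("output_head", "prompt_module") for name in p}

def full_tune(p):
    # Update every parameter group, including the hidden layers.
    return {name: True for name in p}

probe_mask = linear_probe(params)
```

Under full tuning the hidden layers shift during training, so a prompt module placed in front of them chases a moving target, which is consistent with the instability observed in the ablation.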
The manifold space of the features extracted from the last hidden layer was visualized, as shown in Figure 5, to observe how the classes and domains are represented in a 2-dimensional space. Figures 5a-5d show that generalized features are mapped similarly regardless of the target domain.
The activation maps were visualized to help understand the effect of A2XP on the network's focus, as shown in Figure 6. Linear probing without A2XP yields reasonably effective activation maps, and incorporating A2XP further refines and improves them. This indicates that the experts are mixed in different ratios depending on the input image.
@InProceedings{Yu_2024_CVPR,
author = {Yu, Geunhyeok and Hwang, Hyoseok},
title = {A2XP: Towards Private Domain Generalization},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {23544-23553}
}