A2XP:
Towards Private Domain Generalization

AIRLab, Kyung Hee University, Republic of Korea
CVPR 2024
Figure 1. Comparison between A2XP (top) and non-private domain generalization methods (bottom). Non-private methods have to change the objective network's architecture or, at least, its parameters, while A2XP employs visual prompts that keep the objective network private.
Contributions
  1. Inspired by visual prompting (VP), introduce A2XP, a novel and simple domain generalization method that protects privacy.
  2. Mathematically analyze the generalization issue as an optimization of a linear combination problem.
  3. Further demonstrate the effectiveness and characteristics of A2XP and its component factors through extensive experiments.
  4. A2XP achieves SOTA over existing non-private domain generalization methods with significantly lower computational resource requirements.
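
One way to read contribution 2 is sketched below. The notation is assumed here for illustration and is not taken from the paper: $f$ is the frozen objective network, $p_1, \dots, p_N$ are the expert prompts learned on the $N$ source domains, $\lambda_i(x)$ are input-dependent mixing weights, and $\mathcal{L}$ is the task loss. Under these assumptions, generalization reduces to choosing the weights of a linear combination of fixed prompts:

```latex
\[
p(x) = \sum_{i=1}^{N} \lambda_i(x)\, p_i ,
\qquad
\min_{\lambda}\; \mathbb{E}_{(x,\,y)}
\Big[ \mathcal{L}\big( f\big(x + p(x)\big),\, y \big) \Big]
\]
```

Only the weights $\lambda_i$ are optimized in this reading; $f$ and the expert prompts $p_i$ stay fixed, which is what keeps the objective network untouched.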

Abstract

Deep Neural Networks (DNNs) have become pivotal in various fields, especially in computer vision, outperforming previous methodologies. A critical challenge in their deployment is the bias inherent in data across different domains, such as image style and environmental conditions, leading to domain gaps. This necessitates techniques for learning general representations from biased training data, known as domain generalization. This paper presents Attend to eXpert Prompts (A2XP), a novel approach for domain generalization that preserves the privacy and integrity of the network architecture. A2XP consists of two phases: Expert Adaptation and Domain Generalization. In the first phase, prompts for each source domain are optimized to guide the model towards the optimal direction. In the second phase, two embedder networks are trained to effectively amalgamate these expert prompts, aiming for an optimal output. Our extensive experiments demonstrate that A2XP achieves state-of-the-art results over existing non-private domain generalization methods. The experimental results validate that the proposed approach not only tackles the domain generalization challenge in DNNs but also offers a privacy-preserving, efficient solution to the broader field of computer vision.

Attend to Expert Prompts
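
The mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are passed in as plain lists instead of being produced by trained embedder networks, images and prompts are flattened to 1-D lists, and the softmax normalization and additive prompt are assumptions made for illustration. All function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax (an assumed choice of normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_expert_prompts(image_embedding, prompt_embeddings, expert_prompts):
    """Weight each expert prompt by its similarity to the target image.

    image_embedding   -- embedding of the target image (stand-in for one embedder)
    prompt_embeddings -- one embedding per expert prompt (stand-in for the other)
    expert_prompts    -- the frozen expert prompts, one per source domain
    """
    scores = [dot(image_embedding, k) for k in prompt_embeddings]
    weights = softmax(scores)
    size = len(expert_prompts[0])
    mixed = [
        sum(w * p[j] for w, p in zip(weights, expert_prompts))
        for j in range(size)
    ]
    return weights, mixed


def apply_prompt(image, prompt):
    """Add the mixed prompt to the input; the objective network itself is untouched."""
    return [x + p for x, p in zip(image, prompt)]


# Toy usage: two experts, a two-pixel "image".
weights, prompt = mix_expert_prompts(
    image_embedding=[1.0, 0.0],
    prompt_embeddings=[[1.0, 0.0], [1.0, 0.0]],
    expert_prompts=[[0.2, 0.0], [0.0, 0.4]],
)
prompted_image = apply_prompt([1.0, 2.0], prompt)
```

Because the prompt is applied to the input only, nothing in this sketch needs access to the objective network's parameters or architecture.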

Experimental Results

Target & Source Domain Evaluation

Leave-one-domain-out evaluation and source domain evaluation were conducted to assess the generalizability of A2XP; the results are detailed in Table 1. In Table 1a, the five baselines were augmented using DART, an ensemble learning-based method for domain generalization. A2XP outperformed all other methods in each target domain on both the PACS and VLCS datasets. Note that DART does not ensure the privacy of the objective network. As shown in Table 1b, A2XP performed as well on all source domains as on the target domain.
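
For readers unfamiliar with the protocol, leave-one-domain-out evaluation can be sketched as follows. This is an illustrative helper, not code from the paper: the function name is hypothetical, and the PACS domain names are taken from the Figure 5 caption.

```python
def leave_one_domain_out(domains):
    """Yield (source_domains, target_domain) pairs, holding out each domain once."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target


PACS_DOMAINS = ["Picture", "Art Painting", "Cartoon", "Sketch"]

# Each split trains on three source domains and evaluates on the held-out target.
splits = list(leave_one_domain_out(PACS_DOMAINS))
```

Source domain evaluation, by contrast, measures accuracy on the domains listed in `sources` for a given split.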

(a) Comparison with other methods in the target domain. DART was applied to the baselines for their best performance.
(b) Source domain evaluation on PACS (left) and VLCS (right) datasets.
Table 1. Target domain and source domain evaluations. Target domain evaluation was conducted to compare A2XP with other state-of-the-art methods. Source domain evaluation was conducted to see if it is still effective in the source domains.
Effectiveness of A2XP Module

An ablation study was performed to demonstrate the efficacy of the A2XP module by quantifying its impact on accuracy when combined with commonly used fine-tuning approaches, namely linear probing and full tuning. As shown in Table 2, tuning the hidden layers appeared to impact the tuning of the output layer negatively. With the integration of A2XP in linear probing, accuracy increased significantly across all tested domains. However, in the case of full tuning, the inclusion of A2XP was counterproductive. Our interpretation is that full tuning is inherently unstable; thus, the A2XP module, positioned before the hidden layers, was adversely affected.
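
The difference between the two tuning ranges can be made concrete with a small sketch. It only selects which parameters would be updated; the parameter names and the `head.` prefix are hypothetical and do not come from the paper or from any particular library.

```python
def trainable_parameters(param_names, mode, head_prefix="head."):
    """Return the parameter names that would be updated under each tuning range.

    mode "LP" (linear probing): only the output layer is trained.
    mode "FT" (full tuning):    hidden layers and output layer are trained.
    """
    if mode == "LP":
        return [n for n in param_names if n.startswith(head_prefix)]
    if mode == "FT":
        return list(param_names)
    raise ValueError(f"unknown tuning mode: {mode!r}")


# Hypothetical parameter names for a small objective network.
names = ["backbone.conv1.weight", "backbone.fc.weight", "head.weight", "head.bias"]
lp_params = trainable_parameters(names, "LP")
ft_params = trainable_parameters(names, "FT")
```

In this picture the A2XP module sits in front of the `backbone.*` layers, so full tuning changes the layers that immediately consume the prompted input, while linear probing leaves them fixed.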

Table 2. Comparison of tuning range on the objective network with and without A2XP. FT and LP refer to Full Tuning and Linear Probing, respectively.
Harmonized Domain Gathering

The manifold space of the features extracted from the last hidden layer was visualized, as shown in Figure 5, to observe how the classes and domains are represented in a 2-dimensional space. Figures 5a-5d show that generalized features are mapped similarly regardless of the target domain.

Figure 5. t-SNE visualization of correctly classified samples in manifold space. (a)-(d) illustrate the representation achieved through generalization, with Picture, Art Painting, Cartoon, and Sketch as the target domains. (e) depicts the representation of expert adaptation prior to the generalization process.
Activation Gain & Loss

The activation maps were visualized to help understand the effects of A2XP on the neural network's focus (see Figure 6). Linear probing without A2XP yields reasonably effective activation maps, and the incorporation of A2XP further refines and improves them. This means the experts are mixed in different ratios depending on the target image.

Figure 6. Activation visualization of A2XP using Grad-CAM. (a) shows the input image, (c) and (d) show the relative gain and loss of activation using A2XP prompts in (b), respectively.
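
One plausible way to compute gain and loss maps like those in panels (c) and (d) is sketched below. The exact definition used for the figure is not given here, so this is an assumption: gain is taken as the positive part of the change in activation when the prompt is added, and loss as the negative part. The function name and the toy values are hypothetical.

```python
def activation_gain_loss(without_prompt, with_prompt):
    """Split the per-pixel change in activation into gain and loss maps.

    Both inputs are 2-D activation maps (lists of rows) of the same shape,
    e.g. Grad-CAM outputs computed without and with the A2XP prompt.
    """
    gain = [
        [max(w - b, 0.0) for b, w in zip(row_b, row_w)]
        for row_b, row_w in zip(without_prompt, with_prompt)
    ]
    loss = [
        [max(b - w, 0.0) for b, w in zip(row_b, row_w)]
        for row_b, row_w in zip(without_prompt, with_prompt)
    ]
    return gain, loss


# Toy 2x2 activation maps.
base = [[0.2, 0.8], [0.5, 0.5]]
prompted = [[0.6, 0.3], [0.5, 0.9]]
gain, loss = activation_gain_loss(base, prompted)
```

With this definition a pixel can appear in at most one of the two maps, and pixels whose activation is unchanged are zero in both.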

BibTeX

@InProceedings{Yu_2024_CVPR,
    author    = {Yu, Geunhyeok and Hwang, Hyoseok},
    title     = {A2XP: Towards Private Domain Generalization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {23544-23553}
}