Prompt-based Gradient Projection
This is a unified work on resisting forgetting across various parameter-efficient continual learning paradigms; the corresponding paper will be released soon.
We created this website to publicize our work (2024/3/3).
Our previous work, 'Prompt Gradient Projection for Continual Learning' (ICLR 2024), has been accepted as an ICLR 2024 Spotlight!💐💐
Last updated: 2024/3/3.
Parameter-efficient tuning (PET) has demonstrated impressive performance in continual learning by adding scalable extra parameters that are independent of the encoder. With only tiny trainable fine-tuning parameters on top of a frozen pre-trained encoder, overall performance improves substantially, yet the forgetting mechanism specific to this setting remains under-explored.
However, recent progress has mainly focused on designing efficient fine-tuning paradigms while ignoring how forgetting arises in PET continual learning, let alone establishing anti-forgetting criteria. Moreover, the unresolved trade-off between learning new information and protecting old knowledge further exacerbates these challenges.
This paper presents Efficient Parameter Gradient Projection (EPGP), which combines various PET paradigms with orthogonal gradient projection. We theoretically deduce that the orthogonality condition on gradients effectively resists forgetting in continual learning, and that it applies to all PET continual learning methods.
Uniquely, EPGP is the first unified method to provide an anti-forgetting mechanism with mathematical justification for different tuning paradigms. Additionally, by applying Singular Value Decomposition (SVD) to obtain the gradient projection matrix, EPGP is proved to be an optimal solution for balancing the trade-off between plasticity and stability in PET continual learning methods.
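As a rough illustration (not the authors' released code), the SVD-based projection step can be sketched as follows: collect input features from old tasks, take their dominant singular directions as the old-task subspace, and remove from each new-task gradient its component inside that subspace. The `energy` threshold below is a hypothetical knob for choosing the subspace rank.

```python
import numpy as np

def gradient_projection_matrix(old_features, energy=0.95):
    """Projector onto the subspace spanned by the dominant singular
    directions of old-task features.

    old_features: (d, n) matrix whose columns are feature vectors
    collected from previous tasks. `energy` is the fraction of the
    squared spectrum to retain (an illustrative choice).
    """
    U, S, _ = np.linalg.svd(old_features, full_matrices=False)
    # Keep the smallest k whose singular values capture `energy`
    # of the total squared spectrum.
    ratios = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(ratios, energy)) + 1
    U_k = U[:, :k]                 # orthonormal basis of old-task subspace
    return U_k @ U_k.T             # projection matrix onto that subspace

def project_gradient(grad, P_old):
    """Remove the component of `grad` lying in the old-task subspace,
    leaving an update orthogonal to old-task features (anti-forgetting)."""
    return grad - P_old @ grad
```

After projection, the remaining gradient component is orthogonal to the retained old-task directions, which is the condition the theorems below formalize for each tuning paradigm.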
We extensively evaluate our method with different backbones on diverse datasets. Experiments demonstrate its effectiveness in reducing forgetting in class-incremental, online class-incremental, domain-incremental, and task-incremental settings for uni-modal models, as well as in class-incremental and instruction-incremental learning for cross-modal models.
Visualization of distinct parameter-efficient tuning paradigms
For the various parameter-efficient tuning paradigms, to better preserve old knowledge, we propose that the network updates should satisfy the following theorems.
Prompt-Tuning:
Theorem 1.
Prefix-Tuning:
Theorem 2.
where
Adapter-Tuning:
Theorem 3.
LoRA-Tuning:
Theorem 4.
To achieve Theorem 1, i.e., the anti-forgetting condition, the new prompts are required to satisfy:
To achieve Theorem 2, i.e., the anti-forgetting condition, the new prefixes are required to satisfy:
To achieve Theorem 3, i.e., the anti-forgetting condition, the new adapter parameters are required to satisfy:
To achieve Theorem 4, i.e., the anti-forgetting condition, the new LoRA parameters are required to satisfy:
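For prompt tuning in particular, the conditions above amount to projecting the prompt gradient before each update so that it stays orthogonal to the old-task feature subspace. A minimal sketch, assuming `U_old` holds an orthonormal basis of that subspace in its columns (all names here are illustrative, not from the paper's code):

```python
import numpy as np

def projected_prompt_update(prompt, grad, U_old, lr=0.1):
    """One gradient-descent step on a prompt matrix of shape
    (prompt_len, embed_dim). The gradient is first projected onto the
    complement of the old-task subspace spanned by the columns of
    U_old (embed_dim, k), so old-task outputs are left unchanged."""
    grad_orth = grad - (grad @ U_old) @ U_old.T   # strip old-task component
    return prompt - lr * grad_orth                # standard SGD step
```

If the raw gradient lies entirely inside the old-task subspace, the projected update is zero and the prompt is untouched, which is exactly the anti-forgetting behavior the theorem requires; any component outside the subspace still drives new-task learning.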
T-SNE results of prompt and prompt-gp on the 10-Split-CIFAR100 dataset with a ViT backbone. The left column shows prompt tuning and the right column shows prompt-gp. The red circles mark the drawback of plain prompt tuning, and the blue circles show the improvement from our method.
Online class-incremental learning results of the prefix/prompt tuning paradigms with a ViT backbone.
@article{qiao2024prompt,
  author  = {Jingyang Qiao and Zhizhong Zhang and Xin Tan and Chengwei Chen and Yanyun Qu and Yong Peng and Yuan Xie},
  title   = {Prompt Gradient Projection for Continual Learning},
  journal = {The Twelfth International Conference on Learning Representations},
  year    = {2024},
}