arXiv:2406.06564

SwitchLoRA: Switched Low-Rank Adaptation Can Learn Full-Rank Information

Published on Jun 3, 2024

Abstract

AI-generated summary: SwitchLoRA, a parameter-efficient training technique, improves pre-training accuracy and reduces communication and memory overhead by incrementally switching the trainable parameters of LoRA adapters.

In the training of large language models, parameter-efficient techniques such as LoRA reduce memory usage and communication overhead during the fine-tuning phase. However, applying such techniques directly during the pre-training phase results in poor performance, primarily because restricting training to a low-rank subspace from the start significantly reduces model accuracy. Existing methods such as ReLoRA and GaLore attempt to address this challenge by updating the low-rank subspace. However, they still fall short of full-rank training accuracy. Specifically, ReLoRA restricts the frequency of updates to preserve optimizer state consistency, hindering its ability to closely approximate full-rank training behavior, while GaLore relies on Singular Value Decomposition (SVD) to approximate the full-rank space, which introduces accuracy loss during the approximation. In this paper, we introduce SwitchLoRA, a parameter-efficient training technique that frequently and smoothly replaces the trainable parameters of LoRA adapters with alternative parameters. SwitchLoRA updates the low-rank subspace incrementally, targeting only a few dimensions at a time to minimize the impact on optimizer states. This allows a higher update frequency, so the updated parameters more closely mimic full-rank behavior during the pre-training phase. Our results demonstrate that SwitchLoRA actually surpasses full-rank training, reducing perplexity from 15.23 to 15.01 on the LLaMA 1.3B model, while also cutting communication overhead by 54% and memory usage by 13%. Furthermore, after full fine-tuning both the SwitchLoRA pre-trained model and the full-rank pre-trained model on the GLUE benchmark, the SwitchLoRA pre-trained model showed an average accuracy gain of about 1% over its full-rank counterpart.
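The switching mechanism described in the abstract lends itself to a short illustration. Below is a minimal PyTorch sketch of a SwitchLoRA-style linear layer, written from the description above rather than from the authors' code: the names (`SwitchLoRALinear`, `switch`, `cand_A`) and design details such as the candidate pool, the zeroing of the paired columns of B, and the per-dimension optimizer-state reset are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SwitchLoRALinear(nn.Module):
    """Hypothetical SwitchLoRA-style layer: y = W x + B A x, where W is
    frozen and the low-rank factors A, B are trainable. A few of the
    rank dimensions are periodically "switched" to frozen candidate
    vectors so the trained subspace drifts toward full rank."""

    def __init__(self, in_features, out_features, rank=8, num_candidates=64):
        super().__init__()
        # Frozen full-rank weight (stands in for the base model weight).
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) * 0.02, requires_grad=False)
        # Trainable LoRA factors: A maps input -> rank, B maps rank -> output.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        # Frozen pool of candidate directions used as replacements (assumed).
        self.register_buffer(
            "cand_A", torch.randn(num_candidates, in_features) * 0.01)

    def forward(self, x):
        return x @ self.weight.T + (x @ self.lora_A.T) @ self.lora_B.T

    @torch.no_grad()
    def switch(self, num_dims, optimizer):
        """Replace `num_dims` rank dimensions with fresh candidate rows of A,
        zero the paired columns of B (so the product B A, and hence the
        layer output, is unchanged at the switch point), and reset Adam
        state only for the switched entries so the rest stays valid."""
        rank = self.lora_A.shape[0]
        dims = torch.randperm(rank)[:num_dims]
        picks = torch.randint(0, self.cand_A.shape[0], (num_dims,))
        self.lora_A[dims] = self.cand_A[picks]
        self.lora_B[:, dims] = 0.0
        for p in (self.lora_A, self.lora_B):
            state = optimizer.state.get(p, {})
            for key in ("exp_avg", "exp_avg_sq"):
                if key in state:
                    if p is self.lora_A:
                        state[key][dims] = 0.0
                    else:
                        state[key][:, dims] = 0.0
```

A toy training loop shows the intended usage: switches are frequent but touch only a couple of dimensions each time, which is one plausible reading of the abstract's "frequently and smoothly":

```python
layer = SwitchLoRALinear(512, 512, rank=8)
opt = torch.optim.AdamW([layer.lora_A, layer.lora_B], lr=1e-3)
for step in range(1000):
    loss = layer(torch.randn(4, 512)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 40 == 0:  # frequent switches, a few dimensions at a time
        layer.switch(num_dims=2, optimizer=opt)
```

Zeroing the B columns paired with the switched A rows keeps the output continuous across a switch, while resetting only the affected slices of the Adam moments is what would permit the higher update frequency the abstract contrasts with ReLoRA.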
