Hey @navissivan and @ksoky,
- Don’t worry about the use_cache warning, it just means that we cannot use the k,v cache for the attention mechanism with gradient checkpointing. If you want to disable the warning, load the model and then set use_cache to False:
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.config.use_cache = False
The operation of the model is the same with and without the cache - we just use the cache to speed up decoding. The cache isn’t compatible with gradient checkpointing, so the Trainer disables it and shows this warning instead.
- It shouldn’t stay idle for that long - usually this happens when we set group_by_length=True but haven’t specified input_lengths in our prepare_dataset function. Have you modified the prepare_dataset function? Could you make sure the dataset that you pass to the trainer has the input_lengths column?
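For reference, here’s a minimal sketch of what that can look like - the processor and the audio/sentence column names are placeholders for whatever your own script uses, the key part is the input_lengths column:

def prepare_dataset(batch):
    audio = batch["audio"]

    # compute log-Mel input features from the raw audio array
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]

    # length of the raw audio in samples - this is the column group_by_length needs
    batch["input_lengths"] = len(audio["array"])

    # encode the target transcription to label ids
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch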
- A progress bar should show - you need to set disable_tqdm=False in your training args.
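For example (the output_dir here is just a placeholder - keep the rest of your arguments as they are):

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-finetuned",  # placeholder output directory
    disable_tqdm=False,  # show the tqdm progress bar during training and evaluation
    # ... the rest of your training arguments ...
)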
You have a couple of options for running it in the background:
- tmux: call tmux and then run Jupyter notebooks from the tmux shell:
tmux new -s mysession
jupyter lab
Then carry on as normal. The process will continue running even when you close your shell. When you re-open your shell, you can reattach with:
tmux a -t mysession
Check out the tmux docs for more info.
- The other option is to export the ipynb notebook as a Python script and run that instead: from File → Export Notebook As… in the Jupyter Lab menu, select ‘Export Notebook to Executable Script’. This will give you a Python script to download. Then run it with tmux (as above) or nohup:
nohup python fine-tuning-whisper.py &
You can open a new window to view the output:
vim nohup.out
- The table is generated automatically by the Trainer if you perform evaluation over the course of training.
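For example, setting an evaluation strategy in your training arguments along these lines will run evaluation periodically and populate that table (the eval_steps value is just an example):

training_args = Seq2SeqTrainingArguments(
    output_dir="/home/sivan/whisper_base_fl_ch",
    evaluation_strategy="steps",   # run evaluation during training
    eval_steps=1000,               # evaluate every 1000 training steps (example value)
    predict_with_generate=True,    # generate transcriptions so WER can be computed
    # ... the rest of your training arguments ...
)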
- It’s possible. The model checkpoint saved at step 1000 is stored in the output directory under:
/home/sivan/whisper_base_fl_ch/checkpoint-1000
You can load the model from the checkpoint saved at step 1000 as follows:
model = WhisperForConditionalGeneration.from_pretrained("/home/sivan/whisper_base_fl_ch/checkpoint-1000")
You can then run a validation step:
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
training_args = Seq2SeqTrainingArguments(
    output_dir="/home/sivan/whisper_base_fl_ch/validation_step",
    do_train=False,               # evaluation only, no training
    do_eval=True,
    per_device_eval_batch_size=8,
    predict_with_generate=True,   # generate transcriptions so WER can be computed
    generation_max_length=225,
    save_strategy="no",
    report_to=["tensorboard"],
    push_to_hub=False,
    disable_tqdm=False,
)

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    eval_dataset=fleurs_ch["validation"],  # set to your val set
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)
trainer.evaluate()
You can then repeat this for the checkpoints in directories checkpoint-2000, checkpoint-3000 and so on.
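If you’d rather not do that by hand, a small loop along these lines should work, re-using the training_args, data_collator, compute_metrics and processor from above (and assuming your checkpoints were saved every 1000 steps):

for step in [1000, 2000, 3000]:
    checkpoint_dir = f"/home/sivan/whisper_base_fl_ch/checkpoint-{step}"
    model = WhisperForConditionalGeneration.from_pretrained(checkpoint_dir)
    trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        eval_dataset=fleurs_ch["validation"],  # set to your val set
        data_collator=data_collator,
        compute_metrics=compute_metrics,
        tokenizer=processor.feature_extractor,
    )
    # evaluate() returns a dict of metrics, e.g. eval_loss and eval_wer
    print(step, trainer.evaluate())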