wrong param name when using torch.load()

by BwShen - opened Aug 18, 2023

Aug 18, 2023 • edited Aug 18, 2023

Sometimes I need to load the model parameters directly with torch.load(), without the Hugging Face package. However, the parameter names in the checkpoint are not the expected ones: for example, it contains h.23.mlp.dense_4h_to_h.weight where I expected transformer.h.23.mlp.dense_4h_to_h.weight, and lm_head.weight does not exist at all.
I guess this is related to #5 and #6, where the model architecture was changed, but the saved params still follow the BloomModel naming instead of BloomForCausalLM.
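
Until the checkpoint itself is fixed, one possible workaround is to rename the keys after loading. The snippet below is only a minimal sketch, not an official fix: the helper name `to_causal_lm_keys` is hypothetical, and it assumes that the only differences are the missing `transformer.` prefix and an `lm_head.weight` that is tied to the word embeddings (the usual default for BloomForCausalLM). It works on any dict-like state dict, so torch is only needed for the actual loading step shown in the comment.

```python
def to_causal_lm_keys(state_dict, prefix="transformer.", tie_lm_head=True):
    """Return a copy of state_dict with BloomModel-style keys renamed
    to BloomForCausalLM-style keys (hypothetical helper, see assumptions above)."""
    remapped = {}
    for name, tensor in state_dict.items():
        # Keys that already look like BloomForCausalLM keys are kept unchanged.
        if name.startswith(prefix) or name.startswith("lm_head."):
            remapped[name] = tensor
        else:
            remapped[prefix + name] = tensor

    # Assumption: lm_head shares its weight with the input word embeddings.
    embed_key = prefix + "word_embeddings.weight"
    if tie_lm_head and "lm_head.weight" not in remapped and embed_key in remapped:
        remapped["lm_head.weight"] = remapped[embed_key]
    return remapped


# Usage with a real checkpoint file (the file name here is a placeholder):
#   import torch
#   state_dict = torch.load("pytorch_model.bin", map_location="cpu")
#   state_dict = to_causal_lm_keys(state_dict)
```

If the checkpoint is split into several shard files, the word embeddings may live in a different shard than the one being loaded. In that case `lm_head.weight` is only added for the shard that actually contains `word_embeddings.weight`.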

Muennighoff (BigScience Workshop org) - Aug 18, 2023

Thanks for noting 🧐 Maybe @lewtun knows what the problem is? Should we change it back to BloomForCausalLM?
