GPT-NeoX now supports preference learning (SFT, DPO, KTO)! For more information on this joint effort between EleutherAI and SynthLabs, view our associated blog posts:

SynthLabs: https://www.synthlabs.ai/blog/rlhf-and-rlaif-in-gpt-neox

EleutherAI: https://www.eleuther.ai/rlhf-and-rlaif-in-gpt-neox
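For reference, DPO trains the policy to prefer chosen over rejected responses relative to a frozen reference model. The snippet below is a minimal sketch of the standard DPO objective in PyTorch, not the exact GPT-NeoX implementation; the function name and the summed per-sequence log-probabilities it takes as inputs are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # sum of log-probs of chosen responses under the policy
    policy_rejected_logps: torch.Tensor,  # sum of log-probs of rejected responses under the policy
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen SFT reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # strength of the implicit KL penalty toward the reference
) -> torch.Tensor:
    """Standard DPO objective: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()
```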

This is a direct preference optimization (DPO) model produced by:

  1. Taking the UltraChat SFT checkpoint from https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta
  2. Loading that checkpoint into the GPT-NeoX library and running DPO following the Zephyr 7B recipe (UltraFeedback). Example usage for running post-training in GPT-NeoX is at https://github.com/EleutherAI/gpt-neox/tree/main/post-training
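Once exported to Hugging Face format, the checkpoint can be loaded like any other Mistral-based causal LM. A minimal usage sketch, assuming a converted checkpoint at a placeholder path (replace `path/to/this-checkpoint` with the actual repo id or local directory) and assuming the tokenizer carries a chat template, as the `mistral-7b-sft-beta` base does:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: substitute the actual Hugging Face repo id or local directory
# containing the converted (HF-format) checkpoint.
model_id = "path/to/this-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumes the tokenizer ships a chat template.
messages = [{"role": "user", "content": "Explain direct preference optimization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```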
| Model | gsm8k (5-shot, flexible-extract) | MMLU (5-shot, acc) | ARC Challenge (25-shot, acc_norm) | HellaSwag (10-shot, acc_norm) | Winogrande (5-shot, acc) | TruthfulQA (0-shot, mc2) |
|---|---|---|---|---|---|---|
| NeoX DPO from Zephyr-SFT | 64.1 | 41.8 | 60 | 63.2 | 85.2 | 79.2 |
| Zephyr-7B-Beta | 62.5 | 34.3 | 59.8 | 63.6 | 84.4 | 77.6 |
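The metric names and shot counts above follow lm-evaluation-harness task conventions (`gsm8k`, `mmlu`, `arc_challenge`, `hellaswag`, `winogrande`, `truthfulqa_mc2`). The sketch below shows how such numbers could be reproduced with the harness's Python API; the checkpoint path is a placeholder, and the exact harness version and settings used for the table are not specified here.

```python
import lm_eval

# Placeholder path: substitute the actual repo id or local directory of the HF-format checkpoint.
MODEL_ARGS = "pretrained=path/to/this-checkpoint,dtype=bfloat16"

# Each benchmark in the table uses a different few-shot count, so run them separately.
TASK_SHOTS = {
    "gsm8k": 5,
    "mmlu": 5,
    "arc_challenge": 25,
    "hellaswag": 10,
    "winogrande": 5,
    "truthfulqa_mc2": 0,
}

for task, shots in TASK_SHOTS.items():
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=MODEL_ARGS,
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    # results["results"] maps task (and subtask) names to their metric dicts.
    print(task, results["results"])
```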