Inference-time Alignment with Nudging.

By injecting a few nudging tokens at inference time, we can enable base models to follow user instructions helpfully and safely.
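The description above does not say exactly when a nudging token is injected. The sketch below assumes one simple switching rule. It decodes greedily with the base model, and whenever the base model's top-1 probability falls below a threshold `gamma`, it takes the next token from the small aligned nudging model instead. The names `nudge_generate`, `toy_base`, `toy_nudger` and the default `gamma=0.4` are hypothetical and chosen only for illustration. Real models would work on tokenizer IDs and logits rather than string-keyed dictionaries.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any function mapping a token prefix to a next-token distribution.
Model = Callable[[List[str]], Dict[str, float]]


def nudge_generate(
    base: Model,
    nudger: Model,
    prompt: List[str],
    gamma: float = 0.4,
    max_new_tokens: int = 20,
    eos: str = "<eos>",
) -> Tuple[List[str], List[str]]:
    """Greedy decoding in which the nudging model supplies the next token whenever
    the base model's top-1 probability falls below gamma (assumed switching rule)."""
    tokens = list(prompt)
    sources: List[str] = []  # which model produced each generated token
    for _ in range(max_new_tokens):
        base_dist = base(tokens)
        token, prob = max(base_dist.items(), key=lambda kv: kv[1])
        source = "base"
        if prob < gamma:
            # Base model is uncertain here: inject a nudging token instead.
            nudge_dist = nudger(tokens)
            token = max(nudge_dist, key=nudge_dist.get)
            source = "nudge"
        if token == eos:
            break
        tokens.append(token)
        sources.append(source)
    return tokens[len(prompt):], sources


# Toy models for illustration only: the base model hesitates at the start of
# the answer and is confident afterwards; the nudger always suggests "Sure".
def toy_base(prefix: List[str]) -> Dict[str, float]:
    table = {
        "A:": {"The": 0.35, "Sure": 0.30, "I": 0.35},
        "Sure": {",": 0.9, "!": 0.1},
        ",": {"here": 0.8, "I": 0.2},
        "here": {"<eos>": 0.95, "is": 0.05},
    }
    return table.get(prefix[-1], {"<eos>": 1.0})


def toy_nudger(prefix: List[str]) -> Dict[str, float]:
    return {"Sure": 0.9, "No": 0.1}


print(nudge_generate(toy_base, toy_nudger, ["Q:", "hello", "A:"]))
# prints (['Sure', ',', 'here'], ['nudge', 'base', 'base'])
```

On these toy models only the first answer token comes from the nudger, and the base model writes the rest. This mirrors the idea that a few injected tokens can steer the base model's continuation.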

Our demo is powered by the Together AI API. Because only three base models are still available through its serverless API, the demo pairs just those three base models with nudging models.
- [Update] Unfortunately, Together AI has stopped serving most base models and many small instruct models. The demo currently supports only nudging Llama-2-70B with Mistral-7B-v0.1-Instruct. You can still run nudging locally with any model pair using our code.
The daily limit is 50 requests per IP address. If you need more, please contact us.
This demo uses an API-based implementation of nudging, which can be slow because each question requires multiple API calls. An implementation in the style of speculative decoding could make nudging significantly faster.
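The following hedged sketch shows why an API-based implementation needs several calls per question. It assumes hypothetical endpoint functions `base_call` and `nudge_call`, which are not Together AI's API, together with the same assumed uncertainty threshold `gamma`. It also simplifies by assuming that a fully confident base chunk ends the answer. Each round asks the base model for a chunk of tokens with their top-1 probabilities, keeps the prefix before the first uncertain token, discards the rest, and then requests one nudging token.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for two remote completion endpoints (not a real API):
# a base call returns a chunk of (token, top-1 probability) pairs, and a
# nudging call returns the next token from the small aligned model.
BaseCall = Callable[[str], List[Tuple[str, float]]]
NudgeCall = Callable[[str], str]


def nudge_via_api(
    base_call: BaseCall,
    nudge_call: NudgeCall,
    prompt: str,
    gamma: float = 0.4,
    max_rounds: int = 10,
) -> Tuple[str, int]:
    """Returns the generated text and the number of remote calls it took."""
    text, calls = prompt, 0
    for _ in range(max_rounds):
        chunk = base_call(text)
        calls += 1
        # Position of the first token where the base model is uncertain, if any.
        cut: Optional[int] = next(
            (i for i, (_, p) in enumerate(chunk) if p < gamma), None
        )
        if cut is None:
            # Base model stayed confident for the whole chunk: accept it and stop.
            text += "".join(tok for tok, _ in chunk)
            break
        # Keep the confident prefix, drop the rest, and ask the nudger for a token.
        text += "".join(tok for tok, _ in chunk[:cut])
        text += nudge_call(text)
        calls += 1
    return text[len(prompt):], calls


# Toy endpoints for illustration only.
def toy_base_call(text: str) -> List[Tuple[str, float]]:
    if text.endswith("A:"):
        return [(" The", 0.3)]
    if text.endswith(" Sure"):
        return [(",", 0.9), (" here", 0.8), (" it", 0.2)]
    if text.endswith(" is"):
        return [(" the", 0.9), (" answer", 0.85), (".", 0.95)]
    return []


def toy_nudge_call(text: str) -> str:
    return " is" if text.endswith(" here") else " Sure"


print(nudge_via_api(toy_base_call, toy_nudge_call, "Q: hi\nA:"))
# prints (' Sure, here is the answer.', 5)
```

In this toy run, one short answer takes three base calls and two nudging calls. The base tokens after each cut are also generated only to be thrown away. Running both models locally and checking the threshold token by token could reduce both the round trips and the wasted tokens.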