Are there any current methods to speed up inference?

by zhouchongqin - opened Dec 11, 2025

Discussion

zhouchongqin

Dec 11, 2025

At present, the model doesn't support vLLM. Are there any current methods to speed up inference?

Zhuoning

Alibaba-NLP org Dec 12, 2025

Thanks for your interest. Unfortunately, GVE does not currently support any acceleration framework. You can still use Transformers and run inference in parallel across multiple GPUs.
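As a rough illustration of the multi-GPU suggestion, here is a minimal sketch of data-parallel inference with plain Transformers: the inputs are split into one shard per GPU, and each process loads its own copy of the model. Everything specific here is an assumption rather than the official GVE recipe: `shard`, `encode_one`, `worker`, and `run_parallel` are hypothetical names, loading through `AutoModel` with `trust_remote_code=True` may not match what the model card prescribes, and `encode_one` is a placeholder you would fill in with the preprocessing and forward pass from the model card.

```python
import multiprocessing as mp


def shard(items, num_shards):
    """Split items into num_shards contiguous, near-equal chunks."""
    base, extra = divmod(len(items), num_shards)
    chunks, start = [], 0
    for i in range(num_shards):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def encode_one(model, item, device):
    """Hypothetical placeholder: replace with the model card's preprocessing
    and forward pass. It should return CPU data (e.g. a list or CPU tensor)
    so the result can be sent back to the parent process."""
    raise NotImplementedError


def worker(rank, inputs, model_id, queue):
    # Imported lazily so each spawned process initialises CUDA on its own.
    import torch
    from transformers import AutoModel

    device = torch.device(f"cuda:{rank}")
    # Assumption: the checkpoint loads via AutoModel with remote code enabled.
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model = model.to(device).eval()
    with torch.inference_mode():
        outputs = [encode_one(model, item, device) for item in inputs]
    queue.put((rank, outputs))


def run_parallel(inputs, model_id, num_gpus):
    """Run one worker per GPU and return outputs in the original input order."""
    ctx = mp.get_context("spawn")  # "spawn" is required when workers use CUDA
    queue = ctx.Queue()
    procs = []
    for rank, chunk in enumerate(shard(inputs, num_gpus)):
        proc = ctx.Process(target=worker, args=(rank, chunk, model_id, queue))
        proc.start()
        procs.append(proc)
    # Drain the queue before joining so workers are not blocked on put().
    results = dict(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return [out for rank in range(num_gpus) for out in results[rank]]
```

To try it, call `run_parallel(inputs, model_id, num_gpus)` from under an `if __name__ == "__main__":` guard, passing the checkpoint name from the model card as `model_id`. Throughput should scale roughly with the number of GPUs as long as each card can hold a full copy of the model; this sketch does no error handling, so a crashed worker will leave the parent waiting on the queue.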
