	---
	license: apache-2.0
	base_model:
	- stabilityai/stable-diffusion-3.5-large
	base_model_relation: quantized
	pipeline_tag: text-to-image
	---


	# Elastic model: Fastest self-serving models. Stable Diffusion 3.5 Large.

	Elastic models are produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA lets you trade off model size, latency, and quality with a single slider. For each model, ANNA produces a series of optimized variants:

	* __XL__: Mathematically equivalent neural network, optimized with our DNN compiler.

	* __S__: The fastest model, with accuracy degradation of less than 2%.


	__Goals of Elastic Models:__

	* Provide the fastest models and service for self-hosting.
	* Provide flexibility in cost vs quality selection for inference.
	* Provide clear quality and latency benchmarks.
	* Provide the interface of the HF libraries `transformers` and `diffusers`, enabled with a single line of code.
	* Provide models supported on a wide range of hardware, which are pre-compiled and require no JIT.

	> Note that the specific quality degradation varies from model to model. For instance, an S model may show only 0.5% degradation.

	![SD3.5_comparison_collage](https://huggingface.co/proxy/cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/8BuRpHPCyUMwJ4O8kXkUr.png)

	![SD3.5_comparison](https://huggingface.co/proxy/cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/3gsiQTdK7Zy6--coknbJJ.png)

	-----

	## Inference

	Currently, our demo model supports resolutions from 512x512 to 1024x1024 and batch sizes 1-4. This will be updated in the near future.
	To run inference with our models, replace the `diffusers` import with `elastic_models.diffusers`:

	```python
	import torch
	from elastic_models.diffusers import StableDiffusion3Pipeline

	model_name = 'stabilityai/stable-diffusion-3.5-large'
	hf_token = ''
	device = torch.device("cuda")

	pipeline = StableDiffusion3Pipeline.from_pretrained(
	    model_name,
	    torch_dtype=torch.bfloat16,
	    token=hf_token,
	    mode='S'
	)
	pipeline.to(device)

	prompts = ["A cat holding a sign that says hello world"]
	output = pipeline(prompt=prompts)

	for prompt, output_image in zip(prompts, output.images):
	    output_image.save(prompt.replace(' ', '_') + '.png')
	```
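
The limits stated above (resolutions from 512x512 to 1024x1024, batch sizes 1-4) can be checked before calling the pipeline. The following is a minimal sketch: `check_request` is a hypothetical helper that is not part of `elastic_models`, and it assumes the resolution limit applies to height and width independently.

```python
# Hypothetical helper, not part of elastic_models: validates a request
# against the limits documented in this README before running inference.
MIN_SIDE, MAX_SIDE = 512, 1024  # assumed to apply to each side separately
MIN_BATCH, MAX_BATCH = 1, 4     # supported batch sizes


def check_request(prompts, height=1024, width=1024):
    """Raise ValueError if the request falls outside the documented limits."""
    if not MIN_BATCH <= len(prompts) <= MAX_BATCH:
        raise ValueError(
            f"batch size {len(prompts)} is outside {MIN_BATCH}-{MAX_BATCH}"
        )
    for name, side in (("height", height), ("width", width)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"{name}={side} is outside {MIN_SIDE}-{MAX_SIDE}")
    return True
```

Calling `check_request(prompts)` right before `pipeline(prompt=prompts)` surfaces an unsupported batch size or resolution early, rather than during generation.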

	### Installation


	__System requirements:__
	* GPUs: H100, B200
	* CPU: AMD, Intel
	* Python: 3.10-3.12


	To work with our models, run these commands in your terminal:

	```shell
	pip install thestage
	pip install 'thestage-elastic-models[nvidia]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple

	# or, for Blackwell support:
	pip install 'thestage-elastic-models[blackwell]' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple
	pip install -U --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
	pip install -U --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cu128


	pip install flash_attn==2.7.3 --no-build-isolation
	pip uninstall apex
	```

	Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token from your profile page. Set up the API token as follows:

	```shell
	thestage config set --api-token <YOUR_API_TOKEN>
	```

	Congrats, now you can use accelerated models!

	----

	## Benchmarks

	Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models using our algorithms.

	### Quality benchmarks

	For quality evaluation we used PSNR and SSIM. Both metrics were computed against the outputs of the original model.

	| Metric/Model | S     | XL    | Original |
	|--------------|-------|-------|----------|
	| PSNR         | 20.78 | 29.13 | inf      |
	| SSIM         | 0.81  | 0.95  | 1.0      |
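
For readers unfamiliar with PSNR, the sketch below shows the standard definition, `10 * log10(MAX^2 / MSE)`. It is an illustration only: this README does not state the exact evaluation setup (prompts, seeds, value range), so the 8-bit `max_value` and the flat pixel sequences are assumptions. The code is kept dependency-free; SSIM is omitted because it is more involved and is usually taken from an image-processing library.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    Assumes 8-bit pixel values by default (max_value=255).
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error between the two images.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images, as in the "Original" column
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the accelerated output is closer to the original output. Comparing an image with itself gives `inf`, which is why the Original column reads `inf`.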


	### Latency benchmarks

	Time in seconds to generate one 1024x1024 image:

	| GPU/Model | S    | XL   | Original |
	|-----------|------|------|----------|
	| H100      | 3.10 | 3.80 | 6.55     |
	| B200      | 1.76 | 2.27 | 4.81     |
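
The speedup of each variant over the original model follows directly from the table. A short sketch of that arithmetic, using only the latencies listed above (the `speedups` helper name is ours):

```python
# Latencies in seconds per 1024x1024 image, copied from the table above.
latency = {
    "H100": {"S": 3.10, "XL": 3.80, "Original": 6.55},
    "B200": {"S": 1.76, "XL": 2.27, "Original": 4.81},
}


def speedups(latency):
    """Return {gpu: {variant: original_latency / variant_latency}}."""
    return {
        gpu: {
            variant: row["Original"] / seconds
            for variant, seconds in row.items()
            if variant != "Original"
        }
        for gpu, row in latency.items()
    }


for gpu, row in speedups(latency).items():
    for variant, factor in row.items():
        print(f"{gpu} {variant}: {factor:.2f}x faster than Original")
```

This gives roughly 2.11x (S) and 1.72x (XL) on H100, and 2.73x (S) and 2.12x (XL) on B200.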


	## Links

	* __Platform__: [app.thestage.ai](https://app.thestage.ai)
	<!-- * __Elastic models Github__: [app.thestage.ai](app.thestage.ai) -->
	* __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
	* __Contact email__: [email protected]