Z-Image Turbo Control Unified V1

This repository hosts the Z-Image Turbo Control Unified V1 model, a specialized architecture that unifies the powerful Z-Image Turbo base transformer and ControlNet capabilities into a single, cohesive model. The unified pipeline supports three generation modes: Text-to-Image, Image-to-Image, and ControlNet-guided generation.

Unlike traditional pipelines where ControlNet is an external add-on, this model integrates control layers directly into the transformer structure. This enables Unified GGUF Quantization, allowing the entire merged architecture (Base + Control) to be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM.

📥 Installation

To set up the environment, create a virtual environment and install the dependencies:

# Create a virtual environment
python -m venv venv

# Activate the venv (Linux/macOS)
source venv/bin/activate
# Activate the venv (Windows)
venv\Scripts\activate

# Upgrade pip
python -m pip install --upgrade pip

# Install requirements
pip install -r requirements.txt

Note: This repository contains a diffusers_local folder with the custom ZImageControlUnifiedPipeline and transformer logic required to run this specific architecture.
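
As a rough orientation, loading and calling the pipeline could look like the following minimal sketch, assuming ZImageControlUnifiedPipeline follows the standard diffusers from_pretrained convention (the bundled scripts remain the authoritative entry points, and the prompt and step count here are placeholders):

import torch
from diffusers_local import ZImageControlUnifiedPipeline  # custom pipeline shipped in this repo

# Load from the repository root, which contains the ./transformer/ weights
pipe = ZImageControlUnifiedPipeline.from_pretrained(".", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Text-to-Image call; Turbo models target a low step count
image = pipe("a watercolor landscape at dawn", num_inference_steps=8).images[0]
image.save("t2i_example.png")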

🚀 Usage

This repository provides separate, easy-to-use scripts for each generation task. The infer_controlnet.py script is pre-configured to handle all supported ControlNet modes.

Scripts

  • infer_t2i.py: For Text-to-Image generation.
  • infer_i2i.py: For Image-to-Image generation.
  • infer_controlnet.py: For all ControlNet-guided generation tasks.

To run a ControlNet generation, open infer_controlnet.py and set the control_mode variable to one of the following: "pose", "canny", "depth", "hed", or "mlsd".
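
For example, the relevant lines in the script might look like this (control_mode and its allowed values come from this README; the other variable names and the prompt are hypothetical):

# In infer_controlnet.py
control_mode = "canny"  # one of: "pose", "canny", "depth", "hed", "mlsd"
control_image_path = "inputs/canny_map.png"  # hypothetical variable for the condition image
prompt = "a portrait photo, studio lighting"  # illustrative prompt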

Hardware Options

Option 1: Low VRAM (GGUF) - Recommended

Use this version if you have limited VRAM (e.g., 6GB - 8GB). It loads the model from a quantized GGUF file. To use it, set use_gguf = True in the desired inference script and provide the path to the .gguf file; a configuration sketch follows the feature list below.

Key Features:

  • Loads the unified transformer from a single 4-bit or 8-bit quantized file.
  • Enables aggressive group_offload to fit large models on consumer GPUs.
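
A minimal sketch of this configuration (use_gguf and group_offload come from this README; the GGUF file name is hypothetical, and the exact offload call may differ by diffusers version):

import torch

# In the inference script
use_gguf = True
gguf_path = "transformer/z_image_turbo_control_unified_v1-Q4_K_M.gguf"  # hypothetical file name

# Once the pipeline is built (`pipe` as in the loading sketch above), group offloading
# keeps only a few transformer blocks on the GPU at a time, swapping the rest to CPU memory
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=2,  # smaller groups use less VRAM at the cost of more transfers
)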

Option 2: High Precision (Diffusers/BF16)

Use this version if you have ample VRAM (e.g., 24GB+). Set use_gguf = False in the script to load the model through the standard from_pretrained directory structure at full BFloat16 precision (this is the path shown in the loading sketch above).

🛠️ Model Features & Configuration (V1)

  • Unified Pipeline: A single pipeline now handles Text-to-Image, Image-to-Image, and ControlNet tasks.
  • Multiple Control Conditions: Supports Canny, HED, Depth, Pose, and MLSD.
  • Control Strength (controlnet_conditioning_scale): Adjust this parameter for stronger or weaker control guidance (see the call sketch after this list).
  • Group Offload Ready: The underlying code is updated to ensure diffusers' group_offload works correctly, enabling efficient memory management.
  • Turbo Model: Optimized for fast generation with a low number of inference steps. The scripts are pre-configured with recommended settings for each mode.
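
A hedged example of a ControlNet call combining these settings (controlnet_conditioning_scale comes from this README; the control_image argument name, prompt, paths, and step count are assumptions):

from PIL import Image

control_image = Image.open("inputs/pose_map.png")  # pre-processed condition image (hypothetical path)
image = pipe(  # `pipe` loaded as in the sketch above
    prompt="a dancer on a rooftop at sunset",
    control_image=control_image,  # argument name is an assumption
    controlnet_conditioning_scale=0.8,  # lower = weaker guidance, higher = stronger
    num_inference_steps=8,  # Turbo: few steps; the scripts ship tuned defaults per mode
).images[0]
image.save("controlnet_example.png")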

Note: This V1 model does not support inpainting or the controlnet_refiner_conditioning_scale parameter.

🏞️ V1 Examples

This unified V1 pipeline was used to generate example images for the following modes:

  • Pose Control (output)
  • Canny Control (output)
  • HED Control (output)
  • Depth Control (output)
  • MLSD Control (output)
  • Text-to-Image (output)
  • Image-to-Image (input and output)

📂 Repository Structure

  • ./transformer/: Directory for model weights (GGUF or standard).
  • infer_controlnet.py: Script for all ControlNet inference modes.
  • infer_t2i.py: Script for Text-to-Image inference.
  • infer_i2i.py: Script for Image-to-Image inference.
  • diffusers_local/: Custom pipeline and model code.
  • requirements.txt: Python dependencies.