# Z-Image Turbo Control Unified V1
This repository hosts Z-Image Turbo Control Unified V1, a specialized architecture that unifies the powerful Z-Image Turbo base transformer with ControlNet capabilities into a single, cohesive model. The unified pipeline supports three generation modes: Text-to-Image, Image-to-Image, and ControlNet.
Unlike traditional pipelines where ControlNet is an external add-on, this model integrates control layers directly into the transformer structure. This enables Unified GGUF Quantization, allowing the entire merged architecture (Base + Control) to be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM.
## Installation
To set up the environment, create a virtual environment and install the dependencies:

```bash
# Create a virtual environment
python -m venv venv

# Activate it (Linux/macOS)
source venv/bin/activate
# Activate it (Windows)
# venv\Scripts\activate

# Upgrade pip
python -m pip install --upgrade pip

# Install requirements
pip install -r requirements.txt
```
Note: This repository contains a `diffusers_local` folder with the custom `ZImageControlUnifiedPipeline` and transformer logic required to run this specific architecture.
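Because the pipeline class lives in that folder rather than in the `diffusers` package itself, it has to be importable from wherever you run inference. A minimal sketch, assuming the repository root is your working directory (the exact import path is an assumption; the bundled `infer_*.py` scripts handle this for you):

```python
# Sketch only: make the repo root (which contains diffusers_local/) importable.
import sys
sys.path.append(".")  # assumes you launch from the repository root

# Custom pipeline class shipped in this repo's diffusers_local folder.
from diffusers_local import ZImageControlUnifiedPipeline
```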
## Usage

This repository provides separate, easy-to-use scripts for each generation task. The `infer_controlnet.py` script is pre-configured to handle all supported ControlNet modes.

### Scripts

- `infer_t2i.py`: Text-to-Image generation.
- `infer_i2i.py`: Image-to-Image generation.
- `infer_controlnet.py`: All ControlNet-guided generation tasks.
To run a ControlNet generation, open `infer_controlnet.py` and set the `control_mode` variable to one of the following: `"pose"`, `"canny"`, `"depth"`, `"hed"`, or `"mlsd"`.
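For instance (a sketch; only `control_mode` and its five values are documented, the validation guard is illustrative):

```python
# Near the top of infer_controlnet.py -- pick the ControlNet conditioning type.
SUPPORTED_MODES = {"pose", "canny", "depth", "hed", "mlsd"}

control_mode = "canny"  # change to any value in SUPPORTED_MODES
assert control_mode in SUPPORTED_MODES, f"Unsupported control_mode: {control_mode}"
```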
### Hardware Options

#### Option 1: Low VRAM (GGUF) - Recommended
Use this version if you have limited VRAM (e.g., 6-8 GB). It loads the model from a quantized GGUF file. To use it, set `use_gguf = True` in the desired inference script and provide the path to the `.gguf` file; a loading sketch follows the feature list below.
Key Features:
- Loads the unified transformer from a single 4-bit or 8-bit quantized file.
- Enables aggressive `group_offload` to fit large models on consumer GPUs.
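A rough sketch of the GGUF path, assuming the unified transformer follows diffusers' standard `from_single_file` + `GGUFQuantizationConfig` loading convention; the transformer class name and GGUF filename below are illustrative, not confirmed by this repo:

```python
import torch
from diffusers import GGUFQuantizationConfig
from diffusers_local import ZImageControlUnifiedPipeline          # shipped in this repo
from diffusers_local import ZImageControlUnifiedTransformer       # hypothetical class name

use_gguf = True  # flag exposed by the inference scripts

# Load the quantized unified transformer (Base + Control) from one GGUF file.
transformer = ZImageControlUnifiedTransformer.from_single_file(
    "transformer/model-Q4_K_M.gguf",  # path to your downloaded GGUF file (illustrative)
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
)

# Plug it into the unified pipeline.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    ".", transformer=transformer, torch_dtype=torch.bfloat16
)
```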
#### Option 2: High Precision (Diffusers/BF16)
Use this version if you have ample VRAM (e.g., 24 GB+). Set `use_gguf = False` in the script to load the model using the standard `from_pretrained` directory structure for full BFloat16 precision, as in the sketch below.
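In full precision the flow reduces to a standard diffusers-style load and call. A sketch (the prompt and step count are illustrative; the shipped scripts carry the recommended settings):

```python
import torch
from diffusers_local import ZImageControlUnifiedPipeline

use_gguf = False  # full-precision path

# Load the whole pipeline from the repository directory in BF16.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    ".", torch_dtype=torch.bfloat16
).to("cuda")

# Text-to-Image example; Turbo models target a low step count.
image = pipe(
    prompt="a lighthouse on a cliff at dusk",
    num_inference_steps=8,  # illustrative; see the scripts for recommended values
).images[0]
image.save("t2i_output.png")
```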
## Model Features & Configuration (V1)
- Unified Pipeline: A single pipeline handles Text-to-Image, Image-to-Image, and ControlNet tasks.
- Multiple Control Conditions: Supports Canny, HED, Depth, Pose, and MLSD.
- Control Strength (`controlnet_conditioning_scale`): Adjust this parameter for stronger or weaker control guidance.
- Group Offload Ready: The underlying code is updated so that diffusers' `group_offload` works correctly, enabling efficient memory management (see the sketch after this list).
- Turbo Model: Optimized for fast generation with a low number of inference steps. The scripts are pre-configured with recommended settings for each mode.
Note: This V1 model does not support inpainting or the `controlnet_refiner_conditioning_scale` parameter.
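As a sketch of what group offloading looks like, recent diffusers releases expose `apply_group_offloading`; the parameters below are illustrative and assume `pipe` was loaded as in the earlier examples:

```python
import torch
from diffusers.hooks import apply_group_offloading

# Keep only one group of transformer blocks on the GPU at a time,
# moving the rest to CPU memory between forward passes.
apply_group_offloading(
    pipe.transformer,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=1,  # smaller groups = less VRAM, more transfer overhead
)
```

Lower `num_blocks_per_group` values trade generation speed for a smaller VRAM footprint, which is what makes the GGUF option viable on 6-8 GB GPUs.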
## V1 Examples
Here are some examples generated with this unified V1 pipeline.
*(Image pairs: Pose, Canny, HED, Depth, and MLSD control inputs with their generated outputs; a Text-to-Image sample; and an Image-to-Image input/output pair.)*
## Repository Structure

- `./transformer/`: Directory for model weights (GGUF or standard).
- `infer_controlnet.py`: Script for all ControlNet inference modes.
- `infer_t2i.py`: Script for Text-to-Image inference.
- `infer_i2i.py`: Script for Image-to-Image inference.
- `diffusers_local/`: Custom pipeline and model code.
- `requirements.txt`: Python dependencies.