# Z-Image Turbo Control Unified V1
This repository hosts Z-Image Turbo Control Unified V1, a specialized architecture that unifies the powerful Z-Image Turbo base transformer with ControlNet capabilities into a single, cohesive model. The unified pipeline supports three generation modes: Text-to-Image, Image-to-Image, and ControlNet.
Unlike traditional pipelines where ControlNet is an external add-on, this model integrates control layers directly into the transformer structure. This enables Unified GGUF Quantization, allowing the entire merged architecture (Base + Control) to be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM.
## Installation
To set up the environment, create a virtual environment and install the dependencies:

```bash
# Create a virtual environment
python -m venv venv

# Activate it (Linux/macOS)
source venv/bin/activate
# Activate it (Windows)
# venv\Scripts\activate

# Upgrade pip
python -m pip install --upgrade pip

# Install requirements
pip install -r requirements.txt
```
Note: This repository contains a `diffusers_local` folder with the custom `ZImageControlUnifiedPipeline` and transformer logic required to run this specific architecture.
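Because the pipeline class lives in that folder rather than in the `diffusers` package itself, it has to be importable from wherever you run inference. A minimal sketch, assuming the repository root is your working directory (the exact import path is an assumption; the bundled `infer_*.py` scripts handle this for you):

```python
# Sketch only: make the repo root (which contains diffusers_local/) importable.
import sys
sys.path.append(".")  # assumes you launch from the repository root

# Custom pipeline class shipped in this repo's diffusers_local folder.
from diffusers_local import ZImageControlUnifiedPipeline
```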
## Usage

This repository provides separate, easy-to-use scripts for each generation task. The `infer_controlnet.py` script is pre-configured to handle all supported ControlNet modes.

### Scripts

- `infer_t2i.py`: Text-to-Image generation.
- `infer_i2i.py`: Image-to-Image generation.
- `infer_controlnet.py`: All ControlNet-guided generation tasks.
To run a ControlNet generation, open `infer_controlnet.py` and set the `control_mode` variable to one of the following: `"pose"`, `"canny"`, `"depth"`, `"hed"`, or `"mlsd"`.
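For instance (a sketch; only `control_mode` and its five values are documented, the validation guard is illustrative):

```python
# Near the top of infer_controlnet.py -- pick the ControlNet conditioning type.
SUPPORTED_MODES = {"pose", "canny", "depth", "hed", "mlsd"}

control_mode = "canny"  # change to any value in SUPPORTED_MODES
assert control_mode in SUPPORTED_MODES, f"Unsupported control_mode: {control_mode}"
```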
### Hardware Options

#### Option 1: Low VRAM (GGUF) - Recommended
Use this version if you have limited VRAM (e.g., 6-8 GB). It loads the model from a quantized GGUF file. To use it, set `use_gguf = True` in the desired inference script and provide the path to the `.gguf` file; a loading sketch follows the feature list below.
Key Features:
- Loads the unified transformer from a single 4-bit or 8-bit quantized file.
- Enables aggressive `group_offload` to fit large models on consumer GPUs.
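A rough sketch of the GGUF path, assuming the unified transformer follows diffusers' standard `from_single_file` + `GGUFQuantizationConfig` loading convention; the transformer class name and GGUF filename below are illustrative, not confirmed by this repo:

```python
import torch
from diffusers import GGUFQuantizationConfig
from diffusers_local import ZImageControlUnifiedPipeline          # shipped in this repo
from diffusers_local import ZImageControlUnifiedTransformer       # hypothetical class name

use_gguf = True  # flag exposed by the inference scripts

# Load the quantized unified transformer (Base + Control) from one GGUF file.
transformer = ZImageControlUnifiedTransformer.from_single_file(
    "transformer/model-Q4_K_M.gguf",  # path to your downloaded GGUF file (illustrative)
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
)

# Plug it into the unified pipeline.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    ".", transformer=transformer, torch_dtype=torch.bfloat16
)
```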
#### Option 2: High Precision (Diffusers/BF16)
Use this version if you have ample VRAM (e.g., 24 GB+). Set `use_gguf = False` in the script to load the model using the standard `from_pretrained` directory structure for full BFloat16 precision, as in the sketch below.
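In full precision the flow reduces to a standard diffusers-style load and call. A sketch (the prompt and step count are illustrative; the shipped scripts carry the recommended settings):

```python
import torch
from diffusers_local import ZImageControlUnifiedPipeline

use_gguf = False  # full-precision path

# Load the whole pipeline from the repository directory in BF16.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    ".", torch_dtype=torch.bfloat16
).to("cuda")

# Text-to-Image example; Turbo models target a low step count.
image = pipe(
    prompt="a lighthouse on a cliff at dusk",
    num_inference_steps=8,  # illustrative; see the scripts for recommended values
).images[0]
image.save("t2i_output.png")
```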
## Model Features & Configuration (V1)
- Unified Pipeline: A single pipeline handles Text-to-Image, Image-to-Image, and ControlNet tasks.
- Multiple Control Conditions: Supports Canny, HED, Depth, Pose, and MLSD.
- Control Strength (`controlnet_conditioning_scale`): Adjust this parameter for stronger or weaker control guidance.
- Group Offload Ready: The underlying code is updated so that diffusers' `group_offload` works correctly, enabling efficient memory management (see the sketch after this list).
- Turbo Model: Optimized for fast generation with a low number of inference steps. The scripts are pre-configured with recommended settings for each mode.
Note: This V1 model does not support inpainting or the `controlnet_refiner_conditioning_scale` parameter.
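As a sketch of what group offloading looks like, recent diffusers releases expose `apply_group_offloading`; the parameters below are illustrative and assume `pipe` was loaded as in the earlier examples:

```python
import torch
from diffusers.hooks import apply_group_offloading

# Keep only one group of transformer blocks on the GPU at a time,
# moving the rest to CPU memory between forward passes.
apply_group_offloading(
    pipe.transformer,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=1,  # smaller groups = less VRAM, more transfer overhead
)
```

Lower `num_blocks_per_group` values trade generation speed for a smaller VRAM footprint, which is what makes the GGUF option viable on 6-8 GB GPUs.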
## V1 Examples
Here are some examples generated with this unified V1 pipeline.
*(Image pairs: Pose, Canny, HED, Depth, and MLSD control inputs with their generated outputs; a Text-to-Image sample; and an Image-to-Image input/output pair.)*
## Repository Structure

- `./transformer/`: Directory for model weights (GGUF or standard).
- `infer_controlnet.py`: Script for all ControlNet inference modes.
- `infer_t2i.py`: Script for Text-to-Image inference.
- `infer_i2i.py`: Script for Image-to-Image inference.
- `diffusers_local/`: Custom pipeline and model code.
- `requirements.txt`: Python dependencies.