bubbliiiing committed
Commit ad20511 · 1 Parent(s): fbc087f

Update README

Files changed (2):
  1. .gitattributes +2 -0
  2. README.md +39 -9
.gitattributes CHANGED
@@ -1,3 +1,5 @@
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
  *.7z filter=lfs diff=lfs merge=lfs -text
  *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -3,22 +3,35 @@ license: apache-2.0
  library_name: videox_fun
  ---

- # Z-Image-Turbo-Fun-Controlnet-Union
+ # Z-Image-Turbo-Fun-Controlnet-Union-2.0

- [![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)
+ [![Github](https://img.shields.io/badge/🎬%20Code-VideoX_Fun-blue)](https://github.com/aigc-apps/VideoX-Fun)

  ## Model Features
- - This ControlNet is added on 6 blocks.
- - The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
+ - This ControlNet is added on 15 layer blocks and 2 refiner layer blocks.
+ - The model was trained from scratch for 70,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
  - It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet.
- - You can adjust control_context_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for control_context_scale is from 0.65 to 0.80.
+ - We found that the number of inference steps noticeably affects the realism and clarity of results at each control strength. For strength and step comparisons, see [Scale Test Results](#scale-test-results).
+ - You can adjust control_context_scale for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for control_context_scale is from 0.65 to 0.90.
+ - **Note on steps: as you increase the control strength (higher control_context_scale values), increase the number of inference steps as well to maintain generation quality. This is likely because the control model has not been distilled.**
+ - Inpainting mode is also supported.

  ## TODO
- - [ ] Train on more data and for more steps.
- - [ ] Support inpaint mode.
+ - [ ] Train on better data.

  ## Results

+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+ <tr>
+ <td>Pose + Inpaint</td>
+ <td>Output</td>
+ </tr>
+ <tr>
+ <td><img src="asset/inpaint.jpg" width="100%" /><img src="asset/mask.jpg" width="100%" /></td>
+ <td><img src="results/pose_inpaint.png" width="100%" /></td>
+ </tr>
+ </table>
+
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
  <td>Pose</td>
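The step-versus-strength note added above suggests a simple rule of thumb: stronger control, more steps. Below is a minimal sketch of one such heuristic in Python, assuming only what the README states (a 9 to 40 step range tested over scales 0.65 to 1.0); the exact breakpoints are illustrative, not values prescribed by VideoX-Fun.

```python
def recommended_steps(control_context_scale: float) -> int:
    """Illustrative mapping from control strength to inference steps,
    per the README's note that stronger control wants more steps.
    Breakpoints are assumptions, loosely based on the Scale Test grid."""
    if not 0.65 <= control_context_scale <= 1.0:
        raise ValueError("tested range is 0.65 to 1.0")
    if control_context_scale <= 0.75:
        return 9   # mild control: a few-step Turbo schedule is enough
    if control_context_scale <= 0.85:
        return 20  # medium control: add steps to keep detail
    return 40      # strongest control: give the sampler the most steps


for scale in (0.65, 0.80, 0.90, 1.00):
    print(f"control_context_scale={scale:.2f} -> steps={recommended_steps(scale)}")
```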
 
@@ -98,7 +111,24 @@ Then download the weights into models/Diffusion_Transformer and models/Personalized_Model
  ├── 📂 Diffusion_Transformer/
  │ └── 📂 Z-Image-Turbo/
  ├── 📂 Personalized_Model/
- │ └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
+ │ └── 📦 Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors
  ```

- Then run the file `examples/z_image_fun/predict_t2i_control.py`.
+ Then run `examples/z_image_fun/predict_t2i_control_2.0.py` for text-to-image control, or `examples/z_image_fun/predict_i2i_inpaint_2.0.py` for inpainting.
+
+ ## Scale Test Results
+
+ The table below shows generation results under different combinations of diffusion steps and control scale strength:
+
+ | Diffusion Steps | Scale 0.65 | Scale 0.70 | Scale 0.75 | Scale 0.8 | Scale 0.9 | Scale 1.0 |
+ |:---------------:|:----------:|:----------:|:----------:|:---------:|:---------:|:---------:|
+ | **9** | ![](results/scale_test/9_scale_0.65.png) | ![](results/scale_test/9_scale_0.70.png) | ![](results/scale_test/9_scale_0.75.png) | ![](results/scale_test/9_scale_0.8.png) | ![](results/scale_test/9_scale_0.9.png) | ![](results/scale_test/9_scale_1.0.png) |
+ | **10** | ![](results/scale_test/10_scale_0.65.png) | ![](results/scale_test/10_scale_0.70.png) | ![](results/scale_test/10_scale_0.75.png) | ![](results/scale_test/10_scale_0.8.png) | ![](results/scale_test/10_scale_0.9.png) | ![](results/scale_test/10_scale_1.0.png) |
+ | **20** | ![](results/scale_test/20_scale_0.65.png) | ![](results/scale_test/20_scale_0.70.png) | ![](results/scale_test/20_scale_0.75.png) | ![](results/scale_test/20_scale_0.8.png) | ![](results/scale_test/20_scale_0.9.png) | ![](results/scale_test/20_scale_1.0.png) |
+ | **30** | ![](results/scale_test/30_scale_0.65.png) | ![](results/scale_test/30_scale_0.70.png) | ![](results/scale_test/30_scale_0.75.png) | ![](results/scale_test/30_scale_0.8.png) | ![](results/scale_test/30_scale_0.9.png) | ![](results/scale_test/30_scale_1.0.png) |
+ | **40** | ![](results/scale_test/40_scale_0.65.png) | ![](results/scale_test/40_scale_0.70.png) | ![](results/scale_test/40_scale_0.75.png) | ![](results/scale_test/40_scale_0.8.png) | ![](results/scale_test/40_scale_0.9.png) | ![](results/scale_test/40_scale_1.0.png) |
+
+ Parameter description:
+
+ - Diffusion Steps: number of sampling steps for the diffusion model (9, 10, 20, 30, 40)
+ - Control Scale: control strength coefficient, i.e. control_context_scale (0.65 to 1.0)
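The result files in the grid follow the naming scheme `results/scale_test/{steps}_scale_{scale}.png`. As a usage note, a sweep like the one tabulated above could be scripted as below; `generate` is a hypothetical stand-in for the image-generation entry point of `predict_t2i_control_2.0.py`, not the repo's confirmed API.

```python
from pathlib import Path

STEPS = (9, 10, 20, 30, 40)
SCALES = ("0.65", "0.70", "0.75", "0.8", "0.9", "1.0")  # spelled as in the filenames


def sweep(generate):
    """Run the steps x scale grid used by the Scale Test table.

    `generate(steps=..., control_context_scale=...)` is a placeholder
    callable expected to return an image object with a .save(path)
    method (for example, a PIL.Image).
    """
    out_dir = Path("results/scale_test")
    out_dir.mkdir(parents=True, exist_ok=True)
    for steps in STEPS:
        for scale in SCALES:
            image = generate(steps=steps, control_context_scale=float(scale))
            image.save(out_dir / f"{steps}_scale_{scale}.png")
```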