codemichaeld committed (verified)
Commit: dbabf22 · Parent: 6e18975

Upload README.md with huggingface_hub

Files changed (1): README.md (+111 −0)
---
library_name: diffusers
tags:
- fp8
- safetensors
- precision-recovery
- mixed-method
- converted-by-gradio
---
# FP8 Model with Per-Tensor Precision Recovery

- **Source**: `https://huggingface.co/MochunniaN1/One-to-All-1.3b_2`
- **Original File(s)**: `2 sharded files`
- **Original Format**: `safetensors`
- **FP8 Format**: `E5M2`
- **FP8 File**: `model-00001-of-00002-fp8-e5m2.safetensors`
- **Recovery File**: `model-00001-of-00002-recovery.safetensors`
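E5M2 uses 1 sign, 5 exponent, and 2 mantissa bits, so it shares IEEE fp16's exponent layout: up to rounding mode, an E5M2 value is the high byte of the fp16 bit pattern. A minimal pure-Python sketch of the format (truncation rather than round-to-nearest, for illustration only, not the converter's actual code):

```python
import struct

def to_e5m2(x: float) -> int:
    """Encode x as an FP8 E5M2 byte by truncating an fp16 encoding.

    E5M2 has 1 sign, 5 exponent, and 2 mantissa bits -- the same
    exponent layout as IEEE fp16, so the high byte of the fp16 bit
    pattern is an E5M2 encoding (with truncation rounding).
    """
    return struct.pack("<e", x)[1]  # high byte of little-endian fp16

def from_e5m2(byte: int) -> float:
    """Decode an E5M2 byte by zero-padding it back to fp16."""
    return struct.unpack("<e", bytes([0, byte]))[0]
```

Values whose mantissa fits in 2 bits round-trip exactly; everything else loses low mantissa bits, and that lost precision is what the recovery file (diff tensors and LoRA factors) puts back.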
## Recovery Rules Used

```json
[
  {
    "key_pattern": "vae",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "encoder",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "decoder",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "text",
    "dim": 2,
    "min_size": 10000,
    "method": "lora",
    "rank": 64
  },
  {
    "key_pattern": "emb",
    "dim": 2,
    "min_size": 10000,
    "method": "lora",
    "rank": 64
  },
  {
    "key_pattern": "attn",
    "dim": 2,
    "min_size": 10000,
    "method": "lora",
    "rank": 128
  },
  {
    "key_pattern": "conv",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "resnet",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "all",
    "method": "none"
  }
]
```
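The converter's exact matching semantics are not documented here. One plausible reading of the rule list above is first-match-wins on a substring of the tensor key, with `dim` and `min_size` as extra gates and `"all"` as the catch-all. A hypothetical sketch under that assumption:

```python
# Same rules as in the README, abbreviated to three for the sketch.
RULES = [
    {"key_pattern": "vae", "dim": 4, "method": "diff"},
    {"key_pattern": "attn", "dim": 2, "min_size": 10000,
     "method": "lora", "rank": 128},
    {"key_pattern": "all", "method": "none"},
]

def pick_rule(key: str, ndim: int, numel: int) -> dict:
    """Return the first rule whose pattern and gates match the tensor."""
    for rule in RULES:
        if rule["key_pattern"] != "all" and rule["key_pattern"] not in key:
            continue  # substring match on the tensor key (assumed)
        if "dim" in rule and ndim != rule["dim"]:
            continue  # rule only applies to tensors of this rank
        if "min_size" in rule and numel < rule["min_size"]:
            continue  # skip tensors too small to be worth a LoRA
        return rule
    return {"key_pattern": "all", "method": "none"}
```

Under this reading, a 4-D `vae.decoder.conv_in.weight` gets a full `diff` tensor, a large 2-D `attn.to_q.weight` gets a rank-128 LoRA, and a small 1-D `attn.to_q.bias` falls through to `"none"`.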
## Usage (Inference)

```python
import os

import torch
from safetensors.torch import load_file

# Load the FP8 model weights
fp8_state = load_file("model-00001-of-00002-fp8-e5m2.safetensors")

# Load recovery weights if the recovery file is present
recovery_path = "model-00001-of-00002-recovery.safetensors"
recovery_state = load_file(recovery_path) if os.path.exists(recovery_path) else {}

# Reconstruct high-precision weights
reconstructed = {}
for key in fp8_state:
    fp8_weight = fp8_state[key].to(torch.float32)  # upcast for computation

    # Apply LoRA recovery if available
    lora_a_key = f"lora_A.{key}"
    lora_b_key = f"lora_B.{key}"
    if lora_a_key in recovery_state and lora_b_key in recovery_state:
        A = recovery_state[lora_a_key].to(torch.float32)
        B = recovery_state[lora_b_key].to(torch.float32)
        # Add back the low-rank approximation of the residual
        fp8_weight = fp8_weight + B @ A

    # Apply difference recovery if available
    diff_key = f"diff.{key}"
    if diff_key in recovery_state:
        diff = recovery_state[diff_key].to(torch.float32)
        fp8_weight = fp8_weight + diff

    reconstructed[key] = fp8_weight

# Load the reconstructed weights into your model instance
model.load_state_dict(reconstructed)
```
> **Note**: For best results, use the same recovery configuration during inference as was used during extraction.
> Requires PyTorch ≥ 2.1 for FP8 support.
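How the `lora_A`/`lora_B` tensors might be produced is not shown above. A common choice (assumed here, not confirmed by the converter) is a truncated SVD of the quantization residual `W_fp32 - W_fp8`, sketched with NumPy:

```python
import numpy as np

def lora_factors(residual: np.ndarray, rank: int):
    """Factor a 2-D residual into B @ A with B (m, r) and A (r, n).

    Truncated SVD gives the best rank-r approximation of the
    residual in the Frobenius norm; only (m + n) * r values need
    to be stored instead of m * n.
    """
    U, S, Vt = np.linalg.svd(residual, full_matrices=False)
    B = U[:, :rank] * S[:rank]  # absorb singular values into B
    A = Vt[:rank, :]
    return A, B
```

At load time, `B @ A` is added back onto the upcast FP8 weight, matching the reconstruction loop in the usage example.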
## Statistics

- **Total layers**: 1329
- **Layers with recovery**: 380
  - LoRA recovery: 372
  - Difference recovery: 8