TongZheng1999/iter_1_reinforce_baseline_per_sample_200epoch_strong_init_step_150_ Updated about 8 hours ago
TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1200 4B • Updated 24 days ago • 24
TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1000 4B • Updated 24 days ago • 26
TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-No-Filter-step300 4B • Updated 25 days ago • 17
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-5 Text Generation • 196k • Updated Nov 20, 2025 • 5
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-4 Text Generation • 196k • Updated Nov 20, 2025 • 2
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-3 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-2 Text Generation • 196k • Updated Nov 20, 2025 • 3
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-1 Text Generation • 196k • Updated Nov 20, 2025 • 2
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-5 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-4 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-3 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-2 Text Generation • 196k • Updated Nov 20, 2025
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-1 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_10-2-3Rounds-iter-3 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_10-2-3Rounds-iter-2 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_10-2-3Rounds-iter-1 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_1-2-3Rounds-iter-2 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_1-2-3Rounds-iter-1 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-start-star-mixed_direct-OP-final_v2_1-2-3Rounds-iter-1 Text Generation • 196k • Updated Nov 19, 2025 • 1