Qwen3-8B-Drama-Thinking
This model is a full parameter fine-tuned version of Qwen/Qwen3-8B on a custom drama thinking dataset with explicit creative reasoning chains.
Model Description
- Base Model: Qwen3-8B (8 billion parameters)
- Training Method: Full Parameter Fine-tuning (NOT LoRA)
- Training Framework: ms-swift
- Training Data: Custom Drama Thinking Dataset (6,319 samples, avg ~5,000 tokens)
- Specialization: Screenwriting with explicit `<think>...</think>` creative reasoning
- Hardware: 2x NVIDIA H100 80GB SXM5
- Training Time: 2 hours 46 minutes (3 epochs)
- Training Cost: ~$17.86
Key Features
Professional Screenwriting Assistant
This model generates dramatic scripts with explicit creative deliberation:
- Thinking Process Visible: Uses `<think>...</think>` tags to show internal reasoning
- Deep Character Psychology: Analyzes motivations, defense mechanisms, subtext
- Structural Planning: Three-act structure, emotional arcs, pacing decisions
- Visual Storytelling: Symbolism, atmosphere, cinematographic choices
- Professional Format: Correct screenplay formatting (scene headers, action lines, dialogue)
Performance Comparison
Compared to base Qwen3-8B:
| Metric | Base Model | Fine-Tuned | Improvement |
|---|---|---|---|
| Output Length | 1,071 tokens | 3,874 tokens | +262% |
| Thinking Depth | 5/10 | 9/10 | +80% |
| Creative Reasoning | 500 tokens | 3,400 tokens | +580% |
| Craft Analysis | Generic | Professional | Qualitative leap |
Unique Value Proposition
This is not just a text generator - it's a creative thinking partner that externalizes the entire screenwriting process: from title analysis to character psychology to structural planning to final execution.
Training Details
Training Configuration
```
Model:           Qwen/Qwen3-8B
Template:        qwen3_thinking
Training Type:   Full Parameter (all 8B parameters)
Max Length:      8192 tokens (for long thinking chains)
Batch Size:      1 per device × 2 GPUs
Gradient Accum:  8 steps (effective batch size: 16)
Learning Rate:   1e-5
Epochs:          3
Optimization:    DeepSpeed Zero3 + Gradient Checkpointing,
                 Liger Kernel, BF16 mixed precision
Loss Scale:      ignore_empty_think
GPU Memory:      ~74.62 GB per H100 (stable)
```
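As a sanity check, the step count and wall-clock time reported below follow from these hyperparameters; a quick back-of-envelope sketch using only figures from this card:

```python
# Consistency check: 6,319 samples x 3 epochs with an effective batch of 16
# should give roughly the reported 1,185 optimizer steps at ~8 s/iteration.
samples, epochs = 6_319, 3
per_device_bs, num_gpus, grad_accum = 1, 2, 8

effective_batch = per_device_bs * num_gpus * grad_accum   # 16, as listed above
optimizer_steps = samples * epochs / effective_batch      # ~1,184.8 -> 1,185 steps
wall_clock_hours = optimizer_steps * 8 / 3600             # ~2.6 h, close to the reported 2h 46m

print(effective_batch, round(optimizer_steps), round(wall_clock_hours, 2))
```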
Dataset Characteristics
- Samples: 6,319 dramatic script continuations
- Average Length: ~5,000 tokens per sample
- Max Length: ~6,100 tokens
- Format: Conversations with `<think>...</think>` reasoning tags (an illustrative sample sketch follows this list)
- Content:
- Script opening scenes (title, description, initial dialogue)
- Extensive creative deliberation (3,000+ tokens of thinking)
- Script continuation with proper formatting
- Style: Dramatic, emotionally intense scenarios (conflicts, reconciliation, tragedy)
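The dataset itself is not published. Purely as an illustration of the format described above, a single sample in a conversations/messages layout might look roughly like this (field names and script lines are hypothetical):

```python
# Hypothetical illustration of one training sample: user turn with the script
# opening, assistant turn with a long <think> deliberation followed by the
# formatted continuation. The real dataset schema is not published.
sample = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Title: The Reunion\n"
                "Description: Two estranged siblings meet at their childhood home after 20 years.\n"
                "INT. FAMILY LIVING ROOM - DAY\n"
                "SARAH (35) stands by the window, looking out at the garden.\n"
                "MICHAEL (38) enters, hesitant."
            ),
        },
        {
            "role": "assistant",
            "content": (
                "<think>\n"
                "The title promises a confrontation postponed for twenty years...\n"
                "[typically 3,000+ tokens of creative deliberation]\n"
                "</think>\n"
                "MICHAEL\n"
                "I almost didn't come.\n"
            ),
        },
    ]
}
```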
Training Metrics
- Final Loss: 0.844
- Average Loss: 0.978
- Loss Trajectory: 1.602 (start) → 0.82-0.83 (end)
- Training Speed: ~8 seconds/iteration
- Total Steps: 1,185
- Checkpoints: 5 saved (400, 800, 900, 1000, 1185)
Usage
Quick Start (ms-swift)
```bash
# Install ms-swift
pip install ms-swift

# Inference (interactive mode)
swift infer \
    --ckpt_dir FutureMa/Qwen3-8B-Drama-Thinking \
    --template qwen3_thinking \
    --max_new_tokens 4096 \
    --temperature 0.7
```
Python API
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import PtEngine, InferRequest, RequestConfig

# Initialize engine
engine = PtEngine(
    model_id_or_path="FutureMa/Qwen3-8B-Drama-Thinking",
    max_batch_size=1,
    torch_dtype="bfloat16"
)

# Create prompt
prompt = """Title: The Reunion
Description: Two estranged siblings meet at their childhood home after 20 years.
INT. FAMILY LIVING ROOM - DAY
SARAH (35) stands by the window, looking out at the garden.
MICHAEL (38) enters, hesitant."""

# Generate (streaming)
messages = [{'role': 'user', 'content': prompt}]
request = InferRequest(messages=messages)
config = RequestConfig(max_tokens=4096, temperature=0.7, stream=True)

for response in engine.infer([request], config)[0]:
    if response:
        print(response.choices[0].delta.content, end='', flush=True)
```
Using with Transformers (requires adapting checkpoint)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "FutureMa/Qwen3-8B-Drama-Thinking",
    torch_dtype="bfloat16",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are a creative screenwriter assistant specialized in dramatic storytelling with internal reasoning."},
    {"role": "user", "content": "Write a scene about forgiveness..."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
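Because the deliberation is wrapped in `<think>...</think>` tags, the reasoning can be separated from the finished script after generation. A minimal sketch (the `split_thinking` helper is ours, not part of any library; depending on tokenizer settings you may need `skip_special_tokens=False` to keep the tags in the decoded text):

```python
import re

def split_thinking(generated: str) -> tuple[str, str]:
    """Split generated text into (thinking, script) around the <think> tags."""
    match = re.search(r"<think>(.*?)</think>", generated, flags=re.DOTALL)
    if match is None:
        # No thinking block found: treat everything as script.
        return "", generated.strip()
    thinking = match.group(1).strip()
    script = generated[match.end():].strip()
    return thinking, script

# Example, continuing the Transformers snippet above:
# thinking, script = split_thinking(tokenizer.decode(outputs[0], skip_special_tokens=False))
```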
Example Output
Input Prompt:
```
Title: The Last Apology
Description: A daughter arrives at her estranged father's deathbed...
```
Model Output (excerpt):
```
<think>
Okay. Let's go. I'm sitting here with the opening of *The Last Apology* fresh in my mind,
and I need to figure out exactly where this goes next - not just the next scene, but the next
emotional beat, the next structural pivot...

First, the title: *The Last Apology*. That's not just poetic - it's structural. It tells me
this story is about delayed recognition, about the finality of words left unsaid...

[3,400 tokens of deep creative analysis including:]
- Title deconstruction and thematic implications
- Character psychology analysis
- Three-act structure planning
- Visual language and symbolism
- Multiple narrative paths considered
- Professional screenwriting techniques
</think>

INT. HOSPITAL ROOM - NIGHT

ANNA (28), in a wrinkled business suit, hesitates at the doorway.
DAVID (65) lies in bed, breathing labored...

[Script continues with proper formatting]
```
Intended Use
Recommended Use Cases
- Screenwriting Education: Learn professional creative thinking process
- Script Ideation: Generate story frameworks and narrative alternatives
- Story Consulting: Explore "what if" scenarios with explicit reasoning
- Creative Brainstorming: Understand decision-making in storytelling
- Draft Development: Plan structure before execution
Not Recommended For
- Final Shooting Scripts: Requires human refinement for production
- Comedy/Action Genres: Training bias toward dramatic content
- Long-form Series: Single-pass generation may lack consistency
- Immediate Production: Dialogue needs naturalization
Evaluation Results
Quantitative Metrics (vs. Base Model)
| Aspect | Score | Base Model | Improvement |
|---|---|---|---|
| Thinking Depth | 9/10 | 5/10 | +80% |
| Script Format | 9/10 | 8/10 | +13% |
| Dramatic Craft | 8.5/10 | 8/10 | +6% |
| Character Psychology | 9/10 | 6/10 | +50% |
| Decision Transparency | 9/10 | 5/10 | +80% |
| Overall | 8.1/10 | 6.9/10 | +17% |
Qualitative Improvements
- Professional Voice: Sounds like an experienced screenwriter
- Structural Thinking: Explicit three-act planning
- Meta-Awareness: "This isn't just a script. It's a reckoning."
- Non-Linear Reasoning: Considers alternatives, backtracks, refines
- Craft-Oriented: Explains why choices serve the story
Limitations
Thinking Verbosity: Generates ~3,400 tokens of thinking (87% of output)
- May be excessive for quick tasks
- Consider using `max_new_tokens` to limit length
Incomplete Execution: Token budget consumed by thinking
- Many planned scenes not fully generated
- May need a 6,000-8,000 token budget for complete scripts (see the generation sketch after this list)
Dialogue Naturalness: More direct/literary than conversational
- Training data style influences output
- May need post-processing for natural speech
Training Data Bias: Skews toward melodramatic scenarios
- Less suited for subtle/realistic dialogue
- Best for emotionally intense stories
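If the goal is a complete script rather than a capped response, the generation budget has to cover both the deliberation and the screenplay. A minimal sketch, continuing the Transformers example from the Usage section (it reuses `model`, `tokenizer`, and `inputs` from there; the 8,000-token figure is the rough recommendation above, not a tuned value):

```python
# Continuation of the Transformers example above: give the model enough room
# to finish the screenplay after ~3,000-3,500 tokens of <think> deliberation.
outputs = model.generate(
    **inputs,
    max_new_tokens=8000,   # 6,000-8,000 new tokens per the limitation above
    temperature=0.7,
    do_sample=True,        # temperature only applies when sampling is enabled
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```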
Training Insights
What Made This Successful
8192 Token Context: Essential for capturing full thinking chains
- Initial assumption of 2048 would have truncated data
- Average sample length: ~5,000 tokens
DeepSpeed Zero3: Required (not optional)
- Single H100: Would need ~109-114 GB (OOM)
- Zero3 sharding: ~74.62 GB per card (fits; a memory sketch follows at the end of this section)
Full Parameter Training: Worth the cost
- Deeper capability transfer than LoRA
- Better thinking process internalization
- Cost: $17.86 (2.8 hours) vs ~$5 for LoRA
Quality Training Data: 6,319 long-form reasoning examples
- Actual creative process in `<think>` tags
- High-quality dramatic writing
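Why ZeRO-3 is required can be seen from a rough memory estimate. The sketch below assumes BF16 weights and gradients plus FP32 Adam states, and folds activations into a single overhead term (back-of-envelope only, not an exact accounting):

```python
# Rough full-parameter fine-tuning memory estimate for an 8B model.
# Assumptions: BF16 weights and gradients, FP32 Adam states (master copy, m, v);
# activations and framework buffers are treated as extra overhead on top.
params = 8e9

weights_gb = params * 2 / 1e9     # BF16 weights            ~16 GB
grads_gb   = params * 2 / 1e9     # BF16 gradients          ~16 GB
adam_gb    = params * 12 / 1e9    # FP32 master + m + v     ~96 GB
model_states_gb = weights_gb + grads_gb + adam_gb   # ~128 GB, far beyond one 80 GB H100

zero3_per_gpu_gb = model_states_gb / 2  # ZeRO-3 shards all states across 2 GPUs -> ~64 GB
# Adding activations and gradient-checkpointing buffers lands in the same
# ballpark as the observed ~74.6 GB per card.
print(round(model_states_gb), round(zero3_per_gpu_gb))
```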
Citation
```bibtex
@misc{qwen3-drama-thinking-2025,
  author       = {FutureMa},
  title        = {Qwen3-8B-Drama-Thinking: Full Parameter Fine-tuning for Creative Screenwriting},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FutureMa/Qwen3-8B-Drama-Thinking}},
  note         = {Full parameter fine-tuning on 6,319 drama samples with explicit reasoning chains}
}
```
Acknowledgments
- Base Model: Qwen Team - Qwen3-8B
- Training Framework: ms-swift - ModelScope SWIFT
- Infrastructure: Lambda Cloud - 2x H100 80GB SXM5
- Dataset: Custom Drama Thinking Dataset (6,319 samples)
Model Card Contact
For questions or feedback:
- HuggingFace: @FutureMa
- GitHub Issues: Report via ms-swift repository
Training Date: 2025-12-08 | Training Duration: 2h 46m | Model Size: ~16 GB (BF16 precision) | Recommended VRAM: 16 GB+ for inference