Collections

Collections including paper arxiv:2511.21691

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

NextGen Image Editing

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices

Paper • 2601.08303 • Published Jan 13, 2026 • 18
Serverless ImgGen Hub

Space • Running • 20
Highly hackable hub w/ Flux, SD 3.5, LoRAs, no GPUs required

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 38
Black-Box On-Policy Distillation of Large Language Models

Paper • 2511.10643 • Published Nov 13, 2025 • 52
Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published Nov 13, 2025 • 99
VGGT: Visual Geometry Grounded Transformer

Paper • 2503.11651 • Published Mar 14, 2025 • 35

Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Paper • 2508.09131 • Published Aug 12, 2025 • 16
Detect Anything via Next Point Prediction

Paper • 2510.12798 • Published Oct 14, 2025 • 50
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes

Paper • 2510.26800 • Published Oct 30, 2025 • 22
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

Image Gen Performance Enhancement

Classifier-free Guidance with Adaptive Scaling

Paper • 2502.10574 • Published Feb 14, 2025
Composing Concepts from Images and Videos via Concept-prompt Binding

Paper • 2512.09824 • Published Dec 10, 2025 • 28
OmniPSD: Layered PSD Generation with Diffusion Transformer

Paper • 2512.09247 • Published Dec 10, 2025 • 48
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

Paper • 2511.21678 • Published Nov 26, 2025 • 12
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

Paper • 2511.20937 • Published Nov 26, 2025 • 16
Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Paper • 2512.10949 • Published Dec 11, 2025 • 46

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 231
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Paper • 2511.15065 • Published Nov 19, 2025 • 77
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 129
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 19
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

Diffusion Model Control

Control Methods for Diffusion and Score Models

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Paper • 2412.09622 • Published Dec 12, 2024 • 8
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

Paper • 2412.04146 • Published Dec 5, 2024 • 23
Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 36
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Paper • 2412.05148 • Published Dec 6, 2024 • 12

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

