Papers - a dynamicjerry Collection

dynamicjerry 's Collections

Papers

updated May 16, 2025

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 49
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Paper • 2312.13314 • Published Dec 20, 2023 • 8
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 260
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Paper • 2312.09608 • Published Dec 15, 2023 • 16
VecFusion: Vector Font Generation with Diffusion

Paper • 2312.10540 • Published Dec 16, 2023 • 22
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Paper • 2312.11392 • Published Dec 18, 2023 • 20
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Paper • 2312.14091 • Published Dec 21, 2023 • 17
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Paper • 2410.02416 • Published Oct 3, 2024 • 34
FashionComposer: Compositional Fashion Image Generation

Paper • 2412.14168 • Published Dec 18, 2024 • 17
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 43
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Paper • 2412.09626 • Published Dec 12, 2024 • 21
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs

Paper • 2412.08347 • Published Dec 11, 2024 • 4
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 75
Style Customization of Text-to-Vector Generation with Image Diffusion Priors

Paper • 2505.10558 • Published May 15, 2025 • 16
LightLab: Controlling Light Sources in Images with Diffusion Models

Paper • 2505.09608 • Published May 14, 2025 • 37
Chain-of-Thought Tokens are Computer Program Variables

Paper • 2505.04955 • Published May 8, 2025 • 8
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Paper • 2505.04512 • Published May 7, 2025 • 36
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5, 2025 • 85
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Paper • 2505.02370 • Published May 5, 2025 • 14