dynamicjerry's Collections Papers
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published
• 49
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Paper
• 2312.13314
• Published
• 8
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper
• 2312.11514
• Published
• 260
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper
• 2312.09911
• Published
• 55
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models
Paper
• 2312.09608
• Published
• 16
VecFusion: Vector Font Generation with Diffusion
Paper
• 2312.10540
• Published
• 22
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Paper
• 2312.11392
• Published
• 20
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Paper
• 2312.14091
• Published
• 17
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Paper
• 2410.02416
• Published
• 34
FashionComposer: Compositional Fashion Image Generation
Paper
• 2412.14168
• Published
• 17
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper
• 2412.11768
• Published
• 43
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
• 2412.10360
• Published
• 147
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Paper
• 2412.09626
• Published
• 21
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
Paper
• 2412.08347
• Published
• 4
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper
• 2412.13303
• Published
• 75
Style Customization of Text-to-Vector Generation with Image Diffusion Priors
Paper
• 2505.10558
• Published
• 16
LightLab: Controlling Light Sources in Images with Diffusion Models
Paper
• 2505.09608
• Published
• 37
Chain-of-Thought Tokens are Computer Program Variables
Paper
• 2505.04955
• Published
• 8
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Paper
• 2505.04512
• Published
• 36
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Paper
• 2505.02707
• Published
• 85
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Paper
• 2505.02370
• Published
• 14