Vision and language
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance (arXiv:2404.04125)
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching (arXiv:2404.03653)
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models (arXiv:2404.02747)
3D Congealing: 3D-Aware Image Alignment in the Wild (arXiv:2404.02125)
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion (arXiv:2404.04544)
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback (arXiv:2404.07987)
BRAVE: Broadening the visual encoding of vision-language models (arXiv:2404.07204)
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion (arXiv:2404.07199)
Learning to Route Among Specialized Experts for Zero-Shot Generalization (arXiv:2402.05859)
Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset (arXiv:2403.00587)
ReGround: Improving Textual and Spatial Grounding at No Cost (arXiv:2403.13589)
FlexCap: Generating Rich, Localized, and Flexible Captions in Images (arXiv:2403.12026)
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (arXiv:2403.03206)
Editable Image Elements for Controllable Synthesis (arXiv:2404.16029)
Move Anything with Layered Scene Diffusion (arXiv:2404.07178)
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling (arXiv:2405.21048)