EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture • Paper • 2512.04810 • Published 28 days ago • 25
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models • Paper • 2504.15279 • Published Apr 21, 2025 • 78
Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval • Paper • 2403.01431 • Published Mar 3, 2024