LatentThinkingPKU

community

Recent Activity

DogNeverSleep authored a paper 2 days ago

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

DogNeverSleep authored a paper 4 days ago

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

DogNeverSleep authored a paper 4 days ago

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

authored a paper 2 days ago

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Paper • 2605.22012 • Published 4 days ago • 38

authored 2 papers 4 days ago

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Paper • 2605.18984 • Published 7 days ago • 22

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Paper • 2605.20183 • Published 6 days ago • 14

submitted a paper to Daily Papers 5 days ago

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Paper • 2605.18984 • Published 7 days ago • 22

authored a paper 10 days ago

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

Paper • 2605.13062 • Published 12 days ago • 33

submitted a paper to Daily Papers 11 days ago

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

Paper • 2605.13062 • Published 12 days ago • 33

authored a paper 11 days ago

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Paper • 2605.10780 • Published 13 days ago • 33

submitted a paper to Daily Papers 11 days ago

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Paper • 2605.10780 • Published 13 days ago • 33

authored a paper about 2 months ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 235

authored a paper about 2 months ago

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Paper • 2604.03016 • Published Apr 3 • 37

authored a paper about 2 months ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

authored a paper about 2 months ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203

authored a paper about 2 months ago

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published Mar 27 • 18

submitted a paper to Daily Papers about 2 months ago

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published Mar 27 • 18

authored a paper 2 months ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published Mar 18 • 12

submitted a paper to Daily Papers 2 months ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published Mar 18 • 12

authored a paper 2 months ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

submitted a paper to Daily Papers 2 months ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

authored a paper 3 months ago

BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Paper • 2602.12876 • Published Feb 13 • 14