Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini Paper • 2605.27295 • Published 2 days ago • 5
Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation Paper • 2604.27263 • Published 14 days ago • 10
Post 104 One prompt, three answers - which model is from where? johko/llm-blind-date I built a little demo where you give three models (Apertus, Llama, Qwen3) the same prompt, and in the end you have to guess which is which based only on their answers. Give it a try! ;)
Geometric Context Transformer for Streaming 3D Reconstruction Paper • 2604.14141 • Published Apr 15 • 21
Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation Paper • 2604.05083 • Published Apr 6
A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens Paper • 2604.04913 • Published Apr 6 • 12
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios Paper • 2603.28130 • Published Mar 30 • 11
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders Paper • 2603.19209 • Published Mar 19 • 6
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published Mar 19 • 3
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published Mar 15 • 36
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model Paper • 2602.17807 • Published Feb 19 • 7