DynaGuard: A Dynamic Guardrail Model With User-Defined Policies • Paper • 2509.02563 • Published Sep 2, 2025 • 20
ARGUS: Hallucination and Omission Evaluation in Video-LLMs • Paper • 2506.07371 • Published Jun 9, 2025 • 8
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning • Paper • 2506.05523 • Published Jun 5, 2025 • 34
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation • Paper • 2502.19414 • Published Feb 26, 2025 • 20
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference • Paper • 2502.09974 • Published Feb 14, 2025 • 9
Gemstones: A Model Suite for Multi-Faceted Scaling Laws • Paper • 2502.06857 • Published Feb 7, 2025 • 24
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach • Paper • 2502.05171 • Published Feb 7, 2025 • 151
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models • Paper • 2408.08872 • Published Aug 16, 2024 • 101
Goldfish Loss: Mitigating Memorization in LLMs • Collection • This collection contains artifacts from our paper titled "Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs." • 9 items • Updated May 21, 2025 • 3
From Pixels to Prose: A Large Dataset of Dense Image Captions • Paper • 2406.10328 • Published Jun 14, 2024 • 18
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs • Paper • 2406.10209 • Published Jun 14, 2024 • 8
Transformers Can Do Arithmetic with the Right Embeddings • Paper • 2405.17399 • Published May 27, 2024 • 54
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text • Paper • 2401.12070 • Published Jan 22, 2024 • 45