Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26, 2024 • 48
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17, 2024 • 33
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 48
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models Paper • 2406.14035 • Published Jun 20, 2024 • 13