SkillFactory: Self-Distillation For Learning Cognitive Behaviors Paper • 2512.04072 • Published 6 days ago • 3
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation Paper • 2501.05414 • Published Jan 9 • 2
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19 • 17
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering Paper • 2305.14869 • Published May 24, 2023 • 1
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning Paper • 2401.07286 • Published Jan 14, 2024
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce Paper • 2406.10173 • Published Jun 14, 2024
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation Paper • 2504.15254 • Published Apr 21 • 5
TAUR-Lab/Taur_CoT_Analysis_Project___deepseek-ai__DeepSeek-R1-Distill-Llama-70B Viewer • Updated Feb 17 • 300 • 5
TAUR-Lab/Taur_CoT_Analysis_Project___meta-llama__Llama-3.3-70B-Instruct Viewer • Updated Feb 17 • 2.5k • 5