THU-KEG/DeepDive-4B-SFT


Supervised fine-tuned (SFT) model for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards"
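The card does not include loading instructions. The following is a minimal sketch, assuming the checkpoint is a causal language model that loads through the generic `transformers` Auto classes; the card does not state the architecture, so that is an assumption, and `load_model` is a hypothetical helper name. The BF16 dtype matches the tensor type listed on this page.

```python
MODEL_ID = "THU-KEG/DeepDive-4B-SFT"  # repository name as shown on this page


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and weights in BF16.

    Assumption: the checkpoint works with AutoModelForCausalLM; the card
    does not confirm the architecture.
    """
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed here
    )
    return tokenizer, model


# Usage (downloads the weights on first call):
# tokenizer, model = load_model()
```

Prompt formatting and tool-calling conventions for search-agent use are not described on this page, so they are left out of the sketch.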

Downloads last month: 49

Safetensors

Model size: 4B params

Tensor type: BF16
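As a rough guide to what the listed size and tensor type mean for memory, the weights alone take about two bytes per parameter in BF16. The sketch below works through that arithmetic; it uses the rounded "4B" figure from this page rather than an exact parameter count, and `weight_memory_gib` is a hypothetical helper name.

```python
# Bytes needed to store one parameter in each tensor type.
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}


def weight_memory_gib(num_params: float, tensor_type: str = "BF16") -> float:
    """Approximate memory for the weights only, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 2**30


# Assumption: 4e9 is the rounded parameter count shown on the card.
approx_gib = weight_memory_gib(4e9, "BF16")
print(f"{approx_gib:.2f} GiB")  # about 7.45 GiB
```

This covers weights only; activations and the KV cache during inference add to it.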



Collection including THU-KEG/DeepDive-4B-SFT

CaRR & C-GRPO

Data and models for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards". 6 items.

Paper for THU-KEG/DeepDive-4B-SFT

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Paper: arXiv 2601.06021, published Jan 9