SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research Paper • 2606.09730 • Published 4 days ago • 49
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 173
SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia Paper • 2606.03027 • Published 10 days ago • 1
GrepSeek: Training Search Agents for Direct Corpus Interaction Paper • 2605.29307 • Published 15 days ago • 106
MiniCPM RAG Suite Collection Embedding, re-ranking, and generation: the cornerstones of RAG. • 7 items • Updated 19 days ago • 18
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19, 2025 • 49
ToolOmni: Enabling Open-World Tool Use via Agentic learning with Proactive Retrieval and Grounded Execution Paper • 2604.13787 • Published Apr 15 • 2