Antonio (Anthonny) Badilla-Olivas
abotresol
AI & ML interests
NLP, Deep Reinforcement Learning.
Organizations
None yet
Datasets
Interpretability and llms
Encoders llm
Agents
- PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
  Paper • 2502.14282 • Published • 29
- PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
  Paper • 2502.16111 • Published • 9
- Agent models: Internalizing Chain-of-Action Generation into Reasoning models
  Paper • 2503.06580 • Published • 20
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
  Paper • 2308.08155 • Published • 10
LLMs and memory
Language Modelling Arc
Llms and reasoning
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
  Paper • 2501.09686 • Published • 41
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 434
- Chain-of-Retrieval Augmented Generation
  Paper • 2501.14342 • Published • 58
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
Image-gen-models
- MangaNinja: Line Art Colorization with Precise Reference Following
  Paper • 2501.08332 • Published • 61
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 71
- FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
  Paper • 2502.20126 • Published • 19
- Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
  Paper • 2502.16944 • Published • 10
Fine-tuning
Foundational Models
reinforcement learning llms
Tokenizers
Llms Ops
More efficient sequence modelling
Llms writing skills
basic-blocs
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 57
- Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
  Paper • 2503.02495 • Published • 9
- BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
  Paper • 2504.18415 • Published • 47
Memory