LexiMind / README.md
OliverPerrin
Fixed Ruff check and small redme update
2261920
---
title: LexiMind
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: scripts/demo_gradio.py
pinned: false
---
<!-- markdownlint-disable MD025 -->
# LexiMind
A multi-task NLP system for literary and academic text understanding. LexiMind jointly performs **abstractive summarization**, **topic classification**, and **multi-label emotion detection** using a single encoder-decoder transformer initialized from [FLAN-T5-base](https://huggingface.co/google/flan-t5-base) (272M parameters).
**[Live Demo](https://huggingface.co/spaces/OliverPerrin/LexiMind)** Β· **[Model](https://huggingface.co/OliverPerrin/LexiMind-Model)** Β· **[Discovery Dataset](https://huggingface.co/datasets/OliverPerrin/LexiMind-Discovery)** Β· **[Research Paper](docs/research_paper.tex)**
## Results
| Task | Metric | Score |
| ---- | ------ | ----- |
| Summarization | ROUGE-1 / ROUGE-L | 0.309 / 0.185 |
| Summarization (academic) | ROUGE-1 | 0.319 |
| Summarization (literary) | ROUGE-1 | 0.206 |
| Topic Classification | Accuracy (95% CI) | 85.7% (80.4–91.0%) |
| Emotion Detection | Sample-avg F1 | 0.352 |
| Emotion Detection (tuned thresholds) | Sample-avg F1 / Macro F1 | 0.503 / 0.294 |
Trained for 8 epochs on an RTX 4070 12GB (~9 hours) with BFloat16 mixed precision, `torch.compile`, and cosine LR decay.
## Key Findings
From my research paper:
- **Naive MTL produces mixed results**: topic classification benefits (+3.7% accuracy), but emotion detection suffers negative transfer (βˆ’0.02 F1) under mean pooling with round-robin scheduling.
- **Learned attention pooling + temperature sampling eliminates negative transfer entirely**: emotion F1 improves from 0.199 β†’ 0.352 (+77%), surpassing the single-task baseline (0.218).
- **Summarization is robust to MTL** β€” quality remains stable across configurations.
- **FLAN-T5 pre-training is essential** β€” random initialization produces dramatically worse results on all tasks.
- **Domain gap matters**: academic summaries (ROUGE-1: 0.319) substantially outperform literary (0.206), driven by an 11:1 training data imbalance.
## Architecture
LexiMind is a **from-scratch PyTorch Transformer** that loads pre-trained FLAN-T5-base weights layer by layer via a custom factory module β€” no HuggingFace model wrappers.
| Component | Detail |
| --------- | ------ |
| Backbone | Encoder-Decoder Transformer (272M params) |
| Encoder / Decoder | 12 layers each, 768d, 12 attention heads |
| Normalization | RMSNorm (Pre-LN, T5-style) |
| Attention | FlashAttention via PyTorch SDPA + T5 relative position bias |
| FFN | Gated-GELU (wi\_0, wi\_1, wo) |
| Summarization | Full decoder β†’ language modeling head |
| Emotion (28-class multi-label) | Learned attention pooling β†’ linear head |
| Topic (7-class) | Mean pooling β†’ linear head |
### Multi-Task Training
All three tasks share the encoder. Summarization uses the full encoder-decoder; classification heads branch off the encoder output. Key training details:
- **Temperature-based task sampling** (Ξ±=0.5): allocates training steps proportional to dataset size, preventing large tasks from dominating
- **Attention pooling** for emotion: a learned query attends over encoder outputs, focusing on emotionally salient tokens rather than averaging the full sequence
- **Fixed loss weights**: summarization=1.0, emotion=1.0, topic=0.3 (reduced to prevent overfitting on the small topic dataset)
- **Frozen encoder layers 0–3**: preserves FLAN-T5's language understanding in lower layers
- **Gradient conflict diagnostics**: optional inter-task gradient cosine similarity monitoring
See [docs/architecture.md](docs/architecture.md) for full implementation details, weight loading tables, and training configuration rationale.
## Training Data
| Task | Source | Samples |
| ---- | ------ | ------- |
| Summarization | Gutenberg + Goodreads descriptions (literary) | ~4K |
| Summarization | arXiv body β†’ abstract (academic) | ~45K |
| Topic | Gutenberg + arXiv metadata β†’ 7 categories | 3,402 |
| Emotion | GoEmotions β€” Reddit comments, 28 labels | 43,410 |
For summarization, the model learns to produce descriptive summaries β€” what a book *is about* β€” rather than plot recaps, by pairing Gutenberg full texts with Goodreads descriptions and arXiv papers with their abstracts.
## Getting Started
### Prerequisites
- Python 3.10+
- NVIDIA GPU with CUDA (for training; CPU works for inference)
### Installation
```bash
git clone https://github.com/OliverPerrin/LexiMind.git
cd LexiMind
pip install -r requirements.txt
```
### Training
```bash
# Full training (~9 hours on RTX 4070 12GB)
python scripts/train.py training=full
# Quick dev run
python scripts/train.py training=dev
# Override parameters
python scripts/train.py training=full training.optimizer.lr=5e-5
# Resume from checkpoint
python scripts/train.py training=full resume_from=checkpoints/epoch_5.pt
```
Experiments are tracked with MLflow (`mlflow ui` to browse).
### Evaluation
```bash
python scripts/evaluate.py
python scripts/evaluate.py --skip-bertscore # faster
python scripts/evaluate.py --tune-thresholds # per-class threshold tuning
```
### Inference
```bash
# Command-line
python scripts/inference.py "Your text to analyze"
# Gradio web demo
python scripts/demo_gradio.py
```
### Profiling
```bash
# Profile GPU usage (CUDA kernels, memory, Chrome trace)
python scripts/profile_training.py
```
### Docker
```bash
docker build -t leximind .
docker run -p 7860:7860 leximind
```
## Project Structure
```text
src/
β”œβ”€β”€ models/ # Encoder, decoder, attention, FFN, heads, factory
β”œβ”€β”€ data/ # Datasets, dataloaders, tokenization, cross-task dedup
β”œβ”€β”€ training/ # Trainer (AMP, grad accum, temperature sampling), metrics
β”œβ”€β”€ inference/ # Pipeline + factory for checkpoint loading
β”œβ”€β”€ api/ # FastAPI REST endpoint
└── utils/ # Device detection, checkpointing, label I/O
scripts/
β”œβ”€β”€ train.py # Hydra training entry point
β”œβ”€β”€ evaluate.py # Full evaluation suite
β”œβ”€β”€ inference.py # CLI inference
β”œβ”€β”€ demo_gradio.py # Gradio discovery demo
β”œβ”€β”€ profile_training.py # PyTorch profiler
β”œβ”€β”€ train_multiseed.py # Multi-seed training with aggregation
β”œβ”€β”€ visualize_training.py # Training curve visualization
β”œβ”€β”€ download_data.py # Dataset downloader
└── build_discovery_dataset.py # Pre-compute discovery dataset
configs/ # Hydra configs (model, training, data)
docs/ # Research paper + architecture documentation
tests/ # Pytest suite
```
## Code Quality
```bash
ruff check . # Linting
mypy src/ scripts/ tests/ # Type checking
pytest # Tests
pre-commit run --all-files # All checks
```
## License
GPL-3.0 β€” see [LICENSE](LICENSE) for details.
---
Built by Oliver Perrin Β· Appalachian State University Β· 2025–2026