MassSpecGym Reconstructor: Fingerprint Prediction Transformer
This model is a Spectral Transformer that reconstructs molecular fingerprints from MS/MS spectra. It frames identification as multi-label classification: each fingerprint bit is predicted as a separate binary label.
Model Details
- Architecture: Transformer Encoder (2 layers, 4 heads) with Fourier Feature m/z encoding.
- Objective: Focal Loss (gamma = 2.0) to address sparsity in molecular fingerprints.
- Input: MS/MS fragment peaks (m/z and intensity) plus the precursor mass.
- Output: 4096-dimensional binary fingerprint vector.
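The focal loss objective listed above can be sketched in plain Python. This is a minimal illustration of the standard binary focal loss with gamma = 2.0, not the repository's training code: the function names are hypothetical, and no class-balancing alpha term is included because the model card does not mention one.

```python
import math

def binary_focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Mean binary focal loss over fingerprint bits.

    probs   : predicted probabilities in (0, 1), one per bit
    targets : ground-truth bits (0 or 1)
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability the model assigns to the true value of this bit.
        p_t = p if y == 1 else 1.0 - p
        # The factor (1 - p_t)^gamma shrinks the loss of confidently correct bits.
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))
    return total / len(probs)

def binary_cross_entropy(probs, targets, eps=1e-12):
    # With gamma = 0 the modulating factor is 1, which recovers plain BCE.
    return binary_focal_loss(probs, targets, gamma=0.0, eps=eps)

# An "easy negative": a zero bit the model already predicts as almost surely zero.
easy_bce = binary_cross_entropy([0.01], [0])
easy_focal = binary_focal_loss([0.01], [0])

# A "hard positive": a rare active bit the model currently misses.
hard_bce = binary_cross_entropy([0.1], [1])
hard_focal = binary_focal_loss([0.1], [1])

print(f"easy negative: BCE={easy_bce:.6f}, focal={easy_focal:.8f}")
print(f"hard positive: BCE={hard_bce:.4f}, focal={hard_focal:.4f}")
```

In this toy case the easy negative's loss is scaled by (1 - 0.99)^2, roughly 1e-4, while the hard positive keeps most of its loss, so the many zero bits of a sparse fingerprint contribute far less to the gradient.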
Performance (MassSpecGym Test Set)
The model prioritizes structural fidelity and the recovery of rare active substructures (fingerprint bits set to 1):
- Sample-wise F1 score: 28.27%
- Effectiveness: Avoids the "all-zero" prediction trap, where a model trained on sparse fingerprints learns to predict no active bits at all.
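For readers unfamiliar with the metric, sample-wise F1 is usually computed per spectrum over the fingerprint bits and then averaged across spectra. The sketch below assumes that definition and uses hypothetical function names and toy 8-bit fingerprints; the exact evaluation code in the repository may differ, for instance in how empty predictions are scored.

```python
def sample_f1(pred_bits, true_bits):
    """F1 score between one predicted and one true binary fingerprint."""
    tp = sum(1 for p, t in zip(pred_bits, true_bits) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred_bits, true_bits) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred_bits, true_bits) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # Assumed convention: if nothing is predicted and nothing is true, score 0.
    return 2 * tp / denom if denom else 0.0

def samplewise_f1(pred_batch, true_batch):
    """Average the per-sample F1 over all fingerprints in the batch."""
    scores = [sample_f1(p, t) for p, t in zip(pred_batch, true_batch)]
    return sum(scores) / len(scores)

# Toy 8-bit fingerprints (the model itself outputs 4096 bits).
true_batch = [
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
]
pred_batch = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # one hit, one miss
    [0, 1, 0, 0, 0, 0, 1, 1],  # two hits, one false alarm
]
all_zero = [[0] * 8, [0] * 8]

score = samplewise_f1(pred_batch, true_batch)
zero_score = samplewise_f1(all_zero, true_batch)
print(f"sample-wise F1: {score:.4f}")
print(f"all-zero baseline: {zero_score:.4f}")
```

An all-zero prediction scores 0 under this metric even though its bit-level accuracy is high on sparse fingerprints, which is why F1 is a more informative figure here than accuracy.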
Key Features
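- Focal Loss Optimization: Down-weights "easy negatives" (zero bits the model already predicts with confidence) so that training concentrates on rare structural fragments.
- Isotope Awareness: Fourier feature encoding of m/z helps the model distinguish small mass shifts, such as the spacing between isotope peaks.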
- Attention Pooling: Learns to ignore spectral noise and focus on diagnostic fragments.
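To make the Fourier feature idea concrete, the sketch below encodes a scalar m/z value as sine/cosine pairs at geometrically spaced wavelengths. Everything specific here is an assumption for illustration: the function name, the number of frequencies, the wavelength range, and the example m/z values are not taken from the repository.

```python
import math

def fourier_features(mz, num_freqs=8, min_wavelength=0.01, max_wavelength=1000.0):
    """Encode a scalar m/z as sin/cos pairs (requires num_freqs >= 2).

    Wavelengths are spaced geometrically between min_wavelength and
    max_wavelength (in Da); both bounds are illustrative assumptions.
    """
    feats = []
    for i in range(num_freqs):
        # Interpolate the wavelength on a log scale from min to max.
        frac = i / (num_freqs - 1)
        wavelength = min_wavelength * (max_wavelength / min_wavelength) ** frac
        angle = 2.0 * math.pi * mz / wavelength
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats

# Two hypothetical peaks about 1.0034 Da apart, as for a neighbouring isotope peak.
peak = fourier_features(180.0634)
isotope_peak = fourier_features(181.0668)

# Euclidean distance between the two encodings.
distance = math.dist(peak, isotope_peak)
print(f"encoding length: {len(peak)}")
print(f"distance between encodings: {distance:.3f}")
```

Short wavelengths make the encoding sensitive to small mass differences, while long wavelengths preserve coarse position along the m/z axis. A raw scalar input differing by about 1 Da out of 180 would give the network a much weaker signal to separate the two peaks.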
Usage
For full implementation and evaluation details, visit the GitHub Repository.