---
license: mit
base_model: answerdotai/ModernBERT-base
tags:
  - modernbert
  - entity-infilling
  - text-summarization
  - masked-modeling
  - pytorch
library_name: transformers
datasets:
  - cnn_dailymail
model-index:
  - name: Glazkov/sum-entity-infilling-onehead
    results:
      - task:
          type: entity-infilling
          name: Entity Infilling
        dataset:
          name: cnn_dailymail
          type: cnn_dailymail
        metrics:
          - name: Entity Recall
            type: entity_recall
            value: TBD
---

Glazkov/sum-entity-infilling-onehead

This model is a fine-tuned version of answerdotai/ModernBERT-base trained on the cnn_dailymail dataset for entity infilling tasks.

Model Description

The model is designed to reconstruct masked entities in text using summary context. It was trained with a masked-modeling objective: entity spans in the source text are replaced with <mask> tokens, and the model learns to predict the original entities from the surrounding text and the paired summary.
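
For illustration, a training example can be thought of as a (summary, masked source) pair. The snippet below is a hedged sketch of that construction; the separator, masking granularity, and choice of entity are illustrative assumptions, not details taken from the training code.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

summary = "Membership gives the ICC jurisdiction over alleged crimes..."
source = "The Palestinians officially became the 123rd member of the International Criminal Court..."
entity = "The Palestinians"  # illustrative entity span to hide

# Replace the entity span with the tokenizer's mask token
masked_source = source.replace(entity, tokenizer.mask_token, 1)

# Assumed layout: summary and masked source joined with the separator token
text = summary + " " + tokenizer.sep_token + " " + masked_source
encoding = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")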

Intended Uses & Limitations

Intended Uses:

  • Entity reconstruction in summarization
  • Text completion and infilling
  • Research in masked language modeling
  • Educational purposes

Limitations:

  • Trained primarily on news article data
  • May not perform well on highly technical or domain-specific content
  • Performance varies with entity length and context

Training Details

The training procedure follows the setup listed under Training Configuration below.

Evaluation Results

The model was evaluated with entity recall on a validation split of the CNN/DailyMail dataset; a sketch of the recall computation follows the metric list below.

Metrics:

  • Entity Recall: Percentage of correctly reconstructed entities
  • Token Accuracy: Token-level prediction accuracy
  • Exact Match: Full sequence reconstruction accuracy
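
As a rough illustration, entity recall can be computed as the fraction of masked entities whose predicted string matches the original. The function below is a minimal sketch, not the repository's evaluation script; the case and whitespace normalization are assumptions.

def entity_recall(predicted: list[str], gold: list[str]) -> float:
    """Fraction of masked entities reconstructed exactly (normalization assumed)."""
    if not gold:
        return 0.0
    hits = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predicted, gold)
    )
    return hits / len(gold)

# Example: two of three entities recovered -> recall of about 0.67
print(entity_recall(["The Palestinians", "ICC", "Rome"],
                    ["The Palestinians", "ICC", "Hague"]))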

Usage

from transformers import AutoTokenizer, AutoModelForMaskedLM

from src.train.inference import EntityInfillingInference  # helper from this repository's source tree

# Load the model and tokenizer with plain transformers
tokenizer = AutoTokenizer.from_pretrained("Glazkov/sum-entity-infilling-onehead")
model = AutoModelForMaskedLM.from_pretrained("Glazkov/sum-entity-infilling-onehead")

# Initialize the repository's inference wrapper (it loads the model from model_path itself)
inference = EntityInfillingInference(
    model_path="Glazkov/sum-entity-infilling-onehead",
    device="cuda",  # or "cpu"
)

# Example inference: the summary supplies context for the masked entity
summary = "Membership gives the ICC jurisdiction over alleged crimes..."
masked_text = "(<mask> officially became the 123rd member of the International Criminal Court..."

predictions = inference.predict_masked_entities(
    summary=summary,
    masked_text=masked_text,
)
print(predictions)
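
If the repository's inference wrapper is unavailable, the tokenizer and model loaded above can fill the masks directly. This is a minimal sketch under the assumption that the summary and masked text are simply joined with the tokenizer's separator token; the wrapper may use a different input layout.

import torch

# Map the card's <mask> marker onto the tokenizer's actual mask token
joined = summary + " " + tokenizer.sep_token + " " + masked_text.replace("<mask>", tokenizer.mask_token)
inputs = tokenizer(joined, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy fill: decode the top-scoring token at each mask position
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
for pos in mask_positions:
    print(tokenizer.decode([logits[0, pos].argmax().item()]))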

Training Configuration

This model was trained using the following configuration:

  • Base Model: answerdotai/ModernBERT-base
  • Dataset: cnn_dailymail
  • Task: Entity Infilling
  • Framework: PyTorch with Accelerate
  • Training Date: 2025-11-25

For more details about the training process, see the training configuration file.

Model Architecture

The model uses the ModernBERT architecture with the following settings (a quick config check follows the list):

  • 22 transformer layers
  • Hidden size: 768
  • Vocabulary: Custom with <mask> token support
  • Maximum sequence length: 512 tokens
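
These values can be verified against the hosted model configuration; the attributes used below are standard transformers config fields.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Glazkov/sum-entity-infilling-onehead")
print(config.num_hidden_layers, config.hidden_size)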

License

This model is licensed under the MIT License.