+---
+language:
+- bn
+- en
+license: apache-2.0
+tags:
+- multimodal
+- image-classification
+- text-classification
+- meme-classification
+- bengali
+- clip
+- bert
+- attention-fusion
+library_name: pytorch
+pipeline_tag: image-classification
+datasets:
+- custom
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+base_model:
+- openai/clip-vit-base-patch32
+- sagorsarker/bangla-bert-base
+---
+# Political Meme Classification - MAF Model
+## Model Description
+Multimodal Attention Fusion (MAF) model for binary classification of Bengali memes as political or non-political:
+- **NonPolitical (0)**: Non-political content
+- **Political (1)**: Political content
+This model combines visual features from CLIP and textual features from Bangla-BERT using multi-head attention to classify meme images with Bengali text.
+## Architecture
+- **Visual Encoder**: CLIP ViT-B/32 (last 2 transformer blocks fine-tuned)
+- **Text Encoder**: Bangla-BERT (last 2 layers fine-tuned)
+- **Fusion**: Multi-head Attention (16 heads) for cross-modal interaction
+- **Classifier**: 2-layer fully connected network with dropout
+- **Input**: 224x224 images + Bengali text (max 70 tokens)
+- **Output**: Binary classification (NonPolitical/Political)
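The fusion step listed above can be illustrated with a minimal PyTorch sketch. It is not the released `MAF` class (that lives in `model_architecture.py`). The class name `FusionSketch`, the shared width of 512, the dropout rate, and the choice to let the image embedding attend over the text tokens are all assumptions made for illustration. The input widths of 512 and 768 are the usual output sizes of CLIP ViT-B/32 and a BERT-base encoder.

```python
import torch
import torch.nn as nn


class FusionSketch(nn.Module):
    """Hypothetical cross-modal fusion head; not the released MAF class."""

    def __init__(self, visual_dim=512, text_dim=768, fused_dim=512,
                 num_heads=16, num_classes=2, dropout=0.1):
        super().__init__()
        # Project both modalities to a shared width (assumed; must be divisible by num_heads)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.attention = nn.MultiheadAttention(fused_dim, num_heads, batch_first=True)
        # 2-layer fully connected classifier with dropout
        self.classifier = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(fused_dim, num_classes),
        )

    def forward(self, visual_feat, text_tokens, text_padding_mask=None):
        # visual_feat: (batch, visual_dim); text_tokens: (batch, seq_len, text_dim)
        query = self.visual_proj(visual_feat).unsqueeze(1)  # (batch, 1, fused_dim)
        keys = self.text_proj(text_tokens)                  # (batch, seq_len, fused_dim)
        # The image embedding attends over the text tokens
        attended, _ = self.attention(query, keys, keys,
                                     key_padding_mask=text_padding_mask)
        # Concatenate the image embedding with its text-aware counterpart
        fused = torch.cat([query.squeeze(1), attended.squeeze(1)], dim=-1)
        return self.classifier(fused)


# Random tensors stand in for encoder outputs: 4 memes, 70 text tokens each
model = FusionSketch().eval()
with torch.no_grad():
    logits = model(torch.randn(4, 512), torch.randn(4, 70, 768))
print(logits.shape)  # torch.Size([4, 2])
```

With 16 heads and a fused width of 512, each head works on a 32-dimensional slice. Any other shared width would also work as long as it is divisible by the number of heads.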
+## Training Details
+- **Task**: Binary multimodal (image + text) classification
+- **Dataset**: PoliMemeDecode (2,290 training samples, 572 validation samples)
+- **Epochs**: 10
+- **Learning Rate**: 8e-05
+- **Batch Size**: 16
+- **Max Text Length**: 70
+- **Attention Heads**: 16
+- **Optimizer**: AdamW with linear warmup scheduler
+- **Loss**: CrossEntropyLoss
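To make the schedule concrete, the sketch below derives the number of optimizer steps from the figures above and evaluates a linear warmup schedule in plain Python. The warmup length (10% of all steps), the linear decay to zero after warmup, and keeping the final partial batch are assumptions; the card only states that a linear warmup scheduler was used. The shape matches what `get_linear_schedule_with_warmup` in `transformers` produces, though whether training used that helper is not stated.

```python
import math

TRAIN_SAMPLES = 2290
BATCH_SIZE = 16
EPOCHS = 10
BASE_LR = 8e-5

# The last partial batch is kept (assumption), so round up
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = total_steps // 10  # assumption: 10% warmup, not stated in the card


def lr_at(step):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    return BASE_LR * max(0, total_steps - step) / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)  # 144 1440 144
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the learning rate climbs to 8e-05 over the first epoch and then falls linearly to zero by the end of epoch 10.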
+## Usage
+```python
+import importlib.util
+import clip
+import torch
+from huggingface_hub import hf_hub_download
+from transformers import AutoTokenizer
+REPO_ID = "lucius-40/bengali-political-maf-v3"
+# Download the trained weights and the architecture definition
+model_path = hf_hub_download(repo_id=REPO_ID, filename="maf_model.pth")
+arch_path = hf_hub_download(repo_id=REPO_ID, filename="model_architecture.py")
+# Import the MAF class from the downloaded architecture file
+spec = importlib.util.spec_from_file_location("model_architecture", arch_path)
+model_arch = importlib.util.module_from_spec(spec)
+spec.loader.exec_module(model_arch)
+MAF = model_arch.MAF
+# Set up the device
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Load the CLIP visual encoder in float32; keep `preprocess` for preparing images
+clip_model, preprocess = clip.load("ViT-B/32", device=device)
+clip_model = clip_model.visual.float()
+# Initialize the model and load the trained weights
+model = MAF(clip_model, num_classes=2, num_heads=16)
+model.load_state_dict(torch.load(model_path, map_location=device))
+model = model.to(device)
+model.eval()
+# Prepare the tokenizer
+tokenizer = AutoTokenizer.from_pretrained("sagorsarker/bangla-bert-base")
+# Run inference
+# ... (prepare image and text inputs)
+```
+## Model Performance
+Evaluated on the validation set with the following binary classification metrics:
+- Accuracy, Precision, Recall, F1 Score
+- Class-specific metrics for Political class
+- Confusion matrix analysis
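As a reference for how these metrics relate to each other, here is a small pure-Python sketch that derives them from a confusion matrix, treating Political (1) as the positive class. The helper name `binary_metrics` and the toy labels are illustrative only. They are not the model's predictions or results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: true class, cols: predicted
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (0 = NonPolitical, 1 = Political)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On the toy labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so accuracy, precision, recall, and F1 all equal 0.75.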
+## Requirements
+```
+torch>=1.9.0
+torchvision>=0.10.0
+transformers>=4.41.2
+clip @ git+https://github.com/openai/CLIP.git
+pillow>=9.5.0
+```
+## Citation
+```bibtex
+@inproceedings{ahsan2024multimodal,
+  title={A Multimodal Framework to Detect Target Aware Aggression in Memes},
+  author={Ahsan, Shawly and Hossain, Eftekhar and Sharif, Omar and Das, Avishek and Hoque, Mohammed Moshiul and Dewan, M},
+  booktitle={Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)},
+  pages={2487--2500},
+  year={2024}
+}
+```
+## License
+Apache 2.0
+## Limitations
+- Trained only on the PoliMemeDecode dataset of Bengali memes
+- Requires both image and text input
+- Performance may vary on out-of-domain content
+- Binary classification only (Political vs NonPolitical)