---
language:
- bn
- en
license: apache-2.0
tags:
- multimodal
- image-classification
- text-classification
- meme-classification
- bengali
- clip
- bert
- attention-fusion
library_name: pytorch
pipeline_tag: image-classification
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- openai/clip-vit-base-patch32
- sagorsarker/bangla-bert-base
---

# Political Meme Classification - MAF Model

## Model Description

A Multimodal Attention Fusion (MAF) model that classifies Bengali memes into two classes:
- **NonPolitical (0)**: Non-political content
- **Political (1)**: Political content

The model combines visual features from CLIP with textual features from Bangla-BERT and fuses them with multi-head attention to classify meme images that contain Bengali text.

## Architecture

- **Visual Encoder**: CLIP ViT-B/32 (last 2 transformer blocks fine-tuned)
- **Text Encoder**: Bangla-BERT (last 2 layers fine-tuned)
- **Fusion**: Multi-head attention (16 heads) for cross-modal interaction
- **Classifier**: 2-layer fully connected network with dropout
- **Input**: 224×224 images + Bengali text (max 70 tokens)
- **Output**: Binary classification (NonPolitical/Political)
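
The attention fusion described above can be sketched in PyTorch as follows. This is a hypothetical illustration, not the contents of `model_architecture.py`: the class name `AttentionFusionSketch`, the 512-d CLIP image embedding, the 768-d Bangla-BERT hidden size, the shared 512-d fusion width, using the image embedding as the attention query over the text tokens, and the dropout rate are all assumptions. It starts from precomputed encoder outputs, so the encoders themselves are omitted.

```python
import torch
import torch.nn as nn


class AttentionFusionSketch(nn.Module):
    """Hypothetical cross-modal fusion: the image embedding attends over text tokens."""

    def __init__(self, img_dim=512, txt_dim=768, fused_dim=512,
                 num_heads=16, num_classes=2, dropout=0.2):
        super().__init__()
        # Project both modalities to a shared width (assumed 512, divisible by 16 heads).
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.txt_proj = nn.Linear(txt_dim, fused_dim)
        # 16-head attention, matching the head count listed in the card.
        self.attn = nn.MultiheadAttention(fused_dim, num_heads, batch_first=True)
        # 2-layer classifier with dropout over [image; attended text].
        self.classifier = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(fused_dim, num_classes),
        )

    def forward(self, img_emb, txt_tokens, txt_mask):
        # img_emb: (B, img_dim); txt_tokens: (B, T, txt_dim); txt_mask: (B, T), 1 = real token
        q = self.img_proj(img_emb).unsqueeze(1)   # (B, 1, D) query from the image
        kv = self.txt_proj(txt_tokens)            # (B, T, D) keys/values from the text
        # key_padding_mask expects True at positions to ignore (padding).
        attended, _ = self.attn(q, kv, kv, key_padding_mask=txt_mask == 0)
        fused = torch.cat([q.squeeze(1), attended.squeeze(1)], dim=-1)  # (B, 2D)
        return self.classifier(fused)             # raw logits, (B, num_classes)


# Usage with random stand-in features (batch of 4, 70 text tokens):
fusion = AttentionFusionSketch()
img = torch.randn(4, 512)
txt = torch.randn(4, 70, 768)
mask = torch.ones(4, 70, dtype=torch.long)
mask[:, 40:] = 0  # pretend the last 30 positions are padding
logits = fusion(img, txt, mask)
print(logits.shape)  # torch.Size([4, 2])
```

Using the image as the query lets each meme's visual summary pick out the most relevant Bengali tokens; the actual MAF module may assign the query and key roles differently.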

## Training Details

- **Task**: Binary multimodal (image + text) classification
- **Dataset**: PoliMemeDecode (2,290 training samples, 572 validation samples)
- **Epochs**: 10
- **Learning Rate**: 8e-5
- **Batch Size**: 16
- **Max Text Length**: 70 tokens
- **Attention Heads**: 16
- **Optimizer**: AdamW with a linear warmup scheduler
- **Loss**: CrossEntropyLoss
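
The step budget implied by these settings, and the shape of a linear warmup schedule, can be worked out as below. This assumes the final partial batch is kept (so one epoch has ⌈2290 / 16⌉ = 144 steps) and a hypothetical 10% warmup; the card does not state the warmup length. The multiplier has the same piecewise-linear form as `transformers.get_linear_schedule_with_warmup`: it rises from 0 to 1 during warmup, then decays linearly to 0.

```python
import math

TRAIN_SAMPLES = 2290
BATCH_SIZE = 16
EPOCHS = 10
BASE_LR = 8e-5
WARMUP_RATIO = 0.1  # assumption: the warmup length is not stated in this card

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # last batch holds 2 samples
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_RATIO * total_steps)


def linear_warmup_multiplier(step, warmup, total):
    """LR multiplier: linear ramp 0 -> 1 over `warmup` steps, then linear decay to 0."""
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))


print(f"steps/epoch={steps_per_epoch}, total={total_steps}, warmup={warmup_steps}")
for step in (0, 72, 144, 792, 1440):
    lr = BASE_LR * linear_warmup_multiplier(step, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

Under these assumptions training runs for 1,440 optimizer steps, peaking at 8e-5 after step 144.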

## Usage

```python
import importlib.util

import clip
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "lucius-40/bengali-political-maf-v3"

# Download the trained weights and the model definition
model_path = hf_hub_download(repo_id=REPO_ID, filename="maf_model.pth")
arch_path = hf_hub_download(repo_id=REPO_ID, filename="model_architecture.py")

# Import the MAF class from the downloaded architecture file
spec = importlib.util.spec_from_file_location("model_architecture", arch_path)
model_arch = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_arch)
MAF = model_arch.MAF

# Set up the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the CLIP visual encoder; keep CLIP's preprocessing transform for input images
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model = clip_model.visual.float()

# Initialize the model and load the trained weights
model = MAF(clip_model, num_classes=2, num_heads=16)
model.load_state_dict(torch.load(model_path, map_location=device))
model = model.to(device)
model.eval()

# Prepare the tokenizer for the Bangla-BERT text encoder
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/bangla-bert-base")

# Run inference
# ... (prepare image and text inputs)
```
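
Because the model is trained with `CrossEntropyLoss`, its outputs are presumably raw logits over the two classes, so a prediction can be read off with a softmax and an argmax. The helper below is a hypothetical sketch (`decode_predictions` and `LABELS` are not part of the repository); it only assumes the model returns a `(batch, 2)` logits tensor in the class order given in the model description.

```python
import torch

# Class order as stated in the model description: 0 = NonPolitical, 1 = Political.
LABELS = ("NonPolitical", "Political")


@torch.no_grad()
def decode_predictions(logits: torch.Tensor):
    """Turn a (batch, 2) logits tensor into (label, probability) pairs."""
    probs = torch.softmax(logits, dim=-1)  # logits -> class probabilities
    conf, idx = probs.max(dim=-1)          # most likely class and its probability
    return [(LABELS[i], c) for i, c in zip(idx.tolist(), conf.tolist())]


# Example with made-up logits (not real model output):
example_logits = torch.tensor([[2.0, -1.0], [0.1, 1.5]])
predictions = decode_predictions(example_logits)
print(predictions)
```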

## Model Performance

The model was evaluated on the validation set (572 samples) with the following binary classification metrics:
- Accuracy, precision, recall, and F1 score
- Class-specific metrics for the Political class
- Confusion matrix analysis
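
The card lists these metrics without values. To reproduce them from validation predictions, the sketch below computes accuracy, Political-class precision, recall, and F1, and the 2×2 confusion matrix in plain Python. It assumes integer labels with Political = 1 as the positive class; the function name `binary_metrics` is hypothetical. scikit-learn's `precision_recall_fscore_support` and `confusion_matrix` compute the same quantities.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, positive-class precision/recall/F1, and a 2x2 confusion matrix."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn  # both true and predicted are the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # rows = true class (0, 1), columns = predicted class (0, 1)
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Toy labels for illustration only (not results from this model):
metrics = binary_metrics([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(metrics)
```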

## Requirements

```
torch>=1.9.0
torchvision>=0.10.0
transformers>=4.41.2
huggingface_hub
clip @ git+https://github.com/openai/CLIP.git
pillow>=9.5.0
```

## Citation

```bibtex
@inproceedings{ahsan2024multimodal,
  title={A Multimodal Framework to Detect Target Aware Aggression in Memes},
  author={Ahsan, Shawly and Hossain, Eftekhar and Sharif, Omar and Das, Avishek and Hoque, Mohammed Moshiul and Dewan, M},
  booktitle={Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={2487--2500},
  year={2024}
}
```

## License

Apache 2.0

## Limitations

- Trained only on Bengali memes from the PoliMemeDecode dataset
- Requires both image and text input
- Performance may degrade on out-of-domain content
- Binary classification only (Political vs. NonPolitical)