Where Visual Document Retrieval Goes Arabic
Omartificial Intelligence Space PRO
Omartificial-Intelligence-Space
AI & ML interests
NLP & LLM
Recent Activity
Upvoted an article 2 days ago: The State of Arabic Multimodal Embedding — What a 2B Finetune Taught Us
Published an article 2 days ago: The State of Arabic Multimodal Embedding — What a 2B Finetune Taught Us
Updated a collection 3 days ago: Arab-culture-aligned Multimodal Embedding Models & Datasets
Saudi Dialect Sentence Embedding Models Collection
A collection of Saudi dialect resources: sentence embedding models, classifiers, and a test dataset.
- Omartificial-Intelligence-Space/SA-STS-Embeddings-0.2B
  Feature Extraction • 0.2B • 20 • 1
- Omartificial-Intelligence-Space/SA-BERT-V1
  Fill-Mask • 0.2B • 15 • 4
- Omartificial-Intelligence-Space/SaudiDialect-Triplet-21
  Viewer • 2.96k • 9 • 3
- Omartificial-Intelligence-Space/saudi-dialect-test-samples
  Viewer • 1.28k • 22 • 4
DIRA – Diraya Arabic Reasoning AI
A collection of Arabic reasoning LLMs and datasets for logical inference and instruction-based reasoning in Arabic.
- Omartificial-Intelligence-Space/gpt-oss-math-ar
  2 • 3
- Omartificial-Intelligence-Space/Fanar-Math-R1-GRPO
  Text Generation • 17 • 3
- Omartificial-Intelligence-Space/Diraya-3B-Instruct-Ar
  Text Generation • 8 • 3
- Omartificial-Intelligence-Space/Arabic-DeepSeek-R1-Distill-8B
  Text Generation • 6 • 4
Arabic NLI & Semantic Similarity Datasets
Arabic versions of the SNLI and MultiNLI datasets. Originally built for Natural Language Inference (NLI), they can also be used to fine-tune embedding models.
- Omartificial-Intelligence-Space/Arabic-NLi-Pair-Score
  Viewer • 981k • 12 • 3
- Omartificial-Intelligence-Space/Arabic-NLi-Pair
  Viewer • 328k • 32 • 4
- Omartificial-Intelligence-Space/Arabic-NLi-Pair-Class
  Viewer • 981k • 243 • 2
- Omartificial-Intelligence-Space/Arabic-Quora-Duplicates
  Viewer • 149k • 26 • 2
AraEuroBERT
Ara-EuroBERT is a collection of Arabic Semantic Embeddings built on EuroBERT, delivering adaptive embeddings with ultra-long context.
ArabianLLM Series
Native Arabic pretrained GPT-2 models in three sizes (0.1B, 0.3B, 0.8B), trained on 20B+ Arabic tokens.
- ArabianGPT GroundPlay (Space)
  Generate text based on input using ArabianGPT models
- ArabianGPT: Native Arabic GPT-based Large Language Model
  Paper • 2402.15313 • 3
- riotu-lab/ArabianGPT-01B
  Text Generation • 252 • 13
- riotu-lab/ArabianGPT-08B-V2
  Text Generation • 0.8B • 14
Huggingface FineWeb2 Arabic Dataset Portions
Arabic text datasets sourced from the FineWeb2 project, covering Modern Standard Arabic (MSA) and dialects.
- HuggingFaceFW/fineweb-2
  Viewer • 4.48B • 119k • 784
- Omartificial-Intelligence-Space/FineWeb2-MSA
  Viewer • 907M • 120 • 2
- Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic
  Viewer • 23.9M • 27 • 2
- Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic
  Viewer • 69.6M • 45 • 3
Arabic Semantic Embeddings
Details for all models: https://www.omarai.me/embeddings
- Qwen Arabic Semantic Suite (Space)
  Process and analyze Arabic texts for similarity, classification, and more
- Omartificial-Intelligence-Space/mmbert-base-arabic-nli
  Sentence Similarity • 0.3B • 141 • 1
- Omartificial-Intelligence-Space/AraGemma-Embedding-300m
  Sentence Similarity • 0.3B • 120 • 14
- Omartificial-Intelligence-Space/GATE-AraBert-v1
  Feature Extraction • 0.1B • 1.64k • 17
SHAMIYAT: A Collection of Syrian Dialect Datasets & LLMs
A collection of datasets and language models focused on the Syrian dialect, supporting NLP research and applications for Syria
- SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
  Paper • 2508.02268 • 3
- SHAMI MT App (Space)
  Translate Arabic between MSA and Syrian dialect
- Omartificial-Intelligence-Space/Shami-MT
  Translation • 0.4B • 55 • 1
- Omartificial-Intelligence-Space/SHAMI-MT-2MSA
  Translation • 0.4B • 21 • 1
Arabic Matryoshka & GATE Embedding Models
A collection of advanced Arabic Matryoshka Embedding Models designed for efficient and high-performance Arabic NLP, available publicly on Hugging Face
- Matroyshka Eval Retrieval Ar (Space)
- GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training
  Paper • 2505.24581 • 2
- Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning
  Paper • 2407.21139 • 7
- Omartificial-Intelligence-Space/GATE-AraBert-v1
  Feature Extraction • 0.1B • 1.64k • 17
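Matryoshka-style embeddings are usually consumed by keeping only a prefix of each vector and renormalizing it, trading some accuracy for smaller storage and faster search. The sketch below illustrates that step in plain Python. The 8-dimensional vectors are made-up stand-ins for model outputs, and the cut-off of 4 dimensions is an arbitrary assumption; the collection description does not state which sizes each model supports.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (illustrative numbers, not produced by any model).
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.01, 0.02, 0.03]
doc = [0.8, 0.2, 0.25, 0.1, 0.01, 0.04, 0.03, 0.02]

# Similarity with the full vectors versus the 4-dimensional prefixes.
full_score = cosine(query, doc)
short_score = cosine(
    truncate_and_normalize(query, 4),
    truncate_and_normalize(doc, 4),
)
print(f"full: {full_score:.4f}  truncated: {short_score:.4f}")
```

With a real model, the two scores would be compared on a retrieval benchmark to pick the smallest dimension that keeps quality acceptable.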
Arabic Re-Ranking Hub
A comprehensive collection of datasets, models, and benchmarks for advancing Arabic Re-ranking systems.
- Arabic Reranking Eval (Space)
  Evaluate Arabic reranking models with insights
- Omartificial-Intelligence-Space/ARA-Reranker-V1
  Text Ranking • 0.6B • 1.71k • 4
- NAMAA-Space/GATE-Reranker-V1
  Text Ranking • 0.1B • 705 • 10
- NAMAA-Space/Namaa-Reranker-v1
  Text Ranking • 0.1B • 3 • 1
Arabic ModernBERT
This collection highlights efforts to enhance Arabic NLP tasks using the latest ModernBERT models.
Arabic LLAMA3 & 3.1 FineTuned Models
- Omartificial-Intelligence-Space/Arabic-llama3.1-lora-FT
  Text Generation • 7 • 11
- Omartificial-Intelligence-Space/Arabic-llama3.1-16bit-FT
  Text Generation • 8B • 86 • 4
- Omartificial-Intelligence-Space/al-baka-16bit-llama3-8b
  Text Generation • 8B • 8 • 1
- Omartificial-Intelligence-Space/al-baka-16bit-llama3-8b-GGUF
  Text Generation • 8B • 2
Arab-culture-aligned Multimodal Embedding Models & Datasets
Where Visual Document Retrieval Goes Arabic