Instructions to use Hamza66628/multimodal-rag-system with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Hamza66628/multimodal-rag-system with sentence-transformers:
from sentence_transformers import SentenceTransformer model = SentenceTransformer("Hamza66628/multimodal-rag-system") sentences = [ "The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium." ] embeddings = model.encode(sentences) similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] - Notebooks
- Google Colab
- Kaggle
Multimodal RAG System
This repository contains a complete Multimodal Retrieval-Augmented Generation (RAG) system that combines text and image search with LLM-based answer generation.
System Components
- Text Embeddings: Sentence-BERT (all-MiniLM-L6-v2) - 384 dimensions
- Image Embeddings: CLIP (ViT-B/32) - 512 dimensions
- Vector Database: FAISS indices for efficient similarity search
- LLM: Mistral-7B-Instruct (4-bit quantized)
- Total Vectors: 446 (161 text + 285 images)
Files
text_index.faiss: FAISS index for text embeddingsimage_index.faiss: FAISS index for image embeddingstext_metadata.pkl: Metadata for text chunks (source, page, content)image_metadata.pkl: Metadata for images (source, page, image_id)config.json: System configurationimage_summary.json: Reference summary of images
Usage
See the load cells in the notebook for loading and using this RAG system.
Features
- Semantic text search
- Cross-modal image search (text query → image results)
- Multiple prompting strategies (Standard, Chain-of-Thought, Few-shot, Zero-shot)
- Source attribution and traceability
- Real-time answer generation
- Downloads last month
- 8
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support