--- title: HRHUB emoji: πŸ’Ό colorFrom: green colorTo: blue sdk: streamlit sdk_version: "1.34.0" app_file: app.py pinned: true --- # 🏒 HRHUB - HR Matching System **Bilateral Matching Engine for Candidates & Companies** A professional HR matching system using NLP embeddings and cosine similarity to connect job candidates with relevant companies based on skills, experience, and requirements. --- HRHUB solves a fundamental inefficiency in hiring: candidates and companies use different vocabularies when describing skills and requirements. Our system bridges this gap using **job postings** as a translator, enriching company profiles to speak the same "skills language" as candidates. ### Key Innovation - **Candidates** describe: "Python, Machine Learning, Data Science" - **Companies** describe: "Tech company, innovation, growth" - **Job Postings** translate: "We need Python, AWS, TensorFlow" - **Result**: Accurate matching in the same embedding space ℝ³⁸⁴ --- ## πŸš€ Features - βœ… **Bilateral Matching**: Both candidates and companies get matched recommendations - βœ… **NLP-Powered**: Uses sentence transformers for semantic understanding - βœ… **Interactive Visualization**: Network graphs showing match connections - βœ… **Scalable**: Handles 9,544 candidates Γ— 180,000 companies - βœ… **Real-time**: Fast similarity computation using cosine similarity - βœ… **Professional UI**: Clean Streamlit interface --- ## πŸ“ Project Structure ``` hrhub/ β”œβ”€β”€ app.py # Main Streamlit application β”œβ”€β”€ config.py # Configuration settings β”œβ”€β”€ requirements.txt # Python dependencies β”œβ”€β”€ README.md # This file β”œβ”€β”€ data/ β”‚ β”œβ”€β”€ mock_data.py # Demo data (MVP) β”‚ β”œβ”€β”€ data_loader.py # Real data loader (future) β”‚ └── embeddings/ # Saved embeddings (future) β”œβ”€β”€ utils/ β”‚ β”œβ”€β”€ matching.py # Cosine similarity algorithms β”‚ β”œβ”€β”€ visualization.py # Network graph generation β”‚ └── display.py # UI components └── assets/ └── style.css # Custom CSS (optional) ``` --- ## πŸ› οΈ Installation & Setup ### Prerequisites - Python 3.8+ - pip package manager - Git ### Local Development 1. **Clone the repository** ```bash git clone https://github.com/your-username/hrhub.git cd hrhub ``` 2. **Create virtual environment** (recommended) ```bash python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate ``` 3. **Install dependencies** ```bash pip install -r requirements.txt ``` 4. **Run the app** ```bash streamlit run app.py ``` 5. **Open browser** Navigate to `http://localhost:8501` --- ## 🌐 Deployment (Streamlit Cloud) ### Step 1: Push to GitHub ```bash git add . git commit -m "Initial commit" git push origin main ``` ### Step 2: Deploy on Streamlit Cloud 1. Go to [share.streamlit.io](https://share.streamlit.io) 2. Sign in with GitHub 3. Click "New app" 4. Select your repository: `hrhub` 5. Main file path: `app.py` 6. Click "Deploy" **That's it!** Your app will be live at `https://your-app.streamlit.app` --- ## πŸ“Š Data Pipeline ### Current (MVP - Hardcoded) ``` mock_data.py β†’ app.py β†’ Display ``` ### Future (Production) ``` CSV Files β†’ Data Processing β†’ Embeddings β†’ Saved Files ↓ app.py loads embeddings β†’ Real-time matching ``` ### Files to Generate (Next Phase) ```python # After running your main code, save these: 1. candidate_embeddings.npy # 9,544 Γ— 384 array 2. company_embeddings.npy # 180,000 Γ— 384 array 3. candidates_processed.pkl # Full candidate data 4. companies_processed.pkl # Full company data ``` --- ## πŸ”„ Switching from Mock to Real Data ### Current Code (MVP) ```python # app.py from data.mock_data import get_candidate_data, get_company_matches ``` ### After Generating Embeddings ```python # app.py from data.data_loader import get_candidate_data, get_company_matches ``` **That's it!** No other code changes needed. The UI stays the same. --- ## 🎨 Configuration Edit `config.py` to customize: ```python # Matching Settings DEFAULT_TOP_K = 10 # Number of matches to show MIN_SIMILARITY_SCORE = 0.5 # Minimum score threshold EMBEDDING_DIMENSION = 384 # Vector dimension # UI Settings NETWORK_GRAPH_HEIGHT = 600 # Graph height in pixels # Demo Mode DEMO_MODE = True # Set False for production ``` --- ## πŸ“ˆ Technical Details ### Algorithm 1. **Text Representation**: Convert candidate/company data to structured text 2. **Embedding**: Use sentence transformers (`all-MiniLM-L6-v2`) 3. **Similarity**: Compute cosine similarity between vectors 4. **Ranking**: Sort by similarity score, return top K ### Why Cosine Similarity? - βœ… **Scale-invariant**: Focuses on direction, not magnitude - βœ… **Profile shape matching**: Captures proportional skill distributions - βœ… **Fast computation**: Optimized for large-scale matching - βœ… **Proven in NLP**: Standard metric for semantic similarity ### Performance - **Loading time**: < 5 seconds (with pre-computed embeddings) - **Matching speed**: < 1 second for 180K companies - **Memory usage**: ~500MB (embeddings loaded) --- ## πŸ§ͺ Testing ### Test Mock Data ```bash cd hrhub python data/mock_data.py ``` Expected output: ``` βœ… Candidate: Demo Candidate #0 βœ… Top 5 matches loaded βœ… Graph data: 6 nodes, 5 edges ``` ### Test Streamlit App ```bash streamlit run app.py ``` --- ## 🎯 Roadmap ### βœ… Phase 1: MVP (Current) - [x] Basic matching logic - [x] Streamlit UI - [x] Network visualization - [x] Hardcoded demo data ### πŸ”„ Phase 2: Production (Next) - [ ] Generate real embeddings - [ ] Load embeddings from files - [ ] Dynamic candidate selection - [ ] Search functionality ### πŸš€ Phase 3: Advanced (Future) - [ ] User authentication - [ ] Company login view - [ ] Weighted matching (different dimensions) - [ ] RAG-powered recommendations - [ ] Email notifications - [ ] Analytics dashboard --- ## πŸ‘₯ Team **Master's in Business Data Science - Aalborg University** - Roger - Project Lead & Deployment - Eskil - [Role] - [Team Member 3] - [Role] - [Team Member 4] - [Role] --- ## πŸ“ License This project is part of an academic course at Aalborg University. --- ## 🀝 Contributing This is an academic project. Contributions are welcome after project submission (December 14, 2024). --- ## πŸ“§ Contact For questions or feedback: - Create an issue on GitHub - Contact via Moodle course forum --- ## πŸ™ Acknowledgments - **Sentence Transformers**: Hugging Face team - **Streamlit**: Amazing framework for data apps - **PyVis**: Interactive network visualization - **Course Instructors**: For guidance and support --- **Last Updated**: December 2024 **Status**: 🟒 Active Development