---
title: HRHUB
emoji: 💼
colorFrom: green
colorTo: blue
sdk: streamlit
sdk_version: "1.34.0"
app_file: app.py
pinned: true
---
# 🏢 HRHUB - HR Matching System
**Bilateral Matching Engine for Candidates & Companies**

A professional HR matching system using NLP embeddings and cosine similarity to connect job candidates with relevant companies based on skills, experience, and requirements.

---
HRHUB solves a fundamental inefficiency in hiring: candidates and companies use different vocabularies when describing skills and requirements. Our system bridges this gap using **job postings** as a translator, enriching company profiles to speak the same "skills language" as candidates.
### Key Innovation
- **Candidates** describe: "Python, Machine Learning, Data Science"
- **Companies** describe: "Tech company, innovation, growth"
- **Job Postings** translate: "We need Python, AWS, TensorFlow"
- **Result**: Candidates and companies are matched in the same embedding space, ℝ³⁸⁴
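
The README does not spell out how job postings are merged into a company profile. As a minimal sketch, assuming simple text concatenation, the two hypothetical helpers below (`candidate_text` and `enriched_company_text`, not functions from this codebase) show how both sides could be rendered as plain text before embedding, so that the company side picks up concrete skill terms such as "Python":

```python
from typing import List


# Hypothetical helpers: HRHUB's actual text templates are not documented in this README.
def candidate_text(skills: List[str], experience: str) -> str:
    """Render a candidate profile as one string for the embedding model."""
    return f"Skills: {', '.join(skills)}. Experience: {experience}."


def enriched_company_text(description: str, job_postings: List[str]) -> str:
    """Append the company's job-posting text so the profile mentions concrete skills."""
    return f"{description} Job postings: {' '.join(job_postings)}"


cand = candidate_text(["Python", "Machine Learning", "Data Science"], "3 years")
comp = enriched_company_text(
    "Tech company, innovation, growth.",
    ["We need Python, AWS, TensorFlow."],
)
# cand → "Skills: Python, Machine Learning, Data Science. Experience: 3 years."
# comp → "Tech company, innovation, growth. Job postings: We need Python, AWS, TensorFlow."
```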
---
## 🚀 Features
- ✅ **Bilateral Matching**: Both candidates and companies get matched recommendations
- ✅ **NLP-Powered**: Uses sentence transformers for semantic understanding
- ✅ **Interactive Visualization**: Network graphs showing match connections
- ✅ **Scalable**: Handles 9,544 candidates × 180,000 companies
- ✅ **Real-time**: Fast similarity computation using cosine similarity
- ✅ **Professional UI**: Clean Streamlit interface
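
Bilateral matching can come from a single similarity matrix read in two directions. The toy sketch below uses hand-made 3-dimensional unit vectors in place of the real 384-dimensional embeddings (an assumption for readability): row *i* ranks companies for candidate *i*, and column *j* ranks candidates for company *j*. At full scale, a real app would more likely compute only the row or column it needs rather than the whole matrix.

```python
import numpy as np

# Toy 3-dimensional unit vectors standing in for the real 384-dimensional embeddings.
candidates = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
companies = np.array([[0.8, 0.6, 0.0],
                      [0.0, 0.6, 0.8],
                      [0.6, 0.0, 0.8]])

# For unit vectors the dot product equals cosine similarity: S[i, j] = cos(candidate i, company j).
S = candidates @ companies.T

# Candidate-side view: row 0 ranks companies for candidate 0.
companies_for_candidate_0 = np.argsort(-S[0])    # → [0, 2, 1]
# Company-side view: column 1 ranks candidates for company 1.
candidates_for_company_1 = np.argsort(-S[:, 1])  # → [1, 0]
```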
---
## 📁 Project Structure
```
hrhub/
├── app.py                 # Main Streamlit application
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── data/
│   ├── mock_data.py       # Demo data (MVP)
│   ├── data_loader.py     # Real data loader (future)
│   └── embeddings/        # Saved embeddings (future)
├── utils/
│   ├── matching.py        # Cosine similarity algorithms
│   ├── visualization.py   # Network graph generation
│   └── display.py         # UI components
└── assets/
    └── style.css          # Custom CSS (optional)
```
---
## 🛠️ Installation & Setup
### Prerequisites
- Python 3.8+
- pip package manager
- Git
### Local Development
1. **Clone the repository**
```bash
git clone https://github.com/your-username/hrhub.git
cd hrhub
```
2. **Create virtual environment** (recommended)
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install dependencies**
```bash
pip install -r requirements.txt
```
4. **Run the app**
```bash
streamlit run app.py
```
5. **Open your browser** and navigate to `http://localhost:8501`
---
## 🌐 Deployment (Streamlit Cloud)
### Step 1: Push to GitHub
```bash
git add .
git commit -m "Initial commit"
git push origin main
```
### Step 2: Deploy on Streamlit Cloud
1. Go to [share.streamlit.io](https://share.streamlit.io)
2. Sign in with GitHub
3. Click "New app"
4. Select your repository: `hrhub`
5. Main file path: `app.py`
6. Click "Deploy"

**That's it!** Your app will be live at `https://your-app.streamlit.app`

---
## 📊 Data Pipeline
### Current (MVP - Hardcoded)
```
mock_data.py → app.py → Display
```
### Future (Production)
```
CSV Files → Data Processing → Embeddings → Saved Files
                                                ↓
                                           app.py loads embeddings → Real-time matching
```
### Files to Generate (Next Phase)
```
# After running your main code, save these:
1. candidate_embeddings.npy    # 9,544 × 384 array
2. company_embeddings.npy      # 180,000 × 384 array
3. candidates_processed.pkl    # Full candidate data
4. companies_processed.pkl     # Full company data
```
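
A minimal sketch of saving and reloading these four files with NumPy and `pickle`. It assumes the embeddings are float32 arrays and that the processed candidate and company records are any picklable Python objects; `save_artifacts` and `load_artifacts` are hypothetical names, not functions from this repository. Random arrays stand in for real embeddings so the example runs offline, and `mmap_mode="r"` maps the `.npy` files instead of copying them fully into RAM.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_artifacts(out_dir, candidate_emb, company_emb, candidates, companies):
    """Write the four files listed above into out_dir (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    np.save(out / "candidate_embeddings.npy", np.asarray(candidate_emb, dtype=np.float32))
    np.save(out / "company_embeddings.npy", np.asarray(company_emb, dtype=np.float32))
    with open(out / "candidates_processed.pkl", "wb") as f:
        pickle.dump(candidates, f)
    with open(out / "companies_processed.pkl", "wb") as f:
        pickle.dump(companies, f)


def load_artifacts(in_dir):
    """Read the four files back; the .npy arrays are memory-mapped read-only."""
    src = Path(in_dir)
    candidate_emb = np.load(src / "candidate_embeddings.npy", mmap_mode="r")
    company_emb = np.load(src / "company_embeddings.npy", mmap_mode="r")
    with open(src / "candidates_processed.pkl", "rb") as f:
        candidates = pickle.load(f)
    with open(src / "companies_processed.pkl", "rb") as f:
        companies = pickle.load(f)
    return candidate_emb, company_emb, candidates, companies


rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, rng.normal(size=(3, 384)), rng.normal(size=(5, 384)),
                   [{"id": 0}, {"id": 1}, {"id": 2}], [{"id": "c0"}])
    cand_emb, comp_emb, cands, comps = load_artifacts(tmp)
    shapes = (cand_emb.shape, comp_emb.shape)   # ((3, 384), (5, 384))
    dtype = cand_emb.dtype                      # float32
    del cand_emb, comp_emb                      # drop the memory maps before the folder is removed
```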
---
## 🔄 Switching from Mock to Real Data
### Current Code (MVP)
```python
# app.py
from data.mock_data import get_candidate_data, get_company_matches
```
### After Generating Embeddings
```python
# app.py
from data.data_loader import get_candidate_data, get_company_matches
```
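**That's it!** As long as `data/data_loader.py` exposes the same functions, with the same signatures, as `data/mock_data.py`, no other code changes are needed and the UI stays the same.

---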
## 🎨 Configuration
Edit `config.py` to customize:
```python
# Matching Settings
DEFAULT_TOP_K = 10 # Number of matches to show
MIN_SIMILARITY_SCORE = 0.5 # Minimum score threshold
EMBEDDING_DIMENSION = 384 # Vector dimension
# UI Settings
NETWORK_GRAPH_HEIGHT = 600 # Graph height in pixels
# Demo Mode
DEMO_MODE = True # Set False for production
```
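
A minimal sketch of how `DEFAULT_TOP_K` and `MIN_SIMILARITY_SCORE` could be applied to a list of scored matches. The helper `select_matches` and the company names are hypothetical. It assumes the threshold is inclusive and is applied before the top-K cut, which means fewer than `top_k` matches can come back when scores are low.

```python
from typing import List, Tuple

# Mirrors the config.py values above; in the app these would come from `import config`.
DEFAULT_TOP_K = 10
MIN_SIMILARITY_SCORE = 0.5


def select_matches(
    scored: List[Tuple[str, float]],
    top_k: int = DEFAULT_TOP_K,
    min_score: float = MIN_SIMILARITY_SCORE,
) -> List[Tuple[str, float]]:
    """Drop matches below min_score, then keep the top_k highest-scoring ones."""
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


matches = select_matches(
    [("Company A", 0.82), ("Company B", 0.41), ("Company C", 0.67)], top_k=2
)
# → [("Company A", 0.82), ("Company C", 0.67)]
```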
---
## 📈 Technical Details
### Algorithm
1. **Text Representation**: Convert candidate/company data to structured text
2. **Embedding**: Use sentence transformers (`all-MiniLM-L6-v2`)
3. **Similarity**: Compute cosine similarity between vectors
4. **Ranking**: Sort by similarity score, return top K
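
Steps 3 and 4 can be sketched in NumPy as follows. In the real pipeline, step 2 would produce the vectors, for example via `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)` from the sentence-transformers library; here random 384-dimensional vectors stand in so the sketch runs offline. The function names are hypothetical and not taken from `utils/matching.py`.

```python
import numpy as np


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # clip guards against division by zero


def top_k_matches(query: np.ndarray, corpus_unit: np.ndarray, k: int = 10):
    """Return (indices, scores) of the k rows of a pre-normalized corpus closest to query."""
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = corpus_unit @ q                      # step 3: cosine similarity to every row
    k = min(k, scores.shape[0])
    idx = np.argpartition(-scores, k - 1)[:k]     # step 4a: the k best, in arbitrary order
    idx = idx[np.argsort(-scores[idx])]           # step 4b: sort only those k
    return idx, scores[idx]


rng = np.random.default_rng(42)
companies_unit = normalize_rows(rng.normal(size=(1_000, 384)))  # normalized once, at load time
query = 2.0 * companies_unit[7]   # points in the same direction as company 7
idx, scores = top_k_matches(query, companies_unit, k=5)
```

Normalizing the corpus once at load time means each query costs one matrix-vector product plus a partial selection, instead of a full sort of every score.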
### Why Cosine Similarity?
- ✅ **Scale-invariant**: Focuses on direction, not magnitude
- ✅ **Profile shape matching**: Captures proportional skill distributions
- ✅ **Fast computation**: Optimized for large-scale matching
- ✅ **Proven in NLP**: Standard metric for semantic similarity
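
The scale-invariance and speed points follow directly from the definition. For embeddings $\mathbf{a}$ and $\mathbf{b}$ and any scale factor $\alpha > 0$:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
\cos(\alpha \mathbf{a}, \mathbf{b}) = \frac{\alpha \, (\mathbf{a} \cdot \mathbf{b})}{\alpha \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \cos(\mathbf{a}, \mathbf{b})
$$

If every vector is normalized to unit length in advance, the denominator is 1 and the score is a plain dot product, so scoring one candidate against all companies is a single matrix-vector product.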
### Performance
- **Loading time**: < 5 seconds (with pre-computed embeddings)
- **Matching speed**: < 1 second for 180K companies
- **Memory usage**: ~500MB (embeddings loaded)
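
A back-of-the-envelope check of the memory figure, assuming float32 embeddings (4 bytes per value). The raw embedding arrays account for roughly 291 MB; the rest of the ~500 MB estimate would presumably go to the processed records and the app itself. The last lines show why scoring one candidate at a time is cheap, while precomputing every candidate-company score would not be:

```python
BYTES_PER_FLOAT32 = 4
DIM = 384                      # all-MiniLM-L6-v2 output size
N_CANDIDATES = 9_544
N_COMPANIES = 180_000

candidate_bytes = N_CANDIDATES * DIM * BYTES_PER_FLOAT32   # 14,659,584 bytes
company_bytes = N_COMPANIES * DIM * BYTES_PER_FLOAT32      # 276,480,000 bytes
embeddings_mb = (candidate_bytes + company_bytes) / 1e6    # ≈ 291.1 MB

# Scoring one candidate = one (180,000 × 384) matrix-vector product.
multiply_adds_per_query = N_COMPANIES * DIM                # 69,120,000

# Materializing the full candidate × company score matrix at once.
full_matrix_gb = N_CANDIDATES * N_COMPANIES * BYTES_PER_FLOAT32 / 1e9  # ≈ 6.87 GB
```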
---
## 🧪 Testing
### Test Mock Data
```bash
cd hrhub
python data/mock_data.py
```
Expected output:
```
✅ Candidate: Demo Candidate #0
✅ Top 5 matches loaded
✅ Graph data: 6 nodes, 5 edges
```
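
The expected `6 nodes, 5 edges` corresponds to a star graph: one node for the candidate, one per matched company, and one edge per match. A minimal sketch of building such graph data, assuming hypothetical company names and scores and vis.js-style `from`/`to` edge keys (the actual format used by `utils/visualization.py` is not documented here):

```python
from typing import Dict, List, Tuple


def build_star_graph(candidate: str, matches: List[Tuple[str, float]]) -> Dict[str, list]:
    """One node for the candidate, one per matched company, one edge per match."""
    nodes = [{"id": candidate, "group": "candidate"}]
    edges = []
    for company, score in matches:
        nodes.append({"id": company, "group": "company"})
        edges.append({"from": candidate, "to": company, "value": score})
    return {"nodes": nodes, "edges": edges}


graph = build_star_graph(
    "Demo Candidate #0",
    [("Company A", 0.91), ("Company B", 0.87), ("Company C", 0.85),
     ("Company D", 0.80), ("Company E", 0.78)],
)
print(f"Graph data: {len(graph['nodes'])} nodes, {len(graph['edges'])} edges")
# → Graph data: 6 nodes, 5 edges
```

With PyVis, each node and edge record could then be passed to `Network.add_node` and `Network.add_edge`.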
### Test Streamlit App
```bash
streamlit run app.py
```
---
## 🎯 Roadmap
### ✅ Phase 1: MVP (Current)
- [x] Basic matching logic
- [x] Streamlit UI
- [x] Network visualization
- [x] Hardcoded demo data
### 🔄 Phase 2: Production (Next)
- [ ] Generate real embeddings
- [ ] Load embeddings from files
- [ ] Dynamic candidate selection
- [ ] Search functionality
### 🚀 Phase 3: Advanced (Future)
- [ ] User authentication
- [ ] Company login view
- [ ] Weighted matching (different dimensions)
- [ ] RAG-powered recommendations
- [ ] Email notifications
- [ ] Analytics dashboard
---
## 👥 Team
**Master's in Business Data Science - Aalborg University**
- Roger - Project Lead & Deployment
- Eskil - [Role]
- [Team Member 3] - [Role]
- [Team Member 4] - [Role]
---
## 📝 License
This project is part of an academic course at Aalborg University.

---
## 🤝 Contributing
This is an academic project. Contributions are welcome after project submission (December 14, 2024).

---
## 📧 Contact
For questions or feedback:
- Create an issue on GitHub
- Contact via Moodle course forum
---
## 🙏 Acknowledgments
- **Sentence Transformers**: Hugging Face team
- **Streamlit**: Amazing framework for data apps
- **PyVis**: Interactive network visualization
- **Course Instructors**: For guidance and support
---
**Last Updated**: December 2024

**Status**: 🟢 Active Development