---
title: HRHUB
emoji: 💼
colorFrom: green
colorTo: blue
sdk: streamlit
sdk_version: "1.34.0"
app_file: app.py
pinned: true
---

# HRHUB - HR Matching System

**Bilateral Matching Engine for Candidates & Companies**

A professional HR matching system using NLP embeddings and cosine similarity to connect job candidates with relevant companies based on skills, experience, and requirements.

---

HRHUB solves a fundamental inefficiency in hiring: candidates and companies use different vocabularies when describing skills and requirements. Our system bridges this gap using **job postings** as a translator, enriching company profiles to speak the same "skills language" as candidates.

### Key Innovation

- **Candidates** describe: "Python, Machine Learning, Data Science"
- **Companies** describe: "Tech company, innovation, growth"
- **Job Postings** translate: "We need Python, AWS, TensorFlow"
- **Result**: Accurate matching in the same embedding space

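
The enrichment idea can be illustrated with a short sketch. It assumes that skills have already been extracted from each job posting as a list of strings; the function name `build_company_text` and the "Required skills" wording are hypothetical and not taken from the project code. The resulting text is what would be embedded in place of the bare company description.

```python
from typing import List


def build_company_text(description: str, posting_skills: List[List[str]]) -> str:
    """Append de-duplicated skills from job postings to a company description.

    Hypothetical helper: the real project may combine these fields differently.
    """
    seen = set()
    skills = []
    for posting in posting_skills:
        for skill in posting:
            key = skill.strip().lower()  # compare case-insensitively
            if key and key not in seen:
                seen.add(key)
                skills.append(skill.strip())
    if not skills:
        return description
    return f"{description}. Required skills: {', '.join(skills)}"


company_text = build_company_text(
    "Tech company, innovation, growth",
    [["Python", "AWS"], ["python", "TensorFlow"]],
)
print(company_text)
# Tech company, innovation, growth. Required skills: Python, AWS, TensorFlow
```
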

---

## Features

- ✅ **Bilateral Matching**: Both candidates and companies get matched recommendations
- ✅ **NLP-Powered**: Uses sentence transformers for semantic understanding
- ✅ **Interactive Visualization**: Network graphs showing match connections
- ✅ **Scalable**: Handles 9,544 candidates × 180,000 companies
- ✅ **Real-time**: Fast similarity computation using cosine similarity
- ✅ **Professional UI**: Clean Streamlit interface

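
Bilateral matching means one similarity table can be read in both directions. The sketch below assumes a precomputed score matrix where `scores[i][j]` is the similarity between candidate `i` and company `j`. The function name `bilateral_matches` and the toy scores are illustrative only.

```python
from typing import Dict, List, Tuple


def bilateral_matches(
    scores: List[List[float]], top_k: int = 2
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Return the top companies per candidate and the top candidates per company."""
    n_candidates = len(scores)
    n_companies = len(scores[0]) if scores else 0
    # Read the matrix row-wise: best companies for each candidate.
    by_candidate = {
        i: sorted(range(n_companies), key=lambda j: scores[i][j], reverse=True)[:top_k]
        for i in range(n_candidates)
    }
    # Read the matrix column-wise: best candidates for each company.
    by_company = {
        j: sorted(range(n_candidates), key=lambda i: scores[i][j], reverse=True)[:top_k]
        for j in range(n_companies)
    }
    return by_candidate, by_company


toy_scores = [
    [0.9, 0.2, 0.5],  # candidate 0 vs companies 0..2
    [0.1, 0.8, 0.7],  # candidate 1 vs companies 0..2
]
by_candidate, by_company = bilateral_matches(toy_scores, top_k=2)
print(by_candidate)  # {0: [0, 2], 1: [1, 2]}
print(by_company)    # {0: [0, 1], 1: [1, 0], 2: [1, 0]}
```
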

---

## Project Structure

```
hrhub/
├── app.py                 # Main Streamlit application
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── data/
│   ├── mock_data.py       # Demo data (MVP)
│   ├── data_loader.py     # Real data loader (future)
│   └── embeddings/        # Saved embeddings (future)
├── utils/
│   ├── matching.py        # Cosine similarity algorithms
│   ├── visualization.py   # Network graph generation
│   └── display.py         # UI components
└── assets/
    └── style.css          # Custom CSS (optional)
```


---

## Installation & Setup

### Prerequisites

- Python 3.8+
- pip package manager
- Git

### Local Development

1. **Clone the repository**
   ```bash
   git clone https://github.com/your-username/hrhub.git
   cd hrhub
   ```

2. **Create virtual environment** (recommended)
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

4. **Run the app**
   ```bash
   streamlit run app.py
   ```

5. **Open browser**
   Navigate to `http://localhost:8501`


---

## Deployment (Streamlit Cloud)

### Step 1: Push to GitHub

```bash
git add .
git commit -m "Initial commit"
git push origin main
```

### Step 2: Deploy on Streamlit Cloud

1. Go to [share.streamlit.io](https://share.streamlit.io)
2. Sign in with GitHub
3. Click "New app"
4. Select your repository: `hrhub`
5. Main file path: `app.py`
6. Click "Deploy"

**That's it!** Your app will be live at `https://your-app.streamlit.app`


---

## Data Pipeline

### Current (MVP - Hardcoded)

```
mock_data.py → app.py → Display
```

### Future (Production)

```
CSV Files → Data Processing → Embeddings → Saved Files
                                               ↓
                     app.py loads embeddings → Real-time matching
```

### Files to Generate (Next Phase)

```python
# After running your main code, save these files:
# 1. candidate_embeddings.npy   -> 9,544 × 384 array
# 2. company_embeddings.npy     -> 180,000 × 384 array
# 3. candidates_processed.pkl   -> full candidate data
# 4. companies_processed.pkl    -> full company data
```


---

## Switching from Mock to Real Data

### Current Code (MVP)

```python
# app.py
from data.mock_data import get_candidate_data, get_company_matches
```

### After Generating Embeddings

```python
# app.py
from data.data_loader import get_candidate_data, get_company_matches
```

**That's it!** No other code changes needed. The UI stays the same.

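
The one-line swap works because both modules expose the same two functions. As an optional variation, the choice could be driven by a flag instead of an edited import. The sketch below is an assumption, not how `app.py` currently works: `data_module_name` and `load_data_functions` are hypothetical helpers, and wiring them to the `DEMO_MODE` setting is only one possibility.

```python
import importlib
from typing import Callable, Tuple


def data_module_name(demo_mode: bool) -> str:
    """Pick the data module: mock data in demo mode, the real loader otherwise."""
    return "data.mock_data" if demo_mode else "data.data_loader"


def load_data_functions(demo_mode: bool) -> Tuple[Callable, Callable]:
    """Import the chosen module and return its two public functions.

    Both modules are expected to define get_candidate_data and
    get_company_matches, as in the import lines of this README.
    """
    module = importlib.import_module(data_module_name(demo_mode))
    return module.get_candidate_data, module.get_company_matches


# Inside the project, app.py could then do (hypothetical usage):
#   get_candidate_data, get_company_matches = load_data_functions(config.DEMO_MODE)
print(data_module_name(True))   # data.mock_data
print(data_module_name(False))  # data.data_loader
```
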

---

## Configuration

Edit `config.py` to customize:

```python
# Matching Settings
DEFAULT_TOP_K = 10              # Number of matches to show
MIN_SIMILARITY_SCORE = 0.5      # Minimum score threshold
EMBEDDING_DIMENSION = 384       # Vector dimension

# UI Settings
NETWORK_GRAPH_HEIGHT = 600      # Graph height in pixels

# Demo Mode
DEMO_MODE = True                # Set False for production
```

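
To show how the two matching settings could interact, here is a small sketch. It assumes the threshold is applied before the top-K cut, which this README does not specify. `select_matches` and the company names are made up for the example.

```python
from typing import List, Tuple

DEFAULT_TOP_K = 10          # same values as in config.py above
MIN_SIMILARITY_SCORE = 0.5


def select_matches(
    scored: List[Tuple[str, float]],
    top_k: int = DEFAULT_TOP_K,
    min_score: float = MIN_SIMILARITY_SCORE,
) -> List[Tuple[str, float]]:
    """Drop matches below min_score, then keep the top_k highest scores."""
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


matches = select_matches(
    [("Acme", 0.82), ("Globex", 0.41), ("Initech", 0.67)],
    top_k=2,
)
print(matches)  # [('Acme', 0.82), ('Initech', 0.67)]
```
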

---

## Technical Details

### Algorithm

1. **Text Representation**: Convert candidate/company data to structured text
2. **Embedding**: Use sentence transformers (`all-MiniLM-L6-v2`)
3. **Similarity**: Compute cosine similarity between vectors
4. **Ranking**: Sort by similarity score, return top K

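
Steps 3 and 4 can be sketched in plain Python. This is a minimal illustration, not the contents of `utils/matching.py`: the function names `cosine_similarity` and `top_k_matches` are hypothetical, and toy 3-dimensional vectors stand in for the 384-dimensional embeddings produced in step 2.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every vector against the query and return the k best (index, score) pairs."""
    scores = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


candidate = [1.0, 0.0, 1.0]
companies = [
    [1.0, 0.0, 1.0],  # same direction as the candidate
    [0.0, 1.0, 0.0],  # orthogonal to the candidate
    [1.0, 1.0, 0.0],  # partial overlap
]
ranked = top_k_matches(candidate, companies, k=2)
print([index for index, _ in ranked])  # [0, 2]
```

For the full dataset, a vectorized implementation (for example with NumPy) would be a better fit than this loop-based version.
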
### Why Cosine Similarity?

- ✅ **Scale-invariant**: Focuses on direction, not magnitude
- ✅ **Profile shape matching**: Captures proportional skill distributions
- ✅ **Fast computation**: Optimized for large-scale matching
- ✅ **Proven in NLP**: Standard metric for semantic similarity

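
The scale-invariance point can be checked with a few lines of code. In this sketch the vectors are arbitrary toy values: scaling a profile by 10 changes its raw dot product with another vector but leaves the cosine similarity unchanged.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes neither vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


profile = [3.0, 1.0, 2.0]
scaled = [10.0 * x for x in profile]  # same proportions, larger magnitude
other = [1.0, 3.0, 0.0]

print(dot(profile, other), dot(scaled, other))  # 6.0 60.0
print(math.isclose(cosine(profile, other), cosine(scaled, other)))  # True
```
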
### Performance

- **Loading time**: < 5 seconds (with pre-computed embeddings)
- **Matching speed**: < 1 second for 180K companies
- **Memory usage**: ~500MB (embeddings loaded)

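
As a rough sanity check on the memory figure, the raw size of the two embedding matrices can be estimated from their shapes. The sketch assumes 32-bit floats, which this README does not state; anything else held in memory is not counted.

```python
BYTES_PER_FLOAT32 = 4        # assumption: embeddings stored as float32
EMBEDDING_DIMENSION = 384


def embedding_megabytes(rows: int, dim: int = EMBEDDING_DIMENSION) -> float:
    """Raw array size in megabytes (1 MB = 1,000,000 bytes)."""
    return rows * dim * BYTES_PER_FLOAT32 / 1_000_000


company_mb = embedding_megabytes(180_000)
candidate_mb = embedding_megabytes(9_544)
print(f"companies:  {company_mb:.1f} MB")    # companies:  276.5 MB
print(f"candidates: {candidate_mb:.1f} MB")  # candidates: 14.7 MB
print(f"total:      {company_mb + candidate_mb:.1f} MB")  # total:      291.1 MB
```
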

---

## Testing

### Test Mock Data

```bash
cd hrhub
python data/mock_data.py
```

Expected output:

```
✅ Candidate: Demo Candidate #0
✅ Top 5 matches loaded
✅ Graph data: 6 nodes, 5 edges
```

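
The node and edge counts in that output are what a simple star layout would produce: one candidate node connected to each of its five matches. The sketch below illustrates this under that assumption. `build_star_graph`, the dictionary keys, and the company labels are hypothetical and may differ from what `data/mock_data.py` actually builds.

```python
from typing import Dict, List, Tuple


def build_star_graph(
    candidate_label: str, matches: List[Tuple[str, float]]
) -> Tuple[List[Dict], List[Dict]]:
    """Build one candidate node linked to every matched company node."""
    nodes = [{"id": "candidate", "label": candidate_label}]
    edges = []
    for index, (company, score) in enumerate(matches):
        node_id = f"company_{index}"
        nodes.append({"id": node_id, "label": company})
        edges.append({"from": "candidate", "to": node_id, "value": score})
    return nodes, edges


demo_matches = [(f"Company {i}", 0.9 - 0.1 * i) for i in range(5)]
nodes, edges = build_star_graph("Demo Candidate #0", demo_matches)
print(f"Graph data: {len(nodes)} nodes, {len(edges)} edges")
# Graph data: 6 nodes, 5 edges
```

With N matches, this layout always yields N + 1 nodes and N edges, which is a quick way to confirm the mock data loaded completely.
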
### Test Streamlit App

```bash
streamlit run app.py
```


---

## Roadmap

### ✅ Phase 1: MVP (Current)

- [x] Basic matching logic
- [x] Streamlit UI
- [x] Network visualization
- [x] Hardcoded demo data

### Phase 2: Production (Next)

- [ ] Generate real embeddings
- [ ] Load embeddings from files
- [ ] Dynamic candidate selection
- [ ] Search functionality

### Phase 3: Advanced (Future)

- [ ] User authentication
- [ ] Company login view
- [ ] Weighted matching (different dimensions)
- [ ] RAG-powered recommendations
- [ ] Email notifications
- [ ] Analytics dashboard


---

## Team

**Master's in Business Data Science - Aalborg University**

- Roger - Project Lead & Deployment
- Eskil - [Role]
- [Team Member 3] - [Role]
- [Team Member 4] - [Role]

---

## License

This project is part of an academic course at Aalborg University.

---

## Contributing

This is an academic project. Contributions are welcome after project submission (December 14, 2024).


---

## Contact

For questions or feedback:

- Create an issue on GitHub
- Contact via Moodle course forum

---

## Acknowledgments

- **Sentence Transformers**: Hugging Face team
- **Streamlit**: Amazing framework for data apps
- **PyVis**: Interactive network visualization
- **Course Instructors**: For guidance and support

---

**Last Updated**: December 2024

**Status**: Active Development