title: HRHUB
emoji: 💼
colorFrom: green
colorTo: blue
sdk: streamlit
sdk_version: 1.34.0
app_file: app.py
pinned: true
🏢 HRHUB - HR Matching System
Bilateral Matching Engine for Candidates & Companies
A professional HR matching system using NLP embeddings and cosine similarity to connect job candidates with relevant companies based on skills, experience, and requirements.
HRHUB solves a fundamental inefficiency in hiring: candidates and companies use different vocabularies when describing skills and requirements. Our system bridges this gap using job postings as a translator, enriching company profiles to speak the same "skills language" as candidates.
Key Innovation
- Candidates describe: "Python, Machine Learning, Data Science"
- Companies describe: "Tech company, innovation, growth"
- Job Postings translate: "We need Python, AWS, TensorFlow"
- Result: Accurate matching in the same embedding space ℝ³⁸⁴
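The translation step can be pictured as appending the skills named in a company's job postings to its own description before the text is embedded. The following is a minimal sketch under that assumption; the dictionary keys (`description`, `postings`, `skills`) and the function name are hypothetical and are not taken from the project code.

```python
def build_company_text(company):
    """Combine a company's description with skills from its job postings.

    The dictionary keys used here are hypothetical, for illustration only.
    """
    # Collect each skill once, preserving first-seen order.
    skills = []
    for posting in company.get("postings", []):
        for skill in posting.get("skills", []):
            if skill not in skills:
                skills.append(skill)

    text = company.get("description", "")
    if skills:
        text += " Required skills: " + ", ".join(skills)
    return text.strip()


company = {
    "description": "Tech company, innovation, growth.",
    "postings": [
        {"skills": ["Python", "AWS"]},
        {"skills": ["Python", "TensorFlow"]},
    ],
}
enriched = build_company_text(company)
print(enriched)
```

The enriched text now carries the same kind of vocabulary a candidate profile uses, so both sides can be embedded and compared directly.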
Features
- ✅ Bilateral Matching: Both candidates and companies get matched recommendations
- ✅ NLP-Powered: Uses sentence transformers for semantic understanding
- ✅ Interactive Visualization: Network graphs showing match connections
- ✅ Scalable: Handles 9,544 candidates × 180,000 companies
- ✅ Real-time: Fast similarity computation using cosine similarity
- ✅ Professional UI: Clean Streamlit interface
Project Structure
hrhub/
├── app.py                  # Main Streamlit application
├── config.py               # Configuration settings
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── data/
│   ├── mock_data.py        # Demo data (MVP)
│   ├── data_loader.py      # Real data loader (future)
│   └── embeddings/         # Saved embeddings (future)
├── utils/
│   ├── matching.py         # Cosine similarity algorithms
│   ├── visualization.py    # Network graph generation
│   └── display.py          # UI components
└── assets/
    └── style.css           # Custom CSS (optional)
🛠️ Installation & Setup
Prerequisites
- Python 3.8+
- pip package manager
- Git
Local Development
- Clone the repository
git clone https://github.com/your-username/hrhub.git
cd hrhub
- Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Run the app
streamlit run app.py
- Open your browser and navigate to http://localhost:8501
Deployment (Streamlit Cloud)
Step 1: Push to GitHub
git add .
git commit -m "Initial commit"
git push origin main
Step 2: Deploy on Streamlit Cloud
- Go to share.streamlit.io
- Sign in with GitHub
- Click "New app"
- Select your repository: hrhub
- Main file path: app.py
- Click "Deploy"
That's it! Your app will be live at https://your-app.streamlit.app
Data Pipeline
Current (MVP - Hardcoded)
mock_data.py → app.py → Display
Future (Production)
CSV Files → Data Processing → Embeddings → Saved Files
↓
app.py loads embeddings → Real-time matching
Files to Generate (Next Phase)
# After running your main code, save these:
1. candidate_embeddings.npy # 9,544 × 384 array
2. company_embeddings.npy # 180,000 × 384 array
3. candidates_processed.pkl # Full candidate data
4. companies_processed.pkl # Full company data
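As a rough illustration of how these four files could be written, the sketch below saves an embedding matrix with `np.save` and the matching records with `pickle`. The helper name `save_artifacts`, the random stand-in data, and the list-of-dicts record format are assumptions made for this example; the real processed data may well be a DataFrame or another structure.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_artifacts(out_dir, embeddings, records, emb_name, pkl_name):
    """Write an embedding matrix (.npy) and its records (.pkl) to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # float32 halves the size on disk compared with NumPy's default float64.
    np.save(out_dir / emb_name, embeddings.astype(np.float32))
    with open(out_dir / pkl_name, "wb") as fh:
        pickle.dump(records, fh)


# Tiny random stand-in data; the real arrays would be 9,544 x 384 and 180,000 x 384.
rng = np.random.default_rng(0)
demo_embeddings = rng.random((3, 384))
demo_records = [{"id": i} for i in range(3)]

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(
        tmp,
        demo_embeddings,
        demo_records,
        emb_name="candidate_embeddings.npy",
        pkl_name="candidates_processed.pkl",
    )
    loaded = np.load(Path(tmp) / "candidate_embeddings.npy")
    loaded_shape, loaded_dtype = loaded.shape, str(loaded.dtype)

print(loaded_shape, loaded_dtype)
```

The same helper could be called a second time with the company file names. Only unpickle files you created yourself, since `pickle` can execute arbitrary code on load.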
Switching from Mock to Real Data
Current Code (MVP)
# app.py
from data.mock_data import get_candidate_data, get_company_matches
After Generating Embeddings
# app.py
from data.data_loader import get_candidate_data, get_company_matches
That's it! No other code changes needed. The UI stays the same.
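The one-line swap works only if both modules expose the same function names with compatible arguments and return values. The sketch below imitates that contract with a stand-in object; the function signatures, return shapes, and the `select_backend` helper are hypothetical and only illustrate the idea.

```python
from types import SimpleNamespace


# Stand-ins for the functions in data/mock_data.py.
# Signatures and return shapes are assumptions for this example.
def _mock_get_candidate_data(candidate_id):
    return {"id": candidate_id, "name": f"Demo Candidate #{candidate_id}"}


def _mock_get_company_matches(candidate_id, top_k=5):
    return [(f"Demo Company #{i}", round(0.9 - 0.05 * i, 2)) for i in range(top_k)]


mock_backend = SimpleNamespace(
    get_candidate_data=_mock_get_candidate_data,
    get_company_matches=_mock_get_company_matches,
)


def select_backend(demo_mode, mock, real=None):
    """Return the mock backend in demo mode, or when no real backend exists yet."""
    if demo_mode or real is None:
        return mock
    return real


backend = select_backend(demo_mode=True, mock=mock_backend)
candidate = backend.get_candidate_data(0)
matches = backend.get_company_matches(0, top_k=5)
print(candidate["name"], len(matches))
```

Because the UI only ever calls the two shared function names, it does not need to know which backend produced the data.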
🎨 Configuration
Edit config.py to customize:
# Matching Settings
DEFAULT_TOP_K = 10 # Number of matches to show
MIN_SIMILARITY_SCORE = 0.5 # Minimum score threshold
EMBEDDING_DIMENSION = 384 # Vector dimension
# UI Settings
NETWORK_GRAPH_HEIGHT = 600 # Graph height in pixels
# Demo Mode
DEMO_MODE = True # Set False for production
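To show how the two matching settings might interact, here is a small sketch that filters already-scored matches by the minimum score and then keeps the best K. The `select_matches` function and the sample scores are made up for illustration; the project's own matching code may apply these settings differently.

```python
DEFAULT_TOP_K = 10
MIN_SIMILARITY_SCORE = 0.5


def select_matches(scored, top_k=DEFAULT_TOP_K, min_score=MIN_SIMILARITY_SCORE):
    """Keep matches at or above min_score, best first, limited to top_k."""
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Made-up scores for three hypothetical companies.
scored = [("Company A", 0.82), ("Company B", 0.41), ("Company C", 0.67)]
selected = select_matches(scored, top_k=2)
print(selected)
```

Here Company B is dropped because it falls below the 0.5 threshold, so a candidate can end up with fewer than K matches.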
Technical Details
Algorithm
- Text Representation: Convert candidate/company data to structured text
- Embedding: Use sentence transformers (all-MiniLM-L6-v2)
- Similarity: Compute cosine similarity between vectors
- Ranking: Sort by similarity score, return top K
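The similarity and ranking steps can be sketched in plain Python as follows. The three-dimensional toy vectors stand in for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces, and the function names are illustrative rather than the ones used in utils/matching.py.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Treat an all-zero vector as matching nothing.
    return dot / (norm_u * norm_v)


def rank_companies(candidate_vec, company_vecs, top_k):
    """Score every company against the candidate and return the top_k."""
    scores = [
        (name, cosine_similarity(candidate_vec, vec))
        for name, vec in company_vecs.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy vectors standing in for real embeddings.
candidate = [1.0, 1.0, 0.0]
companies = {
    "Company A": [2.0, 2.0, 0.0],  # Same direction as the candidate.
    "Company B": [0.0, 0.0, 1.0],  # Orthogonal to the candidate.
    "Company C": [1.0, 0.0, 0.0],  # 45 degrees from the candidate.
}
ranking = rank_companies(candidate, companies, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

At the scale of 180,000 companies, a vectorized implementation (for example a single matrix-vector product over normalized embeddings) would be the more practical choice; the loop above is only meant to make the logic explicit.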
Why Cosine Similarity?
- ✅ Scale-invariant: Focuses on direction, not magnitude
- ✅ Profile shape matching: Captures proportional skill distributions
- ✅ Fast computation: Optimized for large-scale matching
- ✅ Proven in NLP: Standard metric for semantic similarity
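The scale-invariance property follows directly from the definition. For embedding vectors u and v and any positive scale factor α (symbols introduced here for illustration), the factor cancels between numerator and denominator:

```latex
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert},
\qquad
\cos(\alpha u, v)
  = \frac{\alpha \, (u \cdot v)}{\alpha \, \lVert u \rVert \, \lVert v \rVert}
  = \cos(u, v)
  \quad \text{for } \alpha > 0 .
```

In practice this means a longer or more emphatic profile text does not score higher merely because its embedding has a larger norm.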
Performance
- Loading time: < 5 seconds (with pre-computed embeddings)
- Matching speed: < 1 second for 180K companies
- Memory usage: ~500MB (embeddings loaded)
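A quick back-of-the-envelope check of the memory figure, assuming the embeddings are stored as dense float32 arrays (4 bytes per value). That storage type is an assumption; float64 would double these numbers.

```python
BYTES_PER_FLOAT32 = 4
EMBEDDING_DIMENSION = 384


def embedding_megabytes(rows, dim=EMBEDDING_DIMENSION, bytes_per_value=BYTES_PER_FLOAT32):
    """Size of a dense rows x dim matrix in megabytes (10**6 bytes)."""
    return rows * dim * bytes_per_value / 1e6


candidates_mb = embedding_megabytes(9_544)
companies_mb = embedding_megabytes(180_000)
total_mb = candidates_mb + companies_mb
print(f"{candidates_mb:.1f} MB + {companies_mb:.1f} MB = {total_mb:.1f} MB")
# prints "14.7 MB + 276.5 MB = 291.1 MB"
```

Under this assumption the raw embeddings account for roughly 290 MB, which leaves the rest of the ~500MB estimate for the processed candidate and company data and for runtime overhead.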
🧪 Testing
Test Mock Data
cd hrhub
python data/mock_data.py
Expected output:
✅ Candidate: Demo Candidate #0
✅ Top 5 matches loaded
✅ Graph data: 6 nodes, 5 edges
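The node and edge counts in the expected output are consistent with a star-shaped graph: one candidate node, one node per matched company, and one edge per match. The sketch below reproduces those counts with a hypothetical node and edge layout; the structure used in utils/visualization.py may differ.

```python
def build_graph_data(candidate_name, matches):
    """Return (nodes, edges) for a star graph centred on the candidate."""
    nodes = [{"id": candidate_name, "group": "candidate"}]
    edges = []
    for company_name, score in matches:
        nodes.append({"id": company_name, "group": "company"})
        edges.append(
            {"source": candidate_name, "target": company_name, "weight": score}
        )
    return nodes, edges


# Five made-up matches with descending scores.
matches = [(f"Demo Company #{i}", 0.9 - 0.1 * i) for i in range(5)]
nodes, edges = build_graph_data("Demo Candidate #0", matches)
print(f"Graph data: {len(nodes)} nodes, {len(edges)} edges")
# prints "Graph data: 6 nodes, 5 edges"
```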
Test Streamlit App
streamlit run app.py
🎯 Roadmap
✅ Phase 1: MVP (Current)
- Basic matching logic
- Streamlit UI
- Network visualization
- Hardcoded demo data
Phase 2: Production (Next)
- Generate real embeddings
- Load embeddings from files
- Dynamic candidate selection
- Search functionality
Phase 3: Advanced (Future)
- User authentication
- Company login view
- Weighted matching (different dimensions)
- RAG-powered recommendations
- Email notifications
- Analytics dashboard
👥 Team
Master's in Business Data Science - Aalborg University
- Roger - Project Lead & Deployment
- Eskil - [Role]
- [Team Member 3] - [Role]
- [Team Member 4] - [Role]
License
This project is part of an academic course at Aalborg University.
🤝 Contributing
This is an academic project. Contributions are welcome after project submission (December 14, 2024).
📧 Contact
For questions or feedback:
- Create an issue on GitHub
- Contact via Moodle course forum
Acknowledgments
- Sentence Transformers: Hugging Face team
- Streamlit: Amazing framework for data apps
- PyVis: Interactive network visualization
- Course Instructors: For guidance and support
Last Updated: December 2024
Status: 🟢 Active Development