hrhub / README.md

Roger Surf

readme: updated to work on hugging face

1aa53e7 14 days ago

metadata

title: HRHUB
emoji: 💼
colorFrom: green
colorTo: blue
sdk: streamlit
sdk_version: 1.34.0
app_file: app.py
pinned: true

🏢 HRHUB - HR Matching System

Bilateral Matching Engine for Candidates & Companies

A professional HR matching system using NLP embeddings and cosine similarity to connect job candidates with relevant companies based on skills, experience, and requirements.

HRHUB solves a fundamental inefficiency in hiring: candidates and companies use different vocabularies when describing skills and requirements. Our system bridges this gap using job postings as a translator, enriching company profiles to speak the same "skills language" as candidates.

Key Innovation

Candidates describe: "Python, Machine Learning, Data Science"
Companies describe: "Tech company, innovation, growth"
Job Postings translate: "We need Python, AWS, TensorFlow"
Result: Accurate matching in the same embedding space ℝ³⁸⁴
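
The enrichment idea above can be sketched as follows. This is a minimal illustration only: the function name `enrich_company_text`, the `skills` field, and the "Required skills:" wording are assumptions made for the example, not the project's actual code.

```python
def enrich_company_text(description, postings):
    """Append the skills named in job postings to a company's own description."""
    seen = set()
    skills = []
    for posting in postings:
        for skill in posting.get("skills", []):
            key = skill.lower()
            if key not in seen:  # keep the first spelling, skip case-insensitive repeats
                seen.add(key)
                skills.append(skill)
    if not skills:
        return description  # nothing to add, leave the profile unchanged
    return f"{description}. Required skills: {', '.join(skills)}"


# Hypothetical demo data mirroring the example above
company = "Tech company, innovation, growth"
postings = [
    {"title": "ML Engineer", "skills": ["Python", "AWS", "TensorFlow"]},
    {"title": "Data Engineer", "skills": ["python", "SQL"]},
]
enriched = enrich_company_text(company, postings)
print(enriched)
```

The enriched text, rather than the bare company description, is what would be passed to the embedding model, so that candidates and companies are compared on the same skill vocabulary.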

🚀 Features

✅ Bilateral Matching: Both candidates and companies get matched recommendations
✅ NLP-Powered: Uses sentence transformers for semantic understanding
✅ Interactive Visualization: Network graphs showing match connections
✅ Scalable: Designed for a dataset of 9,544 candidates × 180,000 companies
✅ Real-time: Fast matching via cosine similarity over pre-computed embeddings
✅ Professional UI: Clean Streamlit interface

📁 Project Structure

hrhub/
├── app.py                      # Main Streamlit application
├── config.py                   # Configuration settings
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── data/
│   ├── mock_data.py           # Demo data (MVP)
│   ├── data_loader.py         # Real data loader (future)
│   └── embeddings/            # Saved embeddings (future)
├── utils/
│   ├── matching.py            # Cosine similarity algorithms
│   ├── visualization.py       # Network graph generation
│   └── display.py             # UI components
└── assets/
    └── style.css              # Custom CSS (optional)

🛠️ Installation & Setup

Prerequisites

Python 3.8+
pip package manager
Git

Local Development

Clone the repository

git clone https://github.com/your-username/hrhub.git
cd hrhub

Create virtual environment (recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Run the app

streamlit run app.py

Open your browser and navigate to http://localhost:8501

🌐 Deployment (Streamlit Cloud)

Step 1: Push to GitHub

git add .
git commit -m "Initial commit"
git push origin main

Step 2: Deploy on Streamlit Cloud

Go to share.streamlit.io
Sign in with GitHub
Click "New app"
Select your repository: hrhub
Main file path: app.py
Click "Deploy"

That's it! Your app will be live at https://your-app.streamlit.app

📊 Data Pipeline

Current (MVP - Hardcoded)

mock_data.py → app.py → Display

Future (Production)

CSV Files → Data Processing → Embeddings → Saved Files
                ↓
            app.py loads embeddings → Real-time matching

Files to Generate (Next Phase)

# After running your main code, save these:
1. candidate_embeddings.npy      # 9,544 × 384 array
2. company_embeddings.npy        # 180,000 × 384 array
3. candidates_processed.pkl      # Full candidate data
4. companies_processed.pkl       # Full company data
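
One possible way to write these files is sketched below, assuming the embeddings are NumPy arrays and the processed data is a picklable Python object. The helper `save_artifacts` is hypothetical, and the tiny random array only stands in for the real 9,544 × 384 matrix.

```python
import os
import pickle
import tempfile

import numpy as np


def save_artifacts(out_dir, emb_name, data_name, embeddings, records):
    """Save an embedding matrix (.npy) and its matching records (.pkl)."""
    if embeddings.shape[0] != len(records):
        raise ValueError("one embedding row is needed per record")
    emb_path = os.path.join(out_dir, emb_name)
    data_path = os.path.join(out_dir, data_name)
    np.save(emb_path, embeddings.astype(np.float32))  # float32 is an assumption
    with open(data_path, "wb") as fh:
        pickle.dump(records, fh)
    return emb_path, data_path


# Stand-in data: 3 candidates instead of 9,544
rng = np.random.default_rng(0)
candidate_embeddings = rng.random((3, 384)).astype(np.float32)
candidates = [{"id": i, "skills": "Python"} for i in range(3)]

with tempfile.TemporaryDirectory() as out_dir:
    emb_path, data_path = save_artifacts(
        out_dir,
        "candidate_embeddings.npy",
        "candidates_processed.pkl",
        candidate_embeddings,
        candidates,
    )
    # Round-trip check: load both files back
    loaded = np.load(emb_path)
    with open(data_path, "rb") as fh:
        loaded_records = pickle.load(fh)
```

The row-count check keeps row i of the array aligned with record i, which the matching step relies on when it maps scores back to names.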

🔄 Switching from Mock to Real Data

Current Code (MVP)

# app.py
from data.mock_data import get_candidate_data, get_company_matches

After Generating Embeddings

# app.py
from data.data_loader import get_candidate_data, get_company_matches

That's it! No other code changes are needed, provided data_loader.py exposes the same function names as mock_data.py. The UI stays the same.

🎨 Configuration

Edit config.py to customize:

# Matching Settings
DEFAULT_TOP_K = 10              # Number of matches to show
MIN_SIMILARITY_SCORE = 0.5      # Minimum score threshold
EMBEDDING_DIMENSION = 384       # Vector dimension

# UI Settings
NETWORK_GRAPH_HEIGHT = 600      # Graph height in pixels

# Demo Mode
DEMO_MODE = True                # Set False for production
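
As an illustration of how the two matching settings might interact, the sketch below filters a list of already-scored matches. The function `select_matches` and the sample scores are hypothetical; the real logic lives in utils/matching.py and may differ.

```python
DEFAULT_TOP_K = 10           # values copied from config.py above
MIN_SIMILARITY_SCORE = 0.5


def select_matches(scored, top_k=DEFAULT_TOP_K, min_score=MIN_SIMILARITY_SCORE):
    """Drop matches below the threshold, then keep the top_k highest scores."""
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical (company, similarity) pairs
scored = [("Company A", 0.82), ("Company B", 0.41), ("Company C", 0.67)]
selected = select_matches(scored, top_k=2)
print(selected)
```

Applying the threshold before the cut-off means a candidate can receive fewer than DEFAULT_TOP_K matches when few companies score above MIN_SIMILARITY_SCORE.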

📈 Technical Details

Algorithm

Text Representation: Convert candidate/company data to structured text
Embedding: Use sentence transformers (all-MiniLM-L6-v2)
Similarity: Compute cosine similarity between vectors
Ranking: Sort by similarity score, return top K
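
The similarity and ranking steps can be sketched in plain Python as below. This is a dependency-free illustration with 3-dimensional toy vectors; the function names are assumptions, and a production version would more likely use vectorized NumPy operations on the 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k_matches(candidate_vec, company_vecs, k):
    """Score every company against the candidate and return the k best."""
    scored = [
        (name, cosine_similarity(candidate_vec, vec))
        for name, vec in company_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings
candidate = [1.0, 2.0, 0.0]
companies = {
    "Company A": [2.0, 4.0, 0.0],  # same direction as the candidate, twice the length
    "Company B": [0.0, 0.0, 3.0],  # orthogonal to the candidate
    "Company C": [1.0, 0.0, 0.0],  # partial overlap
}
ranking = top_k_matches(candidate, companies, k=2)
print([name for name, _ in ranking])
```

Company A ranks first with a score of about 1.0 even though its vector is twice as long, because only the direction of the vectors affects the score.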

Why Cosine Similarity?

✅ Scale-invariant: Focuses on direction, not magnitude
✅ Profile shape matching: Captures proportional skill distributions
✅ Fast computation: Optimized for large-scale matching
✅ Proven in NLP: Standard metric for semantic similarity

Performance

Loading time: < 5 seconds (with pre-computed embeddings)
Matching speed: < 1 second for 180K companies
Memory usage: ~500MB (embeddings loaded)
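
As a rough sanity check on the memory figure, the raw size of the two embedding matrices can be computed directly. The sketch assumes the vectors are stored as 32-bit floats, which the README does not state.

```python
BYTES_PER_FLOAT32 = 4       # assumption: embeddings stored as float32
EMBEDDING_DIMENSION = 384   # from config.py


def embedding_bytes(n_rows, dim=EMBEDDING_DIMENSION, itemsize=BYTES_PER_FLOAT32):
    """Raw size in bytes of an n_rows x dim embedding matrix."""
    return n_rows * dim * itemsize


company_bytes = embedding_bytes(180_000)
candidate_bytes = embedding_bytes(9_544)
total_mib = (company_bytes + candidate_bytes) / 2**20
print(f"{total_mib:.1f} MiB")
```

Under that assumption the two arrays account for well under 300 MB, so the ~500MB estimate leaves room for the embedding model, the processed candidate and company data, and the Streamlit process itself.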

🧪 Testing

Test Mock Data

cd hrhub
python data/mock_data.py

Expected output:

✅ Candidate: Demo Candidate #0
✅ Top 5 matches loaded
✅ Graph data: 6 nodes, 5 edges
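
The node and edge counts follow from the graph's shape: one candidate node, one node per matched company, and one edge per match. A minimal sketch of such a structure is shown below; `build_graph_data` and its dictionary keys are hypothetical and not taken from mock_data.py.

```python
def build_graph_data(candidate_name, matches):
    """Build a star-shaped graph: the candidate in the centre, one edge per match."""
    nodes = [{"id": candidate_name, "group": "candidate"}]
    edges = []
    for company, score in matches:
        nodes.append({"id": company, "group": "company"})
        edges.append({"from": candidate_name, "to": company, "value": score})
    return nodes, edges


# Five hypothetical matches with made-up scores
matches = [(f"Company {i}", round(0.9 - 0.05 * i, 2)) for i in range(5)]
nodes, edges = build_graph_data("Demo Candidate #0", matches)
print(f"Graph data: {len(nodes)} nodes, {len(edges)} edges")
```

With the top 5 matches this yields 6 nodes and 5 edges, matching the expected output above.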

Test Streamlit App

streamlit run app.py

🎯 Roadmap

✅ Phase 1: MVP (Current)

Basic matching logic
Streamlit UI
Network visualization
Hardcoded demo data

🔄 Phase 2: Production (Next)

Generate real embeddings
Load embeddings from files
Dynamic candidate selection
Search functionality

🚀 Phase 3: Advanced (Future)

User authentication
Company login view
Weighted matching (different dimensions)
RAG-powered recommendations
Email notifications
Analytics dashboard

👥 Team

Master's in Business Data Science - Aalborg University

Roger - Project Lead & Deployment
Eskil - [Role]
[Team Member 3] - [Role]
[Team Member 4] - [Role]

📝 License

This project is part of an academic course at Aalborg University.

🤝 Contributing

This is an academic project. Contributions are welcome after project submission (December 14, 2024).

📧 Contact

For questions or feedback:

Create an issue on GitHub
Contact via Moodle course forum

🙏 Acknowledgments

Sentence Transformers: Hugging Face team
Streamlit: Amazing framework for data apps
PyVis: Interactive network visualization
Course Instructors: For guidance and support

Last Updated: December 2024
Status: 🟢 Active Development