Roger Surf

metadata
title: HRHUB
emoji: 💼
colorFrom: green
colorTo: blue
sdk: streamlit
sdk_version: 1.34.0
app_file: app.py
pinned: true

🏒 HRHUB - HR Matching System

Bilateral Matching Engine for Candidates & Companies

A professional HR matching system using NLP embeddings and cosine similarity to connect job candidates with relevant companies based on skills, experience, and requirements.


HRHUB solves a fundamental inefficiency in hiring: candidates and companies use different vocabularies when describing skills and requirements. Our system bridges this gap by using job postings as a translator, enriching company profiles so they speak the same "skills language" as candidates.

Key Innovation

  • Candidates describe: "Python, Machine Learning, Data Science"
  • Companies describe: "Tech company, innovation, growth"
  • Job Postings translate: "We need Python, AWS, TensorFlow"
  • Result: Candidates and enriched company profiles live in the same embedding space ℝ³⁸⁴, so their vectors can be compared directly
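The translation idea can be illustrated with a small, self-contained sketch. It is not HRHUB's code: the helper name `build_company_text`, the dict fields (`name`, `description`, `skills`) and the rule of appending deduplicated posting skills to the company description are all assumptions made for illustration.

```python
from __future__ import annotations


def build_company_text(company: dict, postings: list[dict]) -> str:
    """Hypothetical helper: enrich a company profile with the skills its job
    postings ask for, so it shares the candidates' "skills language"."""
    seen: set[str] = set()
    skills: list[str] = []
    for posting in postings:
        for skill in posting.get("skills", []):
            key = skill.strip().lower()
            if key and key not in seen:  # keep the first spelling, drop duplicates
                seen.add(key)
                skills.append(skill.strip())
    parts = [company["name"], company.get("description", "")]
    if skills:
        parts.append("Skills: " + ", ".join(skills))
    # Join the non-empty parts into one string that can be embedded like a CV.
    return ". ".join(p for p in parts if p)


company = {"name": "Acme Analytics", "description": "Tech company, innovation, growth"}
postings = [
    {"title": "Data Engineer", "skills": ["Python", "AWS"]},
    {"title": "ML Engineer", "skills": ["python", "TensorFlow"]},
]
print(build_company_text(company, postings))
# -> Acme Analytics. Tech company, innovation, growth. Skills: Python, AWS, TensorFlow
```

The company's own wording is kept, so the embedding still reflects its description, while the appended skills give it vocabulary that overlaps with candidate profiles.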

🚀 Features

  • ✅ Bilateral Matching: Both candidates and companies get matched recommendations
  • ✅ NLP-Powered: Uses sentence transformers for semantic understanding
  • ✅ Interactive Visualization: Network graphs showing match connections
  • ✅ Scalable: Handles 9,544 candidates × 180,000 companies
  • ✅ Real-time: Fast matching based on cosine similarity
  • ✅ Professional UI: Clean Streamlit interface
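Both directions of the bilateral matching can come from one candidate-by-company similarity matrix: rows rank companies for a candidate, columns rank candidates for a company. The pure-Python sketch below uses hypothetical helper names and a made-up 3 × 2 matrix. At the real scale, a full 9,544 × 180,000 float32 matrix would take about 6.9 GB, so scoring one row or column on demand is likely the more practical choice.

```python
import heapq


def top_k(scores, k):
    """Return the k (index, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


def bilateral_matches(sim, k):
    """sim[i][j] is the similarity between candidate i and company j.

    Rows give each candidate's ranking of companies; columns (via zip(*sim))
    give each company's ranking of candidates, from the same scores.
    """
    for_candidates = [top_k(row, k) for row in sim]
    for_companies = [top_k(list(col), k) for col in zip(*sim)]
    return for_candidates, for_companies


# Toy matrix: 3 candidates x 2 companies (made-up scores).
sim = [
    [0.91, 0.40],
    [0.35, 0.88],
    [0.72, 0.65],
]
cand, comp = bilateral_matches(sim, k=2)
print(cand[0])  # company ranking for candidate 0 -> [(0, 0.91), (1, 0.4)]
print(comp[1])  # candidate ranking for company 1 -> [(1, 0.88), (2, 0.65)]
```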

📁 Project Structure

hrhub/
├── app.py                      # Main Streamlit application
├── config.py                   # Configuration settings
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── data/
│   ├── mock_data.py            # Demo data (MVP)
│   ├── data_loader.py          # Real data loader (future)
│   └── embeddings/             # Saved embeddings (future)
├── utils/
│   ├── matching.py             # Cosine similarity algorithms
│   ├── visualization.py        # Network graph generation
│   └── display.py              # UI components
└── assets/
    └── style.css               # Custom CSS (optional)

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • pip package manager
  • Git

Local Development

  1. Clone the repository
git clone https://github.com/your-username/hrhub.git
cd hrhub
  2. Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Run the app
streamlit run app.py
  5. Open your browser and navigate to http://localhost:8501

🌐 Deployment (Streamlit Cloud)

Step 1: Push to GitHub

git add .
git commit -m "Initial commit"
git push origin main

Step 2: Deploy on Streamlit Cloud

  1. Go to share.streamlit.io
  2. Sign in with GitHub
  3. Click "New app"
  4. Select your repository: hrhub
  5. Main file path: app.py
  6. Click "Deploy"

That's it! Your app will be live at https://your-app.streamlit.app


📊 Data Pipeline

Current (MVP - Hardcoded)

mock_data.py → app.py → Display

Future (Production)

CSV Files → Data Processing → Embeddings → Saved Files
                ↓
            app.py loads embeddings → Real-time matching

Files to Generate (Next Phase)

# After running the data-processing and embedding code, save these:
1. candidate_embeddings.npy      # 9,544 × 384 array
2. company_embeddings.npy        # 180,000 × 384 array
3. candidates_processed.pkl      # Full candidate data
4. companies_processed.pkl       # Full company data
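One way to write and read these four files is sketched below. It assumes float32 NumPy arrays for the embeddings and plain `pickle` for the processed data; `save_artifacts` and `load_artifacts` are hypothetical names, and the real processed data might be pandas DataFrames (which pickle handles the same way).

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_artifacts(out_dir, cand_emb, comp_emb, candidates, companies):
    """Hypothetical helper: write the four files listed above into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Store embeddings as float32: half the size of float64 on disk and in RAM.
    np.save(out / "candidate_embeddings.npy", np.asarray(cand_emb, dtype=np.float32))
    np.save(out / "company_embeddings.npy", np.asarray(comp_emb, dtype=np.float32))
    with open(out / "candidates_processed.pkl", "wb") as f:
        pickle.dump(candidates, f)
    with open(out / "companies_processed.pkl", "wb") as f:
        pickle.dump(companies, f)


def load_artifacts(in_dir):
    """Hypothetical helper: read the four files back from in_dir."""
    src = Path(in_dir)
    cand_emb = np.load(src / "candidate_embeddings.npy")
    # mmap_mode="r" maps the large company matrix from disk lazily
    # instead of reading all of it into memory at once.
    comp_emb = np.load(src / "company_embeddings.npy", mmap_mode="r")
    # Only unpickle files you created yourself: pickle can run arbitrary code.
    with open(src / "candidates_processed.pkl", "rb") as f:
        candidates = pickle.load(f)
    with open(src / "companies_processed.pkl", "rb") as f:
        companies = pickle.load(f)
    return cand_emb, comp_emb, candidates, companies


# Round trip with small random stand-ins for the real arrays.
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, rng.random((4, 384)), rng.random((6, 384)),
                   [{"id": i} for i in range(4)], [{"id": j} for j in range(6)])
    cand_emb, comp_emb, candidates, companies = load_artifacts(tmp)
    summary = (cand_emb.shape, comp_emb.shape, str(comp_emb.dtype), len(companies))
    del comp_emb  # release the memory map before the directory is deleted
print(summary)  # -> ((4, 384), (6, 384), 'float32', 6)
```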

🔄 Switching from Mock to Real Data

Current Code (MVP)

# app.py
from data.mock_data import get_candidate_data, get_company_matches

After Generating Embeddings

# app.py
from data.data_loader import get_candidate_data, get_company_matches

That's it! No other code changes needed. The UI stays the same.
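Because app.py depends only on the two function names, a real `data/data_loader.py` can keep them and change what happens underneath. The sketch below is hypothetical: it assumes `get_company_matches` takes a candidate index and a `top_k` and returns (company index, score) pairs, that the embeddings were L2-normalized before saving, and that the files live in `data/embeddings/`. Check `mock_data.py` for the actual signatures and return shapes before reusing it.

```python
from __future__ import annotations

from functools import lru_cache
from pathlib import Path

import numpy as np

EMBEDDINGS_DIR = "data/embeddings"  # assumed location of the saved .npy files


@lru_cache(maxsize=1)
def _load(emb_dir: str):
    """Read both embedding matrices once; later calls reuse the cached arrays."""
    d = Path(emb_dir)
    return np.load(d / "candidate_embeddings.npy"), np.load(d / "company_embeddings.npy")


def get_candidate_data(candidate_idx: int, emb_dir: str = EMBEDDINGS_DIR) -> dict:
    """Hypothetical: return the candidate's id and embedding vector."""
    cand, _ = _load(emb_dir)
    return {"id": candidate_idx, "embedding": cand[candidate_idx]}


def get_company_matches(candidate_idx: int, top_k: int = 10,
                        emb_dir: str = EMBEDDINGS_DIR) -> list[tuple[int, float]]:
    """Hypothetical: top_k (company index, cosine score) pairs, best first.

    Assumes every row was L2-normalized when the embeddings were saved.
    """
    cand, comp = _load(emb_dir)
    # On unit-length rows cosine similarity is a dot product, so one
    # matrix-vector product scores every company at once.
    scores = comp @ cand[candidate_idx]
    k = max(1, min(top_k, len(scores)))
    # argpartition isolates the k best scores in linear time; only they get sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(int(j), float(scores[j])) for j in top]
```

Returning plain `int` and `float` values keeps NumPy types out of the UI layer. Inside Streamlit, `st.cache_resource` is the usual alternative to `functools.lru_cache` for keeping large objects loaded across reruns.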


🎨 Configuration

Edit config.py to customize:

# Matching Settings
DEFAULT_TOP_K = 10              # Number of matches to show
MIN_SIMILARITY_SCORE = 0.5      # Minimum score threshold
EMBEDDING_DIMENSION = 384       # Vector dimension

# UI Settings
NETWORK_GRAPH_HEIGHT = 600      # Graph height in pixels

# Demo Mode
DEMO_MODE = True                # Set False for production
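How app.py combines `DEFAULT_TOP_K` and `MIN_SIMILARITY_SCORE` is not shown here, so the following is an assumption: one plausible reading, sketched with a hypothetical `select_matches` helper, drops scores below the threshold and then keeps at most `DEFAULT_TOP_K` results. Under that reading a candidate can see fewer than 10 matches when few companies clear 0.5.

```python
# Values copied from config.py above.
DEFAULT_TOP_K = 10
MIN_SIMILARITY_SCORE = 0.5


def select_matches(scored, top_k=DEFAULT_TOP_K, min_score=MIN_SIMILARITY_SCORE):
    """Hypothetical helper: drop pairs below min_score, sort best first,
    and keep at most top_k. `scored` is an iterable of (item_id, similarity)."""
    kept = [(item, score) for item, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


scored = [("c1", 0.82), ("c2", 0.47), ("c3", 0.66), ("c4", 0.50)]
print(select_matches(scored))            # -> [('c1', 0.82), ('c3', 0.66), ('c4', 0.5)]
print(select_matches(scored, top_k=2))   # -> [('c1', 0.82), ('c3', 0.66)]
```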

📈 Technical Details

Algorithm

  1. Text Representation: Convert candidate/company data to structured text
  2. Embedding: Use sentence transformers (all-MiniLM-L6-v2)
  3. Similarity: Compute cosine similarity between vectors
  4. Ranking: Sort by similarity score, return top K
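The four steps can be traced end to end in a self-contained sketch. To stay runnable without downloading a model, step 2 uses a toy hashed bag-of-words embedding of the same dimension (384); this is a stand-in, not what HRHUB uses. With sentence-transformers installed, `SentenceTransformer("all-MiniLM-L6-v2").encode(texts, normalize_embeddings=True)` would take its place. `profile_to_text`, `toy_embed`, `rank_companies` and the profile fields are hypothetical.

```python
from __future__ import annotations

import math
import re
import zlib

DIM = 384  # same dimension as all-MiniLM-L6-v2 embeddings


def profile_to_text(profile: dict) -> str:
    """Step 1: flatten a structured profile into a single string."""
    return f"Skills: {', '.join(profile['skills'])}. Experience: {profile['experience']}"


def toy_embed(text: str) -> list[float]:
    """Step 2 (stand-in only): hashed bag of words, L2-normalized.

    Each lowercase token adds 1 to one of DIM buckets chosen by CRC32. Unlike a
    sentence transformer it only counts shared words and captures no meaning.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u: list[float], v: list[float]) -> float:
    """Step 3: cosine similarity, computed from its full definition."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_companies(candidate: dict, companies: dict[str, str], k: int = 10):
    """Step 4: score every company text and keep the k best, highest first."""
    query = toy_embed(profile_to_text(candidate))
    scores = {name: cosine(query, toy_embed(text)) for name, text in companies.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


candidate = {"skills": ["Python", "Machine Learning"], "experience": "3 years of data science"}
companies = {
    "DataCo": "Skills: Python, TensorFlow, machine learning",
    "BuildCo": "Skills: carpentry, welding, site safety",
}
print(rank_companies(candidate, companies, k=2))
```

DataCo ranks first because it shares four tokens with the candidate text versus one for BuildCo; a real sentence transformer would also reward related terms that are spelled differently.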

Why Cosine Similarity?

  • ✅ Scale-invariant: Focuses on direction, not magnitude
  • ✅ Profile shape matching: Captures proportional skill distributions
  • ✅ Fast computation: On L2-normalized vectors it reduces to a dot product, so one matrix-vector product scores every company
  • ✅ Proven in NLP: Standard metric for semantic similarity
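The scale-invariance and speed points follow directly from the definition. A positive scale factor α cancels, and once every embedding has unit length the score is a plain dot product. Here C (rows are unit-length company embeddings) and q (the unit-length candidate embedding) are notation introduced for this sketch.

```latex
\begin{aligned}
\cos(\mathbf{u},\mathbf{v}) &= \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert},
  \qquad \mathbf{u},\mathbf{v}\in\mathbb{R}^{384} \\
\cos(\alpha\mathbf{u},\mathbf{v}) &= \frac{\alpha\,(\mathbf{u}\cdot\mathbf{v})}{\alpha\,\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
  = \cos(\mathbf{u},\mathbf{v}) \quad \text{for } \alpha>0 \\
\lVert\mathbf{u}\rVert=\lVert\mathbf{v}\rVert=1 &\;\Rightarrow\; \cos(\mathbf{u},\mathbf{v}) = \mathbf{u}\cdot\mathbf{v},
  \qquad \mathbf{s} = C\,\mathbf{q}\in\mathbb{R}^{180\,000}
\end{aligned}
```

Each entry of s is the cosine score of one company, so ranking reduces to sorting (or partially sorting) a single vector.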

Performance

  • Loading time: < 5 seconds (with pre-computed embeddings)
  • Matching speed: < 1 second for 180K companies
  • Memory usage: ~500MB (embeddings loaded)
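As a rough check on the memory figure, assuming the embeddings are stored as float32 (4 bytes per value, an assumption about how HRHUB saves them), the two matrices alone come to about 291 MB, which leaves room within the stated ~500MB for the processed data and runtime overhead. Stored as float64 they would need about 582 MB on their own.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings stored as float32
DIM = 384              # all-MiniLM-L6-v2 output size


def matrix_bytes(rows: int, dim: int = DIM, itemsize: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of a dense rows x dim array, ignoring the small .npy header."""
    return rows * dim * itemsize


candidates = matrix_bytes(9_544)    # 14,659,584 bytes
companies = matrix_bytes(180_000)   # 276,480,000 bytes
total = candidates + companies
print(f"{total:,} bytes = {total / 1e6:.0f} MB")  # -> 291,139,584 bytes = 291 MB
```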

🧪 Testing

Test Mock Data

cd hrhub
python data/mock_data.py

Expected output:

✅ Candidate: Demo Candidate #0
✅ Top 5 matches loaded
✅ Graph data: 6 nodes, 5 edges
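The graph line follows from a star layout: 5 matches give 1 candidate node plus 5 company nodes, and one edge per match. The hypothetical `build_graph_data` below shows that shape with plain dicts; the keys are assumptions, and in the app such data would presumably be handed to PyVis (`Network.add_node` / `Network.add_edge`) rather than printed.

```python
def build_graph_data(candidate_name, matches):
    """Star graph: one candidate node linked to each matched company.

    `matches` is a list of (company_name, similarity) pairs.
    Returns (nodes, edges) as plain dicts; edge weights carry the similarity.
    """
    nodes = [{"id": candidate_name, "group": "candidate"}]
    edges = []
    for company, score in matches:
        nodes.append({"id": company, "group": "company"})
        edges.append({"from": candidate_name, "to": company, "weight": score})
    return nodes, edges


matches = [(f"Company {i}", round(0.9 - 0.05 * i, 2)) for i in range(5)]
nodes, edges = build_graph_data("Demo Candidate #0", matches)
print(f"Graph data: {len(nodes)} nodes, {len(edges)} edges")  # -> Graph data: 6 nodes, 5 edges
```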

Test Streamlit App

streamlit run app.py

🎯 Roadmap

✅ Phase 1: MVP (Current)

  • Basic matching logic
  • Streamlit UI
  • Network visualization
  • Hardcoded demo data

🔄 Phase 2: Production (Next)

  • Generate real embeddings
  • Load embeddings from files
  • Dynamic candidate selection
  • Search functionality

🚀 Phase 3: Advanced (Future)

  • User authentication
  • Company login view
  • Weighted matching (different dimensions)
  • RAG-powered recommendations
  • Email notifications
  • Analytics dashboard

👥 Team

Master's in Business Data Science - Aalborg University

  • Roger - Project Lead & Deployment
  • Eskil - [Role]
  • [Team Member 3] - [Role]
  • [Team Member 4] - [Role]

📝 License

This project is part of an academic course at Aalborg University.


🀝 Contributing

This is an academic project. Contributions are welcome after project submission (December 14, 2024).


📧 Contact

For questions or feedback:

  • Create an issue on GitHub
  • Contact via Moodle course forum

🙏 Acknowledgments

  • Sentence Transformers: Hugging Face team
  • Streamlit: Amazing framework for data apps
  • PyVis: Interactive network visualization
  • Course Instructors: For guidance and support

Last Updated: December 2024
Status: 🟢 Active Development