# 🚀 Deployment Guide for HuggingFace Space with ZeroGPU

## ✅ Pre-Deployment Checklist

All code is ready! Here's what's configured:

- ✅ Model: `microsoft/Phi-3-mini-4k-instruct` (3.8B params)
- ✅ ZeroGPU support: Enabled with the `@spaces.GPU` decorator
- ✅ Local/Space compatibility: Auto-detects environment
- ✅ Usage tracking: 50 requests/day per user
- ✅ Requirements: All dependencies listed
- ✅ README: Updated with instructions

## 📋 Deployment Steps

### Step 1: Push Code to Your Space

```bash
cd /Users/tom/code/cojournalist-data

# If not already initialized
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data

# Or if already connected
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```

### Step 2: Configure Space Hardware

1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data`
2. Click **Settings** (⚙️ icon in the top right)
3. Scroll to the **Hardware** section
4. Select **ZeroGPU** from the dropdown
5. Click **Save**
6. The Space will restart automatically

### Step 3: Wait for Build

The Space will:

1. Install dependencies (~2-3 minutes)
2. Download the Phi-3-mini model (~1-2 minutes, 7.6GB)
3. Load the model into memory (~30 seconds)
4. Launch the Gradio interface

**Total build time: ~5-7 minutes**

### Step 4: Test Your Space

Once running, test with these queries:

1. **English:** "Who are the parliamentarians from Zurich?"
2. **German:** "Zeige mir aktuelle Abstimmungen zur Klimapolitik"
3. **French:** "Qui sont les parlementaires de Zurich?"
4. **Italian:** "Mostrami i voti recenti sulla politica climatica"

## 🔧 Space Settings Summary

### Hardware

- **Type:** ZeroGPU
- **Cost:** FREE (included with Team plan)
- **GPU:** Nvidia H200 (70GB VRAM)
- **Allocation:** Dynamic (only when needed)

### Environment Variables (Optional)

If you want to configure anything:

- `HF_TOKEN`: Your HuggingFace token (for private models; not needed for Phi-3)

## 📊 Expected Behavior

### First Request

- Takes ~5-10 seconds (GPU allocation + inference)
- Subsequent requests are faster (~2-5 seconds)

### Rate Limiting

- 50 requests per day per user IP
- An error message is shown when the limit is reached
- Resets daily at midnight UTC

### Model Loading

- Happens once on Space startup
- Cached for subsequent requests
- No reload needed between requests

## 🐛 Troubleshooting

### "Model not loading"

- Check the Space logs for errors
- Verify ZeroGPU is selected in Hardware settings
- Ensure `spaces>=0.28.0` is in requirements.txt

### "Out of memory"

- This shouldn't happen with ZeroGPU (70GB VRAM)
- If it does, contact HF support

### "Rate limit not working"

- The usage tracker uses in-memory storage
- Counts reset on Space restart
- Tracking is IP-based (works in production)

### "Slow inference"

- The first request allocates a GPU (slower)
- Subsequent requests reuse the cached allocation
- Normal: 2-5 seconds per request

## 💰 Cost Breakdown

- **Team Plan:** $20/user/month (you already have this)
- **ZeroGPU:** FREE (included)
- **Inference:** FREE (no API calls)
- **Storage:** FREE (model cached by HF)

**Total additional cost: $0/month** 🎉

## 🔄 Updates & Maintenance

To update your Space:

```bash
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```

The Space will automatically rebuild and redeploy.

## 📈 Monitoring Usage

Check your Space's metrics:

1. Go to the Space page
2. Click the "Analytics" tab
3. View daily/weekly usage stats

## 🎯 Next Steps After Deployment

1. ✅ Test all 4 languages
2. ✅ Verify tool calling works
3. ✅ Check rate limiting
4. ✅ Monitor performance
5. 🔜 Adjust the system prompt if needed
6. 🔜 Fine-tune temperature/max_tokens if needed

## 📞 Support

If you encounter issues:

- Check the Space logs (Settings → Logs)
- HuggingFace Discord: https://discord.gg/huggingface
- HF Forums: https://discuss.huggingface.co/

---

**You're ready to deploy! 🚀**
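## 📎 Appendix: Rate Limiter Sketch

The in-memory, IP-based rate limiter described above (50 requests/day per IP, reset at midnight UTC, counts lost on restart) can be sketched roughly as follows. This is a minimal illustration, not the Space's actual code; the `UsageTracker` class and its method names are invented for this example:

```python
from collections import defaultdict
from datetime import datetime, timezone

DAILY_LIMIT = 50  # requests per user IP per day


class UsageTracker:
    """Per-IP daily request counter held in process memory.

    Because state lives only in memory, all counts are lost when
    the Space restarts, and the window rolls over at midnight UTC.
    """

    def __init__(self, limit: int = DAILY_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)  # ip -> requests today
        self.day = self._today()

    def _today(self):
        return datetime.now(timezone.utc).date()

    def allow(self, ip: str) -> bool:
        """Record one request for `ip`; False once the limit is hit."""
        today = self._today()
        if today != self.day:  # new UTC day: clear every counter
            self.counts.clear()
            self.day = today
        if self.counts[ip] >= self.limit:
            return False
        self.counts[ip] += 1
        return True
```

In a Gradio handler, the caller's IP can be read from the `gr.Request` object (`request.client.host`); when `allow()` returns `False`, the handler would return the rate-limit error message instead of running inference.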