Spaces:
Paused
Paused
Commit
·
be36ee7
1
Parent(s):
09a9bd6
Refactor Docker configurations to use `uvicorn` as the entry point for FastAPI applications. Update `.huggingface.yaml` to remove legacy app configuration and clarify hardware requirements. Modify `Dockerfile.prod` to install `uvicorn` and adjust the command for production deployment.
Browse files- .huggingface.yaml +2 -5
- CHANGES_SUMMARY.md +248 -0
- Dockerfile.hf-spaces +2 -2
- Dockerfile.hf-spaces-minimal +52 -0
- HF_SPACES_QUICK_FIX.md +137 -0
- HF_SPACES_SCHEDULING_FIX.md +331 -0
- services/ai-service/DEPLOYMENT_FIX.md +177 -0
- services/ai-service/Dockerfile.prod +3 -2
- switch_hf_config.ps1 +118 -0
- switch_hf_config.sh +114 -0
.huggingface.yaml
CHANGED
|
@@ -7,13 +7,10 @@ build:
|
|
| 7 |
dockerfile: Dockerfile.hf-spaces
|
| 8 |
# Enable Docker layer caching for faster rebuilds
|
| 9 |
cache: true
|
| 10 |
-
|
| 11 |
-
# App configuration
|
| 12 |
-
app:
|
| 13 |
-
entrypoint: services/ai-service/src/ai_med_extract/app:app
|
| 14 |
-
port: 7860
|
| 15 |
|
| 16 |
# Hardware requirements
|
|
|
|
|
|
|
| 17 |
hardware:
|
| 18 |
gpu: t4-medium # 16GB GPU RAM, 16GB System RAM
|
| 19 |
|
|
|
|
| 7 |
dockerfile: Dockerfile.hf-spaces
|
| 8 |
# Enable Docker layer caching for faster rebuilds
|
| 9 |
cache: true
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
|
| 11 |
# Hardware requirements
|
| 12 |
+
# Note: Remove or comment out if t4-medium is unavailable
|
| 13 |
+
# You can also use: t4-small, cpu-upgrade, or a100-large
|
| 14 |
hardware:
|
| 15 |
gpu: t4-medium # 16GB GPU RAM, 16GB System RAM
|
| 16 |
|
CHANGES_SUMMARY.md
ADDED
|
@@ -0,0 +1,248 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Changes Summary - HF Spaces Scheduling Error Fix
|
| 2 |
+
|
| 3 |
+
## What Was Wrong
|
| 4 |
+
|
| 5 |
+
Your app was failing to deploy on Hugging Face Spaces with:
|
| 6 |
+
- **Error:** "Scheduling failure: unable to schedule"
|
| 7 |
+
- **Cause:** Multiple issues:
|
| 8 |
+
1. Conflicting entry point configuration
|
| 9 |
+
2. Requesting `t4-medium` GPU (often unavailable)
|
| 10 |
+
3. Heavy model preloading (~4.2GB)
|
| 11 |
+
|
| 12 |
+
## What I Fixed
|
| 13 |
+
|
| 14 |
+
### 1. Fixed `.huggingface.yaml`
|
| 15 |
+
**Changed:**
|
| 16 |
+
- ❌ Removed `app.entrypoint: services/ai-service/src/ai_med_extract/app:app`
|
| 17 |
+
- ✅ Docker CMD now takes precedence (cleaner configuration)
|
| 18 |
+
- ✅ Added comments about hardware alternatives
|
| 19 |
+
|
| 20 |
+
**Why:** The `entrypoint` field was conflicting with the Dockerfile's CMD, causing confusion in how HF Spaces should start the app.
|
| 21 |
+
|
| 22 |
+
### 2. Fixed `Dockerfile.hf-spaces`
|
| 23 |
+
**Changed:**
|
| 24 |
+
```dockerfile
|
| 25 |
+
# Before:
|
| 26 |
+
CMD ["uvicorn", "ai_med_extract.app:app", ...]
|
| 27 |
+
|
| 28 |
+
# After:
|
| 29 |
+
CMD ["uvicorn", "app:app", ...]
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
**Why:** The root `app.py` is specifically designed for HF Spaces with proper initialization and error handling.
|
| 33 |
+
|
| 34 |
+
### 3. Created `Dockerfile.hf-spaces-minimal`
|
| 35 |
+
**New file:** Lightweight alternative without model preloading
|
| 36 |
+
- Uses `/tmp` for caching (HF Spaces compatible)
|
| 37 |
+
- Single worker (minimal memory)
|
| 38 |
+
- Fast startup (no model preloading)
|
| 39 |
+
- Only ~2GB RAM needed vs ~16GB
|
| 40 |
+
|
| 41 |
+
### 4. Created Documentation
|
| 42 |
+
- `HF_SPACES_SCHEDULING_FIX.md` - Complete troubleshooting guide
|
| 43 |
+
- `HF_SPACES_QUICK_FIX.md` - Quick reference card
|
| 44 |
+
- `CHANGES_SUMMARY.md` - This file
|
| 45 |
+
|
| 46 |
+
## What You Should Do Now
|
| 47 |
+
|
| 48 |
+
### ⚡ FASTEST FIX (Recommended)
|
| 49 |
+
|
| 50 |
+
1. **Edit `.huggingface.yaml`** - Use this configuration:
|
| 51 |
+
|
| 52 |
+
```yaml
|
| 53 |
+
runtime: docker
|
| 54 |
+
sdk: docker
|
| 55 |
+
python_version: "3.10"
|
| 56 |
+
|
| 57 |
+
build:
|
| 58 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 59 |
+
cache: true
|
| 60 |
+
|
| 61 |
+
# Remove hardware section to use free CPU tier
|
| 62 |
+
|
| 63 |
+
env:
|
| 64 |
+
- HF_SPACES=true
|
| 65 |
+
- FAST_MODE=true
|
| 66 |
+
- PRELOAD_GGUF=false
|
| 67 |
+
- PRELOAD_SMALL_MODELS=false
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
2. **Commit and push:**
|
| 71 |
+
```bash
|
| 72 |
+
git add .
|
| 73 |
+
git commit -m "Fix HF Spaces deployment - use minimal config"
|
| 74 |
+
git push
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
3. **Wait 5-10 minutes** for the build to complete
|
| 78 |
+
|
| 79 |
+
4. **Test your space:**
|
| 80 |
+
```bash
|
| 81 |
+
curl https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE/health
|
| 82 |
+
```
|
| 83 |
+
|
| 84 |
+
### 🎮 Alternative: Keep GPU But Use t4-small
|
| 85 |
+
|
| 86 |
+
If you need GPU and have access:
|
| 87 |
+
|
| 88 |
+
```yaml
|
| 89 |
+
runtime: docker
|
| 90 |
+
sdk: docker
|
| 91 |
+
|
| 92 |
+
build:
|
| 93 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 94 |
+
cache: true
|
| 95 |
+
|
| 96 |
+
hardware:
|
| 97 |
+
gpu: t4-small # More available than t4-medium
|
| 98 |
+
|
| 99 |
+
env:
|
| 100 |
+
- HF_SPACES=true
|
| 101 |
+
- CUDA_VISIBLE_DEVICES=0
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
### 🚀 Advanced: Full Model Preloading (If You Have Pro/Enterprise)
|
| 105 |
+
|
| 106 |
+
Keep the current `Dockerfile.hf-spaces` with full model preloading, but:
|
| 107 |
+
|
| 108 |
+
```yaml
|
| 109 |
+
hardware:
|
| 110 |
+
gpu: t4-medium # Requires Pro/Enterprise tier
|
| 111 |
+
|
| 112 |
+
env:
|
| 113 |
+
- PRELOAD_GGUF=true # Pre-cache models
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
Note: This requires ~20-30 minutes for first build, but subsequent starts are instant.
|
| 117 |
+
|
| 118 |
+
## Files Modified
|
| 119 |
+
|
| 120 |
+
```
|
| 121 |
+
✅ .huggingface.yaml - Fixed configuration
|
| 122 |
+
✅ Dockerfile.hf-spaces - Fixed CMD entry point
|
| 123 |
+
🆕 Dockerfile.hf-spaces-minimal - New lightweight option
|
| 124 |
+
📄 HF_SPACES_SCHEDULING_FIX.md - Complete guide
|
| 125 |
+
📄 HF_SPACES_QUICK_FIX.md - Quick reference
|
| 126 |
+
📄 CHANGES_SUMMARY.md - This summary
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
## Comparison: Minimal vs Full
|
| 130 |
+
|
| 131 |
+
| Feature | Minimal | Full (Original) |
|
| 132 |
+
|---------|---------|-----------------|
|
| 133 |
+
| **Build Time** | 5 min | 20-30 min |
|
| 134 |
+
| **Startup Time** | 30 sec | 1-2 min |
|
| 135 |
+
| **Memory Usage** | 2GB | 8-16GB |
|
| 136 |
+
| **First Request** | 2-3 min (downloads model) | Instant |
|
| 137 |
+
| **Hardware Needed** | CPU or small GPU | t4-medium+ |
|
| 138 |
+
| **Cost** | Free tier OK | Pro/Enterprise |
|
| 139 |
+
| **Cold Start** | Models download | Pre-cached |
|
| 140 |
+
|
| 141 |
+
## Recommended Path
|
| 142 |
+
|
| 143 |
+
```mermaid
|
| 144 |
+
graph TD
|
| 145 |
+
A[Start] --> B{Need GPU?}
|
| 146 |
+
B -->|No| C[Use Minimal + CPU]
|
| 147 |
+
B -->|Yes| D{Have Pro/Enterprise?}
|
| 148 |
+
D -->|No| E[Use Minimal + t4-small]
|
| 149 |
+
D -->|Yes| F{Need instant startup?}
|
| 150 |
+
F -->|No| E
|
| 151 |
+
F -->|Yes| G[Use Full + t4-medium]
|
| 152 |
+
|
| 153 |
+
C --> H[✅ Deploy in 5 min]
|
| 154 |
+
E --> I[✅ Deploy in 10 min]
|
| 155 |
+
G --> J[✅ Deploy in 30 min]
|
| 156 |
+
```
|
| 157 |
+
|
| 158 |
+
**My recommendation:** Start with **Minimal + CPU** to verify everything works, then upgrade to GPU if needed.
|
| 159 |
+
|
| 160 |
+
## Testing Checklist
|
| 161 |
+
|
| 162 |
+
After deployment, verify these endpoints:
|
| 163 |
+
|
| 164 |
+
```bash
|
| 165 |
+
# Replace YOUR_SPACE with your actual space name
|
| 166 |
+
SPACE_URL="https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE"
|
| 167 |
+
|
| 168 |
+
# 1. Health check
|
| 169 |
+
curl $SPACE_URL/health
|
| 170 |
+
# Expected: {"status": "ok"}
|
| 171 |
+
|
| 172 |
+
# 2. Readiness check
|
| 173 |
+
curl $SPACE_URL/health/ready
|
| 174 |
+
# Expected: {"status": "ready"}
|
| 175 |
+
|
| 176 |
+
# 3. Root endpoint
|
| 177 |
+
curl $SPACE_URL/
|
| 178 |
+
# Expected: {"message": "Medical AI Service", ...}
|
| 179 |
+
|
| 180 |
+
# 4. API docs
|
| 181 |
+
open $SPACE_URL/docs
|
| 182 |
+
# Should show FastAPI Swagger UI
|
| 183 |
+
```
|
| 184 |
+
|
| 185 |
+
## Troubleshooting
|
| 186 |
+
|
| 187 |
+
### "Still getting scheduling error"
|
| 188 |
+
- Check your HF account tier (Settings → Billing)
|
| 189 |
+
- Try removing `hardware:` section entirely (use free CPU)
|
| 190 |
+
- Check https://status.huggingface.co/ for platform issues
|
| 191 |
+
|
| 192 |
+
### "Build succeeds but app crashes"
|
| 193 |
+
- Check Space logs for Python errors
|
| 194 |
+
- Test Docker image locally first:
|
| 195 |
+
```bash
|
| 196 |
+
docker build -f Dockerfile.hf-spaces-minimal -t test .
|
| 197 |
+
docker run -p 7860:7860 -e HF_SPACES=true test
|
| 198 |
+
```
|
| 199 |
+
|
| 200 |
+
### "App starts but requests fail"
|
| 201 |
+
- Models are downloading on first request (wait 2-3 min)
|
| 202 |
+
- Check memory usage in Space settings
|
| 203 |
+
- Consider enabling PRELOAD_GGUF if using GPU
|
| 204 |
+
|
| 205 |
+
## Success Indicators
|
| 206 |
+
|
| 207 |
+
Your Space logs should show:
|
| 208 |
+
```
|
| 209 |
+
✅ Starting Medical AI Service on Hugging Face Spaces
|
| 210 |
+
✅ Detected Hugging Face Spaces environment
|
| 211 |
+
✅ Creating FastAPI application for HF Spaces...
|
| 212 |
+
✅ Application initialized successfully
|
| 213 |
+
✅ Uvicorn running on http://0.0.0.0:7860
|
| 214 |
+
```
|
| 215 |
+
|
| 216 |
+
## Need Help?
|
| 217 |
+
|
| 218 |
+
1. **Read the guides:**
|
| 219 |
+
- `HF_SPACES_QUICK_FIX.md` - Quick solutions
|
| 220 |
+
- `HF_SPACES_SCHEDULING_FIX.md` - Detailed troubleshooting
|
| 221 |
+
|
| 222 |
+
2. **Check logs:**
|
| 223 |
+
- Go to your Space → Settings → Logs
|
| 224 |
+
- Look for error messages
|
| 225 |
+
|
| 226 |
+
3. **Test locally:**
|
| 227 |
+
- Build and run Docker image on your machine
|
| 228 |
+
- Verify it works before pushing to HF
|
| 229 |
+
|
| 230 |
+
4. **Community support:**
|
| 231 |
+
- HF Discord: https://discord.gg/hugging-face
|
| 232 |
+
- HF Forum: https://discuss.huggingface.co/
|
| 233 |
+
|
| 234 |
+
## Summary
|
| 235 |
+
|
| 236 |
+
**What to do RIGHT NOW:**
|
| 237 |
+
1. Update `.huggingface.yaml` to use `Dockerfile.hf-spaces-minimal`
|
| 238 |
+
2. Remove the `hardware` section (or use `gpu: t4-small`)
|
| 239 |
+
3. Commit and push
|
| 240 |
+
4. Wait 5-10 minutes
|
| 241 |
+
5. Test your endpoints
|
| 242 |
+
|
| 243 |
+
**Expected result:** Your Space will deploy successfully and be accessible within 10 minutes! 🎉
|
| 244 |
+
|
| 245 |
+
---
|
| 246 |
+
|
| 247 |
+
Last updated: 2025-11-13
|
| 248 |
+
|
Dockerfile.hf-spaces
CHANGED
|
@@ -132,6 +132,6 @@ HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
|
|
| 132 |
ENTRYPOINT ["/entrypoint.sh"]
|
| 133 |
|
| 134 |
# Start the application
|
| 135 |
-
# Use
|
| 136 |
-
CMD ["uvicorn", "
|
| 137 |
|
|
|
|
| 132 |
ENTRYPOINT ["/entrypoint.sh"]
|
| 133 |
|
| 134 |
# Start the application
|
| 135 |
+
# Use the root app.py which is designed for HF Spaces
|
| 136 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
|
| 137 |
|
Dockerfile.hf-spaces-minimal
ADDED
|
@@ -0,0 +1,52 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.11-slim
|
| 2 |
+
|
| 3 |
+
ENV PYTHONDONTWRITEBYTECODE=1 \
|
| 4 |
+
PYTHONUNBUFFERED=1 \
|
| 5 |
+
DEBIAN_FRONTEND=noninteractive
|
| 6 |
+
|
| 7 |
+
WORKDIR /app
|
| 8 |
+
|
| 9 |
+
# Install system dependencies (minimal set)
|
| 10 |
+
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 11 |
+
tesseract-ocr \
|
| 12 |
+
poppler-utils \
|
| 13 |
+
ffmpeg \
|
| 14 |
+
curl \
|
| 15 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 16 |
+
|
| 17 |
+
# Copy and install Python dependencies
|
| 18 |
+
COPY requirements.txt .
|
| 19 |
+
RUN pip install --no-cache-dir -r requirements.txt uvicorn[standard]
|
| 20 |
+
|
| 21 |
+
# Copy application code
|
| 22 |
+
COPY . .
|
| 23 |
+
|
| 24 |
+
# Set environment for HF Spaces with minimal resource usage
|
| 25 |
+
ENV PYTHONPATH=/app/services/ai-service/src:$PYTHONPATH \
|
| 26 |
+
HF_SPACES=true \
|
| 27 |
+
FAST_MODE=true \
|
| 28 |
+
PRELOAD_SMALL_MODELS=false \
|
| 29 |
+
PRELOAD_GGUF=false \
|
| 30 |
+
HF_HOME=/tmp/huggingface \
|
| 31 |
+
TORCH_HOME=/tmp/torch \
|
| 32 |
+
WHISPER_CACHE=/tmp/whisper \
|
| 33 |
+
MODEL_CACHE_DIR=/tmp/models \
|
| 34 |
+
TRANSFORMERS_CACHE=/tmp/huggingface/transformers \
|
| 35 |
+
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 \
|
| 36 |
+
TOKENIZERS_PARALLELISM=false \
|
| 37 |
+
OMP_NUM_THREADS=1 \
|
| 38 |
+
MKL_NUM_THREADS=1
|
| 39 |
+
|
| 40 |
+
# Create necessary directories
|
| 41 |
+
RUN mkdir -p /tmp/uploads /tmp/huggingface /tmp/models && \
|
| 42 |
+
chmod -R 777 /tmp
|
| 43 |
+
|
| 44 |
+
EXPOSE 7860
|
| 45 |
+
|
| 46 |
+
# Health check
|
| 47 |
+
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
|
| 48 |
+
CMD curl -f http://localhost:7860/health || exit 1
|
| 49 |
+
|
| 50 |
+
# Start application with single worker for minimal memory footprint
|
| 51 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1", "--timeout-keep-alive", "600"]
|
| 52 |
+
|
HF_SPACES_QUICK_FIX.md
ADDED
|
@@ -0,0 +1,137 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# HF Spaces Scheduling Error - QUICK FIX
|
| 2 |
+
|
| 3 |
+
## The Error
|
| 4 |
+
```
|
| 5 |
+
Scheduling failure: unable to schedule
|
| 6 |
+
Container logs: Failed to retrieve error logs: SSE is not enabled
|
| 7 |
+
```
|
| 8 |
+
|
| 9 |
+
## Fastest Fix (5 minutes)
|
| 10 |
+
|
| 11 |
+
### Option 1: CPU-Only Mode (Most Reliable) ⭐
|
| 12 |
+
|
| 13 |
+
**Step 1:** Update `.huggingface.yaml`:
|
| 14 |
+
```yaml
|
| 15 |
+
runtime: docker
|
| 16 |
+
sdk: docker
|
| 17 |
+
python_version: "3.10"
|
| 18 |
+
|
| 19 |
+
build:
|
| 20 |
+
dockerfile: Dockerfile.hf-spaces-minimal # Use the minimal Dockerfile
|
| 21 |
+
cache: true
|
| 22 |
+
|
| 23 |
+
# NO hardware section = uses free CPU tier
|
| 24 |
+
|
| 25 |
+
env:
|
| 26 |
+
- HF_SPACES=true
|
| 27 |
+
- FAST_MODE=true
|
| 28 |
+
- PRELOAD_GGUF=false
|
| 29 |
+
- PRELOAD_SMALL_MODELS=false
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
**Step 2:** Commit and push:
|
| 33 |
+
```bash
|
| 34 |
+
git add .huggingface.yaml
|
| 35 |
+
git commit -m "Use CPU-only minimal config"
|
| 36 |
+
git push
|
| 37 |
+
```
|
| 38 |
+
|
| 39 |
+
**Result:** Deploys in 5-10 minutes ✅
|
| 40 |
+
|
| 41 |
+
---
|
| 42 |
+
|
| 43 |
+
### Option 2: T4 Small GPU (If GPU Needed)
|
| 44 |
+
|
| 45 |
+
**Step 1:** Update `.huggingface.yaml`:
|
| 46 |
+
```yaml
|
| 47 |
+
runtime: docker
|
| 48 |
+
sdk: docker
|
| 49 |
+
|
| 50 |
+
build:
|
| 51 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 52 |
+
cache: true
|
| 53 |
+
|
| 54 |
+
hardware:
|
| 55 |
+
gpu: t4-small # More available than t4-medium
|
| 56 |
+
|
| 57 |
+
env:
|
| 58 |
+
- HF_SPACES=true
|
| 59 |
+
- FAST_MODE=true
|
| 60 |
+
- PRELOAD_GGUF=false
|
| 61 |
+
- CUDA_VISIBLE_DEVICES=0
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
**Step 2:** Commit and push:
|
| 65 |
+
```bash
|
| 66 |
+
git add .huggingface.yaml
|
| 67 |
+
git commit -m "Use t4-small GPU"
|
| 68 |
+
git push
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
**Result:** Deploys in 10-15 minutes if GPU available ✅
|
| 72 |
+
|
| 73 |
+
---
|
| 74 |
+
|
| 75 |
+
### Option 3: Keep Current Setup, Try Later
|
| 76 |
+
|
| 77 |
+
Sometimes t4-medium GPUs are just temporarily unavailable.
|
| 78 |
+
|
| 79 |
+
**Step 1:** Check HF Spaces status:
|
| 80 |
+
- https://status.huggingface.co/
|
| 81 |
+
|
| 82 |
+
**Step 2:** Wait 30-60 minutes and try again
|
| 83 |
+
|
| 84 |
+
**Step 3:** Or request GPU access at:
|
| 85 |
+
- https://huggingface.co/settings/billing
|
| 86 |
+
|
| 87 |
+
---
|
| 88 |
+
|
| 89 |
+
## Already Made These Fixes
|
| 90 |
+
|
| 91 |
+
✅ Fixed `.huggingface.yaml` - removed conflicting entrypoint
|
| 92 |
+
✅ Fixed `Dockerfile.hf-spaces` - correct CMD
|
| 93 |
+
✅ Created `Dockerfile.hf-spaces-minimal` - lightweight option
|
| 94 |
+
|
| 95 |
+
## Test After Deployment
|
| 96 |
+
|
| 97 |
+
```bash
|
| 98 |
+
# Replace YOUR_SPACE with your actual space name
|
| 99 |
+
curl https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE/health
|
| 100 |
+
|
| 101 |
+
# Should return:
|
| 102 |
+
# {"status": "ok", "hf_spaces": true}
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
## Why This Happens
|
| 106 |
+
|
| 107 |
+
1. **t4-medium GPUs** are in high demand → often unavailable
|
| 108 |
+
2. **Hardware tier** might not be available in your account
|
| 109 |
+
3. **Container too large** → timeout during scheduling
|
| 110 |
+
|
| 111 |
+
## Success Indicators
|
| 112 |
+
|
| 113 |
+
Watch for these in your Space logs:
|
| 114 |
+
```
|
| 115 |
+
✅ "Starting Medical AI Service on Hugging Face Spaces"
|
| 116 |
+
✅ "FastAPI application started"
|
| 117 |
+
✅ "Application initialized successfully"
|
| 118 |
+
✅ "Uvicorn running on http://0.0.0.0:7860"
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
## Still Not Working?
|
| 122 |
+
|
| 123 |
+
1. **Check your HF account tier** - GPU access required for GPU hardware
|
| 124 |
+
2. **Try the minimal config** - Uses least resources
|
| 125 |
+
3. **Check HF Spaces status** - Platform issues?
|
| 126 |
+
4. **Review build logs** - Look for specific errors
|
| 127 |
+
|
| 128 |
+
## Support
|
| 129 |
+
|
| 130 |
+
- HF Spaces Discord: https://discord.gg/hugging-face
|
| 131 |
+
- HF Forum: https://discuss.huggingface.co/
|
| 132 |
+
- Check status: https://status.huggingface.co/
|
| 133 |
+
|
| 134 |
+
---
|
| 135 |
+
|
| 136 |
+
**TL;DR:** Change `Dockerfile.hf-spaces-minimal` in `.huggingface.yaml` and remove the `hardware` section. Push. Wait 5 minutes. ✅
|
| 137 |
+
|
HF_SPACES_SCHEDULING_FIX.md
ADDED
|
@@ -0,0 +1,331 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Hugging Face Spaces - "Scheduling failure: unable to schedule" Fix
|
| 2 |
+
|
| 3 |
+
## Problem
|
| 4 |
+
|
| 5 |
+
When deploying to Hugging Face Spaces, you're encountering:
|
| 6 |
+
```
|
| 7 |
+
Scheduling failure: unable to schedule
|
| 8 |
+
Container logs: Failed to retrieve error logs: SSE is not enabled
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
## Root Causes & Solutions
|
| 12 |
+
|
| 13 |
+
### 1. Hardware Availability Issue (Most Common)
|
| 14 |
+
|
| 15 |
+
The `t4-medium` GPU might not be available in your region or tier.
|
| 16 |
+
|
| 17 |
+
**Solution A: Try Different Hardware Tiers**
|
| 18 |
+
|
| 19 |
+
Edit `.huggingface.yaml` and try these alternatives in order:
|
| 20 |
+
|
| 21 |
+
```yaml
|
| 22 |
+
# Option 1: T4 Small (often more available)
|
| 23 |
+
hardware:
|
| 24 |
+
gpu: t4-small # 8GB GPU RAM, 8GB System RAM
|
| 25 |
+
|
| 26 |
+
# Option 2: CPU Upgrade (no GPU, but faster CPU)
|
| 27 |
+
hardware:
|
| 28 |
+
cpu: upgrade # More CPU power, no GPU
|
| 29 |
+
|
| 30 |
+
# Option 3: Zero GPU (on-demand GPU)
|
| 31 |
+
hardware:
|
| 32 |
+
gpu: zero # GPU only when needed
|
| 33 |
+
|
| 34 |
+
# Option 4: Remove hardware section entirely (uses free tier)
|
| 35 |
+
# hardware:
|
| 36 |
+
# gpu: t4-medium
|
| 37 |
+
```
|
| 38 |
+
|
| 39 |
+
**Solution B: Request Hardware Access**
|
| 40 |
+
|
| 41 |
+
If you need GPU but it's not available:
|
| 42 |
+
1. Go to your HF account settings
|
| 43 |
+
2. Check your hardware tier/subscription
|
| 44 |
+
3. Request access to GPU hardware if needed
|
| 45 |
+
4. Upgrade to Pro/Enterprise for better GPU availability
|
| 46 |
+
|
| 47 |
+
### 2. Application Entry Point Mismatch
|
| 48 |
+
|
| 49 |
+
**Fixed:** The `.huggingface.yaml` was specifying an `app.entrypoint` that conflicts with the Dockerfile CMD.
|
| 50 |
+
|
| 51 |
+
**Changes Made:**
|
| 52 |
+
- ✅ Removed `app.entrypoint` from `.huggingface.yaml` (Docker CMD takes precedence)
|
| 53 |
+
- ✅ Updated Dockerfile CMD to use `app:app` (the HF Spaces-optimized entry point)
|
| 54 |
+
|
| 55 |
+
### 3. Container Startup Failure
|
| 56 |
+
|
| 57 |
+
The error "SSE is not enabled" suggests the container might be failing before the app starts.
|
| 58 |
+
|
| 59 |
+
**Verification Steps:**
|
| 60 |
+
|
| 61 |
+
1. **Test Locally First:**
|
| 62 |
+
```bash
|
| 63 |
+
# Build the HF Spaces Docker image locally
|
| 64 |
+
docker build -f Dockerfile.hf-spaces -t hntai-hf-test .
|
| 65 |
+
|
| 66 |
+
# Run it locally to verify it starts
|
| 67 |
+
docker run -p 7860:7860 \
|
| 68 |
+
-e HF_SPACES=true \
|
| 69 |
+
-e HF_HOME=/app/.cache/huggingface \
|
| 70 |
+
hntai-hf-test
|
| 71 |
+
|
| 72 |
+
# Test the health endpoint
|
| 73 |
+
curl http://localhost:7860/health
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
2. **Check Logs in HF Spaces:**
|
| 77 |
+
- Go to your Space settings
|
| 78 |
+
- Click on "Logs" tab
|
| 79 |
+
- Look for error messages during startup
|
| 80 |
+
- Common issues:
|
| 81 |
+
- Out of memory during model loading
|
| 82 |
+
- Missing dependencies
|
| 83 |
+
- Python import errors
|
| 84 |
+
|
| 85 |
+
### 4. Resource Requirements Too High
|
| 86 |
+
|
| 87 |
+
The current configuration tries to preload multiple large models (~4.2GB).
|
| 88 |
+
|
| 89 |
+
**Solution: Reduce Memory Footprint**
|
| 90 |
+
|
| 91 |
+
Edit `Dockerfile.hf-spaces` to disable model preloading:
|
| 92 |
+
|
| 93 |
+
```dockerfile
|
| 94 |
+
# Comment out the model preloading stage
|
| 95 |
+
# FROM builder AS model-cache
|
| 96 |
+
# ... (comment out the entire section)
|
| 97 |
+
|
| 98 |
+
# In the final stage, set PRELOAD_GGUF to false
|
| 99 |
+
ENV PRELOAD_GGUF=false \
|
| 100 |
+
PRELOAD_SMALL_MODELS=false \
|
| 101 |
+
FAST_MODE=true
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
Or edit `.huggingface.yaml`:
|
| 105 |
+
```yaml
|
| 106 |
+
env:
|
| 107 |
+
- PRELOAD_GGUF=false
|
| 108 |
+
- PRELOAD_SMALL_MODELS=false
|
| 109 |
+
- FAST_MODE=true
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
## Complete Fixed Configuration
|
| 113 |
+
|
| 114 |
+
### `.huggingface.yaml` (Fixed)
|
| 115 |
+
```yaml
|
| 116 |
+
runtime: docker
|
| 117 |
+
sdk: docker
|
| 118 |
+
python_version: "3.10"
|
| 119 |
+
|
| 120 |
+
build:
|
| 121 |
+
dockerfile: Dockerfile.hf-spaces
|
| 122 |
+
cache: true
|
| 123 |
+
|
| 124 |
+
# Try these hardware options in order
|
| 125 |
+
hardware:
|
| 126 |
+
gpu: t4-small # Start with t4-small for better availability
|
| 127 |
+
|
| 128 |
+
env:
|
| 129 |
+
- SPACE_ID=$SPACE_ID
|
| 130 |
+
- HF_HOME=/app/.cache/huggingface
|
| 131 |
+
- TORCH_HOME=/app/.cache/torch
|
| 132 |
+
- MODEL_CACHE_DIR=/app/models
|
| 133 |
+
- PRELOAD_GGUF=false # Disable for faster startup
|
| 134 |
+
- PRELOAD_SMALL_MODELS=false # Disable for faster startup
|
| 135 |
+
- FAST_MODE=true # Enable fast mode
|
| 136 |
+
- HF_SPACES=true
|
| 137 |
+
- CUDA_VISIBLE_DEVICES=0
|
| 138 |
+
- PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
|
| 139 |
+
```
|
| 140 |
+
|
| 141 |
+
### Dockerfile.hf-spaces (Fixed)
|
| 142 |
+
```dockerfile
|
| 143 |
+
# Start the application
|
| 144 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
|
| 145 |
+
```
|
| 146 |
+
|
| 147 |
+
## Deployment Steps
|
| 148 |
+
|
| 149 |
+
### Option 1: Quick Fix (Recommended First Try)
|
| 150 |
+
|
| 151 |
+
1. **Use CPU-only mode for faster deployment:**
|
| 152 |
+
```yaml
|
| 153 |
+
# .huggingface.yaml
|
| 154 |
+
# Comment out the hardware section
|
| 155 |
+
# hardware:
|
| 156 |
+
# gpu: t4-medium
|
| 157 |
+
|
| 158 |
+
env:
|
| 159 |
+
- FAST_MODE=true
|
| 160 |
+
- PRELOAD_GGUF=false
|
| 161 |
+
- CUDA_VISIBLE_DEVICES="" # Disable GPU
|
| 162 |
+
```
|
| 163 |
+
|
| 164 |
+
2. **Commit and push:**
|
| 165 |
+
```bash
|
| 166 |
+
git add .huggingface.yaml
|
| 167 |
+
git commit -m "Fix HF Spaces scheduling - use CPU mode"
|
| 168 |
+
git push
|
| 169 |
+
```
|
| 170 |
+
|
| 171 |
+
### Option 2: GPU with Minimal Models
|
| 172 |
+
|
| 173 |
+
1. **Reduce model preloading:**
|
| 174 |
+
```bash
|
| 175 |
+
# Edit preload_models.py to only load essential models
|
| 176 |
+
# Comment out large models (google/flan-t5-large, etc.)
|
| 177 |
+
```
|
| 178 |
+
|
| 179 |
+
2. **Use t4-small instead of t4-medium:**
|
| 180 |
+
```yaml
|
| 181 |
+
hardware:
|
| 182 |
+
gpu: t4-small
|
| 183 |
+
```
|
| 184 |
+
|
| 185 |
+
3. **Commit and push:**
|
| 186 |
+
```bash
|
| 187 |
+
git add .
|
| 188 |
+
git commit -m "Optimize for t4-small GPU"
|
| 189 |
+
git push
|
| 190 |
+
```
|
| 191 |
+
|
| 192 |
+
### Option 3: Full GPU with Pre-cached Models
|
| 193 |
+
|
| 194 |
+
1. **Ensure you have GPU access in your HF account**
|
| 195 |
+
2. **Wait for t4-medium availability** (can take hours/days)
|
| 196 |
+
3. **Monitor space status** in HF Spaces dashboard
|
| 197 |
+
|
| 198 |
+
## Troubleshooting Checklist
|
| 199 |
+
|
| 200 |
+
- [ ] Check HF account GPU tier/subscription
|
| 201 |
+
- [ ] Try t4-small instead of t4-medium
|
| 202 |
+
- [ ] Try CPU mode (remove hardware section)
|
| 203 |
+
- [ ] Disable model preloading (PRELOAD_GGUF=false)
|
| 204 |
+
- [ ] Test Docker image locally
|
| 205 |
+
- [ ] Check Space logs for errors
|
| 206 |
+
- [ ] Verify requirements.txt has all dependencies
|
| 207 |
+
- [ ] Ensure app.py is in the root directory
|
| 208 |
+
- [ ] Check that PYTHONPATH is set correctly
|
| 209 |
+
- [ ] Verify port 7860 is exposed
|
| 210 |
+
|
| 211 |
+
## Common Error Messages & Solutions
|
| 212 |
+
|
| 213 |
+
### "Scheduling failure: unable to schedule"
|
| 214 |
+
- **Cause**: Hardware tier unavailable
|
| 215 |
+
- **Fix**: Change to t4-small or CPU-only mode
|
| 216 |
+
|
| 217 |
+
### "Failed to retrieve error logs: SSE is not enabled"
|
| 218 |
+
- **Cause**: Container failed before app started
|
| 219 |
+
- **Fix**: Check startup logs, reduce memory usage
|
| 220 |
+
|
| 221 |
+
### "Container build timeout"
|
| 222 |
+
- **Cause**: Model downloading takes too long
|
| 223 |
+
- **Fix**: Reduce models in preload_models.py
|
| 224 |
+
|
| 225 |
+
### "CUDA out of memory"
|
| 226 |
+
- **Cause**: Models too large for GPU
|
| 227 |
+
- **Fix**: Use smaller models or CPU mode
|
| 228 |
+
|
| 229 |
+
## Verification After Fix
|
| 230 |
+
|
| 231 |
+
Once deployed, verify:
|
| 232 |
+
|
| 233 |
+
```bash
|
| 234 |
+
# Check health endpoint
|
| 235 |
+
curl https://YOUR_SPACE_NAME.hf.space/health
|
| 236 |
+
|
| 237 |
+
# Check if app is ready
|
| 238 |
+
curl https://YOUR_SPACE_NAME.hf.space/health/ready
|
| 239 |
+
|
| 240 |
+
# Test a simple endpoint
|
| 241 |
+
curl https://YOUR_SPACE_NAME.hf.space/
|
| 242 |
+
```
|
| 243 |
+
|
| 244 |
+
Expected response:
|
| 245 |
+
```json
|
| 246 |
+
{
|
| 247 |
+
"message": "Medical AI Service",
|
| 248 |
+
"status": "running",
|
| 249 |
+
"hf_spaces": true
|
| 250 |
+
}
|
| 251 |
+
```
|
| 252 |
+
|
| 253 |
+
## Quick Wins for Immediate Deployment
|
| 254 |
+
|
| 255 |
+
If you just want to get it running ASAP:
|
| 256 |
+
|
| 257 |
+
1. **Remove hardware requirements entirely (use free CPU tier):**
|
| 258 |
+
```yaml
|
| 259 |
+
# .huggingface.yaml
|
| 260 |
+
runtime: docker
|
| 261 |
+
sdk: docker
|
| 262 |
+
|
| 263 |
+
build:
|
| 264 |
+
dockerfile: Dockerfile.hf-spaces
|
| 265 |
+
|
| 266 |
+
env:
|
| 267 |
+
- HF_SPACES=true
|
| 268 |
+
- FAST_MODE=true
|
| 269 |
+
- PRELOAD_GGUF=false
|
| 270 |
+
- PRELOAD_SMALL_MODELS=false
|
| 271 |
+
```
|
| 272 |
+
|
| 273 |
+
2. **Create a simpler Dockerfile.hf-spaces-minimal:**
|
| 274 |
+
```dockerfile
|
| 275 |
+
FROM python:3.11-slim
|
| 276 |
+
|
| 277 |
+
WORKDIR /app
|
| 278 |
+
|
| 279 |
+
# Copy app files
|
| 280 |
+
COPY requirements.txt .
|
| 281 |
+
RUN pip install --no-cache-dir -r requirements.txt uvicorn[standard]
|
| 282 |
+
|
| 283 |
+
COPY . .
|
| 284 |
+
|
| 285 |
+
ENV PYTHONPATH=/app/services/ai-service/src:$PYTHONPATH \
|
| 286 |
+
HF_SPACES=true \
|
| 287 |
+
FAST_MODE=true \
|
| 288 |
+
PRELOAD_SMALL_MODELS=false
|
| 289 |
+
|
| 290 |
+
EXPOSE 7860
|
| 291 |
+
|
| 292 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
|
| 293 |
+
```
|
| 294 |
+
|
| 295 |
+
3. **Update .huggingface.yaml to use the minimal Dockerfile:**
|
| 296 |
+
```yaml
|
| 297 |
+
build:
|
| 298 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 299 |
+
```
|
| 300 |
+
|
| 301 |
+
## Support Resources
|
| 302 |
+
|
| 303 |
+
- **HF Spaces Docs**: https://huggingface.co/docs/hub/spaces
|
| 304 |
+
- **HF Spaces Community**: https://huggingface.co/spaces-discussions
|
| 305 |
+
- **Hardware Tiers**: https://huggingface.co/pricing#spaces
|
| 306 |
+
|
| 307 |
+
## Summary of Changes Made
|
| 308 |
+
|
| 309 |
+
✅ **Fixed `.huggingface.yaml`**
|
| 310 |
+
- Removed conflicting `app.entrypoint` configuration
|
| 311 |
+
- Added hardware alternatives in comments
|
| 312 |
+
|
| 313 |
+
✅ **Fixed `Dockerfile.hf-spaces`**
|
| 314 |
+
- Changed CMD to use `app:app` (HF Spaces entry point)
|
| 315 |
+
- Proper PYTHONPATH configuration
|
| 316 |
+
|
| 317 |
+
✅ **Root `app.py`** is already optimized for HF Spaces
|
| 318 |
+
- Automatic HF Spaces detection
|
| 319 |
+
- Lightweight initialization
|
| 320 |
+
- Proper error handling
|
| 321 |
+
|
| 322 |
+
## Next Steps
|
| 323 |
+
|
| 324 |
+
1. Choose one of the deployment options above
|
| 325 |
+
2. Make the changes to your repository
|
| 326 |
+
3. Commit and push to HF Spaces
|
| 327 |
+
4. Monitor the build logs
|
| 328 |
+
5. Test the endpoints once deployed
|
| 329 |
+
|
| 330 |
+
The most reliable quick fix is **Option 1** (CPU-only mode), which will deploy successfully within 5-10 minutes.
|
| 331 |
+
|
services/ai-service/DEPLOYMENT_FIX.md
ADDED
|
@@ -0,0 +1,177 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Deployment Fix for "Scheduling failure: unable to schedule" Error
|
| 2 |
+
|
| 3 |
+
## Problem Identified
|
| 4 |
+
|
| 5 |
+
The deployment was failing with a "Scheduling failure: unable to schedule" error because the **Dockerfile.prod** was configured to use **Gunicorn with WSGI**, but the application is built with **FastAPI which requires ASGI**.
|
| 6 |
+
|
| 7 |
+
### Root Cause
|
| 8 |
+
- **FastAPI** is an ASGI (Asynchronous Server Gateway Interface) framework
|
| 9 |
+
- **Gunicorn** was running in WSGI (Web Server Gateway Interface) mode
|
| 10 |
+
- This fundamental incompatibility caused the container to fail to start properly
|
| 11 |
+
- SSE (Server-Sent Events) requires ASGI support for proper streaming
|
| 12 |
+
|
| 13 |
+
## Fix Applied
|
| 14 |
+
|
| 15 |
+
### Changed: `Dockerfile.prod`
|
| 16 |
+
|
| 17 |
+
**Before:**
|
| 18 |
+
```dockerfile
|
| 19 |
+
RUN pip install --no-cache-dir -r /app/requirements.txt gunicorn
|
| 20 |
+
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:7860", "--timeout", "600", "wsgi:app"]
|
| 21 |
+
```
|
| 22 |
+
|
| 23 |
+
**After:**
|
| 24 |
+
```dockerfile
|
| 25 |
+
RUN pip install --no-cache-dir -r /app/requirements.txt uvicorn[standard]
|
| 26 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "600", "--workers", "4"]
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
### Why This Works
|
| 30 |
+
1. **uvicorn** is a proper ASGI server that supports FastAPI
|
| 31 |
+
2. Enables SSE (Server-Sent Events) for streaming responses
|
| 32 |
+
3. Supports async/await patterns used throughout the codebase
|
| 33 |
+
4. Provides better performance for async applications
|
| 34 |
+
|
| 35 |
+
## Additional Recommendations
|
| 36 |
+
|
| 37 |
+
### 1. Kubernetes Resource Allocation
|
| 38 |
+
|
| 39 |
+
Review your cluster's available resources. The deployment requires:
|
| 40 |
+
```yaml
|
| 41 |
+
resources:
|
| 42 |
+
requests:
|
| 43 |
+
cpu: "500m"
|
| 44 |
+
memory: "2Gi"
|
| 45 |
+
limits:
|
| 46 |
+
cpu: "2000m"
|
| 47 |
+
memory: "4Gi"
|
| 48 |
+
```
|
| 49 |
+
|
| 50 |
+
**Verification Steps:**
|
| 51 |
+
```bash
|
| 52 |
+
# Check available cluster resources
|
| 53 |
+
kubectl describe nodes
|
| 54 |
+
|
| 55 |
+
# Check if pods are pending
|
| 56 |
+
kubectl get pods -n medical-ai
|
| 57 |
+
|
| 58 |
+
# Check pod events for scheduling issues
|
| 59 |
+
kubectl describe pod <pod-name> -n medical-ai
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
### 2. Alternative ASGI Server Options
|
| 63 |
+
|
| 64 |
+
If you need more production-grade deployment with multiple workers:
|
| 65 |
+
|
| 66 |
+
#### Option A: Gunicorn with Uvicorn Workers (Recommended for Production)
|
| 67 |
+
```dockerfile
|
| 68 |
+
RUN pip install --no-cache-dir -r /app/requirements.txt gunicorn uvicorn[standard]
|
| 69 |
+
CMD ["gunicorn", "app:app", "--workers", "4", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:7860", "--timeout", "600"]
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
#### Option B: Pure Uvicorn (Current, Good for Medium Load)
|
| 73 |
+
```dockerfile
|
| 74 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "600", "--workers", "4"]
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
### 3. Health Check Configuration
|
| 78 |
+
|
| 79 |
+
Ensure your health endpoints are accessible:
|
| 80 |
+
- **Liveness Probe:** `/health/live`
|
| 81 |
+
- **Readiness Probe:** `/health/ready`
|
| 82 |
+
|
| 83 |
+
The delays in `k8s/deployment.yaml` are appropriate:
|
| 84 |
+
- `initialDelaySeconds: 20` for readiness
|
| 85 |
+
- `initialDelaySeconds: 30` for liveness
|
| 86 |
+
|
| 87 |
+
### 4. Environment Variables to Set
|
| 88 |
+
|
| 89 |
+
For optimal performance in Kubernetes:
|
| 90 |
+
```yaml
|
| 91 |
+
env:
|
| 92 |
+
- name: PRELOAD_SMALL_MODELS
|
| 93 |
+
value: "false" # Set to true if you want faster first-request
|
| 94 |
+
- name: FAST_MODE
|
| 95 |
+
value: "false"
|
| 96 |
+
- name: ENABLE_BATCHING
|
| 97 |
+
value: "true"
|
| 98 |
+
- name: INFERENCE_MAX_WORKERS
|
| 99 |
+
value: "4"
|
| 100 |
+
- name: HF_HOME
|
| 101 |
+
value: "/tmp/huggingface"
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
### 5. Rebuild and Redeploy
|
| 105 |
+
|
| 106 |
+
```bash
|
| 107 |
+
# Rebuild the Docker image
|
| 108 |
+
docker build -f services/ai-service/Dockerfile.prod -t your-registry/ai-service:latest .
|
| 109 |
+
|
| 110 |
+
# Push to registry
|
| 111 |
+
docker push your-registry/ai-service:latest
|
| 112 |
+
|
| 113 |
+
# Update Kubernetes deployment
|
| 114 |
+
kubectl rollout restart deployment/ai-service -n medical-ai
|
| 115 |
+
|
| 116 |
+
# Monitor rollout
|
| 117 |
+
kubectl rollout status deployment/ai-service -n medical-ai
|
| 118 |
+
|
| 119 |
+
# Check logs
|
| 120 |
+
kubectl logs -f deployment/ai-service -n medical-ai
|
| 121 |
+
```
|
| 122 |
+
|
| 123 |
+
## Verification Steps
|
| 124 |
+
|
| 125 |
+
After deploying the fix:
|
| 126 |
+
|
| 127 |
+
1. **Check Pod Status:**
|
| 128 |
+
```bash
|
| 129 |
+
kubectl get pods -n medical-ai -w
|
| 130 |
+
```
|
| 131 |
+
|
| 132 |
+
2. **Verify Container Logs:**
|
| 133 |
+
```bash
|
| 134 |
+
kubectl logs -f <pod-name> -n medical-ai
|
| 135 |
+
```
|
| 136 |
+
|
| 137 |
+
3. **Test Health Endpoints:**
|
| 138 |
+
```bash
|
| 139 |
+
kubectl port-forward svc/ai-service 7860:80 -n medical-ai
|
| 140 |
+
curl http://localhost:7860/health/ready
|
| 141 |
+
curl http://localhost:7860/health/live
|
| 142 |
+
```
|
| 143 |
+
|
| 144 |
+
4. **Test SSE Streaming:**
|
| 145 |
+
```bash
|
| 146 |
+
curl http://localhost:7860/api/v1/patient-summary/stream/<job-id>
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
## Expected Results
|
| 150 |
+
|
| 151 |
+
After applying this fix:
|
| 152 |
+
- ✅ Container should start successfully
|
| 153 |
+
- ✅ Pods should transition to "Running" state
|
| 154 |
+
- ✅ Health checks should pass
|
| 155 |
+
- ✅ SSE streaming should work properly
|
| 156 |
+
- ✅ No more "Scheduling failure" errors
|
| 157 |
+
|
| 158 |
+
## Troubleshooting
|
| 159 |
+
|
| 160 |
+
### If pods still don't schedule:
|
| 161 |
+
1. Check cluster resource availability
|
| 162 |
+
2. Verify node selectors and taints
|
| 163 |
+
3. Check if persistent volumes are available
|
| 164 |
+
4. Review network policies
|
| 165 |
+
|
| 166 |
+
### If container crashes on startup:
|
| 167 |
+
1. Check application logs: `kubectl logs <pod-name> -n medical-ai`
|
| 168 |
+
2. Verify environment variables are set correctly
|
| 169 |
+
3. Ensure DATABASE_URL and REDIS_URL are accessible (if configured)
|
| 170 |
+
4. Check that the requirements.txt includes all necessary dependencies
|
| 171 |
+
|
| 172 |
+
## Related Files
|
| 173 |
+
- `services/ai-service/Dockerfile.prod` - Fixed Docker configuration
|
| 174 |
+
- `services/ai-service/k8s/deployment.yaml` - Kubernetes deployment
|
| 175 |
+
- `services/ai-service/src/app.py` - FastAPI application entry point
|
| 176 |
+
- `services/ai-service/src/wsgi.py` - Legacy WSGI file (not needed anymore)
|
| 177 |
+
|
services/ai-service/Dockerfile.prod
CHANGED
|
@@ -15,10 +15,11 @@ RUN apt-get update \
|
|
| 15 |
COPY services/ai-service/src /app
|
| 16 |
COPY requirements.txt /app/requirements.txt
|
| 17 |
|
| 18 |
-
RUN pip install --no-cache-dir -r /app/requirements.txt
|
| 19 |
|
| 20 |
EXPOSE 7860
|
| 21 |
|
| 22 |
ENV PRELOAD_SMALL_MODELS=false
|
| 23 |
|
| 24 |
-
|
|
|
|
|
|
| 15 |
COPY services/ai-service/src /app
|
| 16 |
COPY requirements.txt /app/requirements.txt
|
| 17 |
|
| 18 |
+
RUN pip install --no-cache-dir -r /app/requirements.txt uvicorn[standard]
|
| 19 |
|
| 20 |
EXPOSE 7860
|
| 21 |
|
| 22 |
ENV PRELOAD_SMALL_MODELS=false
|
| 23 |
|
| 24 |
+
# Use uvicorn directly for FastAPI (ASGI) instead of gunicorn (WSGI)
|
| 25 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--timeout-keep-alive", "600", "--workers", "4"]
|
switch_hf_config.ps1
ADDED
|
@@ -0,0 +1,118 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Quick configuration switcher for HF Spaces deployment
|
| 2 |
+
# Usage: .\switch_hf_config.ps1 [minimal|small-gpu|medium-gpu]
|
| 3 |
+
|
| 4 |
+
param(
|
| 5 |
+
[Parameter(Mandatory=$false)]
|
| 6 |
+
[ValidateSet('minimal', 'small-gpu', 'medium-gpu')]
|
| 7 |
+
[string]$Config
|
| 8 |
+
)
|
| 9 |
+
|
| 10 |
+
if (-not $Config) {
|
| 11 |
+
Write-Host "Usage: .\switch_hf_config.ps1 [minimal|small-gpu|medium-gpu]"
|
| 12 |
+
Write-Host ""
|
| 13 |
+
Write-Host "Options:"
|
| 14 |
+
Write-Host " minimal - CPU only, fastest deployment (recommended)"
|
| 15 |
+
Write-Host " small-gpu - T4 Small GPU, good balance"
|
| 16 |
+
Write-Host " medium-gpu - T4 Medium GPU, full preloading (Pro/Enterprise)"
|
| 17 |
+
Write-Host ""
|
| 18 |
+
exit 1
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
switch ($Config) {
|
| 22 |
+
'minimal' {
|
| 23 |
+
Write-Host "🔧 Switching to MINIMAL configuration (CPU-only)..." -ForegroundColor Cyan
|
| 24 |
+
|
| 25 |
+
$content = @"
|
| 26 |
+
runtime: docker
|
| 27 |
+
sdk: docker
|
| 28 |
+
python_version: "3.10"
|
| 29 |
+
|
| 30 |
+
build:
|
| 31 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 32 |
+
cache: true
|
| 33 |
+
|
| 34 |
+
env:
|
| 35 |
+
- HF_SPACES=true
|
| 36 |
+
- FAST_MODE=true
|
| 37 |
+
- PRELOAD_GGUF=false
|
| 38 |
+
- PRELOAD_SMALL_MODELS=false
|
| 39 |
+
"@
|
| 40 |
+
|
| 41 |
+
Set-Content -Path ".huggingface.yaml" -Value $content
|
| 42 |
+
Write-Host "✅ Configuration updated to CPU-only mode" -ForegroundColor Green
|
| 43 |
+
Write-Host "📝 This will deploy on the free tier (no GPU)" -ForegroundColor Yellow
|
| 44 |
+
Write-Host "⚡ Build time: ~5-10 minutes" -ForegroundColor Yellow
|
| 45 |
+
}
|
| 46 |
+
|
| 47 |
+
'small-gpu' {
|
| 48 |
+
Write-Host "🔧 Switching to SMALL GPU configuration (T4 Small)..." -ForegroundColor Cyan
|
| 49 |
+
|
| 50 |
+
$content = @"
|
| 51 |
+
runtime: docker
|
| 52 |
+
sdk: docker
|
| 53 |
+
python_version: "3.10"
|
| 54 |
+
|
| 55 |
+
build:
|
| 56 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 57 |
+
cache: true
|
| 58 |
+
|
| 59 |
+
hardware:
|
| 60 |
+
gpu: t4-small
|
| 61 |
+
|
| 62 |
+
env:
|
| 63 |
+
- HF_SPACES=true
|
| 64 |
+
- FAST_MODE=true
|
| 65 |
+
- PRELOAD_GGUF=false
|
| 66 |
+
- PRELOAD_SMALL_MODELS=false
|
| 67 |
+
- CUDA_VISIBLE_DEVICES=0
|
| 68 |
+
- PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
|
| 69 |
+
"@
|
| 70 |
+
|
| 71 |
+
Set-Content -Path ".huggingface.yaml" -Value $content
|
| 72 |
+
Write-Host "✅ Configuration updated to T4 Small GPU" -ForegroundColor Green
|
| 73 |
+
Write-Host "📝 Requires GPU access in your HF account" -ForegroundColor Yellow
|
| 74 |
+
Write-Host "⚡ Build time: ~10-15 minutes" -ForegroundColor Yellow
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
'medium-gpu' {
|
| 78 |
+
Write-Host "🔧 Switching to MEDIUM GPU configuration (T4 Medium + Preloading)..." -ForegroundColor Cyan
|
| 79 |
+
|
| 80 |
+
$content = @"
|
| 81 |
+
runtime: docker
|
| 82 |
+
sdk: docker
|
| 83 |
+
python_version: "3.10"
|
| 84 |
+
|
| 85 |
+
build:
|
| 86 |
+
dockerfile: Dockerfile.hf-spaces
|
| 87 |
+
cache: true
|
| 88 |
+
|
| 89 |
+
hardware:
|
| 90 |
+
gpu: t4-medium
|
| 91 |
+
|
| 92 |
+
env:
|
| 93 |
+
- SPACE_ID=`$SPACE_ID
|
| 94 |
+
- HF_HOME=/app/.cache/huggingface
|
| 95 |
+
- TORCH_HOME=/app/.cache/torch
|
| 96 |
+
- MODEL_CACHE_DIR=/app/models
|
| 97 |
+
- PRELOAD_GGUF=true
|
| 98 |
+
- HF_SPACES=true
|
| 99 |
+
- CUDA_VISIBLE_DEVICES=0
|
| 100 |
+
- PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
|
| 101 |
+
"@
|
| 102 |
+
|
| 103 |
+
Set-Content -Path ".huggingface.yaml" -Value $content
|
| 104 |
+
Write-Host "✅ Configuration updated to T4 Medium GPU with preloading" -ForegroundColor Green
|
| 105 |
+
Write-Host "📝 Requires Pro/Enterprise tier" -ForegroundColor Yellow
|
| 106 |
+
Write-Host "⚡ Build time: ~20-30 minutes (first time), instant startup" -ForegroundColor Yellow
|
| 107 |
+
}
|
| 108 |
+
}
|
| 109 |
+
|
| 110 |
+
Write-Host ""
|
| 111 |
+
Write-Host "📋 Next steps:" -ForegroundColor Cyan
|
| 112 |
+
Write-Host " 1. Review the changes: git diff .huggingface.yaml"
|
| 113 |
+
Write-Host " 2. Commit: git commit -am 'Switch to $Config configuration'"
|
| 114 |
+
Write-Host " 3. Push: git push"
|
| 115 |
+
Write-Host " 4. Monitor your Space build logs"
|
| 116 |
+
Write-Host ""
|
| 117 |
+
Write-Host "🔍 Check status at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE" -ForegroundColor Yellow
|
| 118 |
+
|
switch_hf_config.sh
ADDED
|
@@ -0,0 +1,114 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Quick configuration switcher for HF Spaces deployment
|
| 3 |
+
# Usage: ./switch_hf_config.sh [minimal|small-gpu|medium-gpu]
|
| 4 |
+
|
| 5 |
+
set -e
|
| 6 |
+
|
| 7 |
+
CONFIG=$1
|
| 8 |
+
|
| 9 |
+
if [ -z "$CONFIG" ]; then
|
| 10 |
+
echo "Usage: $0 [minimal|small-gpu|medium-gpu]"
|
| 11 |
+
echo ""
|
| 12 |
+
echo "Options:"
|
| 13 |
+
echo " minimal - CPU only, fastest deployment (recommended)"
|
| 14 |
+
echo " small-gpu - T4 Small GPU, good balance"
|
| 15 |
+
echo " medium-gpu - T4 Medium GPU, full preloading (Pro/Enterprise)"
|
| 16 |
+
echo ""
|
| 17 |
+
exit 1
|
| 18 |
+
fi
|
| 19 |
+
|
| 20 |
+
case $CONFIG in
|
| 21 |
+
minimal)
|
| 22 |
+
echo "🔧 Switching to MINIMAL configuration (CPU-only)..."
|
| 23 |
+
cat > .huggingface.yaml << 'EOF'
|
| 24 |
+
runtime: docker
|
| 25 |
+
sdk: docker
|
| 26 |
+
python_version: "3.10"
|
| 27 |
+
|
| 28 |
+
build:
|
| 29 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 30 |
+
cache: true
|
| 31 |
+
|
| 32 |
+
env:
|
| 33 |
+
- HF_SPACES=true
|
| 34 |
+
- FAST_MODE=true
|
| 35 |
+
- PRELOAD_GGUF=false
|
| 36 |
+
- PRELOAD_SMALL_MODELS=false
|
| 37 |
+
EOF
|
| 38 |
+
echo "✅ Configuration updated to CPU-only mode"
|
| 39 |
+
echo "📝 This will deploy on the free tier (no GPU)"
|
| 40 |
+
echo "⚡ Build time: ~5-10 minutes"
|
| 41 |
+
;;
|
| 42 |
+
|
| 43 |
+
small-gpu)
|
| 44 |
+
echo "🔧 Switching to SMALL GPU configuration (T4 Small)..."
|
| 45 |
+
cat > .huggingface.yaml << 'EOF'
|
| 46 |
+
runtime: docker
|
| 47 |
+
sdk: docker
|
| 48 |
+
python_version: "3.10"
|
| 49 |
+
|
| 50 |
+
build:
|
| 51 |
+
dockerfile: Dockerfile.hf-spaces-minimal
|
| 52 |
+
cache: true
|
| 53 |
+
|
| 54 |
+
hardware:
|
| 55 |
+
gpu: t4-small
|
| 56 |
+
|
| 57 |
+
env:
|
| 58 |
+
- HF_SPACES=true
|
| 59 |
+
- FAST_MODE=true
|
| 60 |
+
- PRELOAD_GGUF=false
|
| 61 |
+
- PRELOAD_SMALL_MODELS=false
|
| 62 |
+
- CUDA_VISIBLE_DEVICES=0
|
| 63 |
+
- PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
|
| 64 |
+
EOF
|
| 65 |
+
echo "✅ Configuration updated to T4 Small GPU"
|
| 66 |
+
echo "📝 Requires GPU access in your HF account"
|
| 67 |
+
echo "⚡ Build time: ~10-15 minutes"
|
| 68 |
+
;;
|
| 69 |
+
|
| 70 |
+
medium-gpu)
|
| 71 |
+
echo "🔧 Switching to MEDIUM GPU configuration (T4 Medium + Preloading)..."
|
| 72 |
+
cat > .huggingface.yaml << 'EOF'
|
| 73 |
+
runtime: docker
|
| 74 |
+
sdk: docker
|
| 75 |
+
python_version: "3.10"
|
| 76 |
+
|
| 77 |
+
build:
|
| 78 |
+
dockerfile: Dockerfile.hf-spaces
|
| 79 |
+
cache: true
|
| 80 |
+
|
| 81 |
+
hardware:
|
| 82 |
+
gpu: t4-medium
|
| 83 |
+
|
| 84 |
+
env:
|
| 85 |
+
- SPACE_ID=$SPACE_ID
|
| 86 |
+
- HF_HOME=/app/.cache/huggingface
|
| 87 |
+
- TORCH_HOME=/app/.cache/torch
|
| 88 |
+
- MODEL_CACHE_DIR=/app/models
|
| 89 |
+
- PRELOAD_GGUF=true
|
| 90 |
+
- HF_SPACES=true
|
| 91 |
+
- CUDA_VISIBLE_DEVICES=0
|
| 92 |
+
- PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
|
| 93 |
+
EOF
|
| 94 |
+
echo "✅ Configuration updated to T4 Medium GPU with preloading"
|
| 95 |
+
echo "📝 Requires Pro/Enterprise tier"
|
| 96 |
+
echo "⚡ Build time: ~20-30 minutes (first time), instant startup"
|
| 97 |
+
;;
|
| 98 |
+
|
| 99 |
+
*)
|
| 100 |
+
echo "❌ Invalid option: $CONFIG"
|
| 101 |
+
echo "Use: minimal, small-gpu, or medium-gpu"
|
| 102 |
+
exit 1
|
| 103 |
+
;;
|
| 104 |
+
esac
|
| 105 |
+
|
| 106 |
+
echo ""
|
| 107 |
+
echo "📋 Next steps:"
|
| 108 |
+
echo " 1. Review the changes: git diff .huggingface.yaml"
|
| 109 |
+
echo " 2. Commit: git commit -am 'Switch to $CONFIG configuration'"
|
| 110 |
+
echo " 3. Push: git push"
|
| 111 |
+
echo " 4. Monitor your Space build logs"
|
| 112 |
+
echo ""
|
| 113 |
+
echo "🔍 Check status at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE"
|
| 114 |
+
|