---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
---
# Cancer@Home v2
<div align="center">
<img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version">
<img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
<img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
<img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j">
</div>
## 🧬 Overview
Cancer@Home v2 is a comprehensive distributed computing platform for cancer genomics research that combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** into a unified, easy-to-use system.
Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform makes cancer genomics research accessible, distributed, and visual.
## 🎯 Key Features
- 🌐 **Interactive Web Dashboard** - Modern UI with real-time visualizations
- 🔍 **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
- ⚡ **BOINC Integration** - Distributed computing for intensive analyses
- 📊 **GraphQL API** - Flexible data querying
- 🧪 **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
- 📚 **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
- 🚀 **Quick Setup** - Up and running in under 5 minutes
## 🏗️ Architecture
```
┌────────────────────────────────────────────┐
│      Web Dashboard (D3.js + Chart.js)      │
├────────────────────────────────────────────┤
│      FastAPI Backend (REST + GraphQL)      │
├──────┬──────┬──────┬──────┬────────────────┤
│Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
│Graph │Client│ API  │  QC  │    Calling     │
└──────┴──────┴──────┴──────┴────────────────┘
```
## 📦 Installation
### Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM (16GB recommended)
### Quick Start
**Windows:**
```powershell
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
```
**Linux/Mac:**
```bash
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
```
Then open: **http://localhost:5000**
## 🚀 Usage
### Web Dashboard
Access the interactive dashboard at http://localhost:5000 with:
- **Dashboard Tab**: Overview statistics and mutation charts
- **Neo4j Visualization**: Interactive graph of cancer relationships
- **BOINC Tasks**: Submit and monitor distributed computing tasks
- **GDC Data**: Browse and download cancer datasets
- **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling
### GraphQL API
Query cancer data at http://localhost:5000/graphql
**Example: Get mutations in the TP53 gene**
```graphql
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
```
**Example: Get patient statistics**
```graphql
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
```
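The queries above can also be sent from a script. The following is a minimal sketch using only the Python standard library. It assumes the server accepts the conventional GraphQL-over-HTTP JSON body (`{"query": ..., "variables": ...}`) via POST at `/graphql`; the helper names `build_graphql_request` and `run_query` are illustrative and not part of the project.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:5000/graphql"  # endpoint listed in this card

TP53_QUERY = """
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Build a POST request carrying a GraphQL query as a JSON body."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, url=GRAPHQL_URL, timeout=10):
    """Send the query and return the decoded JSON response (needs a running server)."""
    request = build_graphql_request(query, variables, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the application running, `run_query(TP53_QUERY)` should return a dictionary whose `data` key holds the result, assuming the server follows the usual GraphQL response shape.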
### REST API
**Database Summary:**
```bash
curl http://localhost:5000/api/neo4j/summary
```
**Submit BOINC Task:**
```bash
curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
```
### Python API
**FASTQ Processing:**
```python
from backend.pipeline import FASTQProcessor
processor = FASTQProcessor()
stats = processor.calculate_statistics("input.fastq")
filtered = processor.quality_filter("input.fastq")
```
**Variant Calling:**
```python
from backend.pipeline import VariantCaller, VariantAnalyzer

# Call variants from the alignment against the reference, then filter them.
caller = VariantCaller()
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
variants = caller.filter_variants(vcf_file)

# Flag cancer-associated variants and compute tumor mutation burden (TMB).
analyzer = VariantAnalyzer()
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
```
**Neo4j Queries:**
```python
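from backend.neo4j import DatabaseManager

db = DatabaseManager()
# Find every mutation that affects TP53 and return its position and consequence.
query = """
    MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
    RETURN m.position, m.consequence
"""
try:
    results = db.execute_query(query)
finally:
    db.close()  # release the connection even if the query fails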
```
## 📊 Data Model
### Neo4j Graph Schema
**Nodes:**
- **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
- **Mutation**: Genetic variants with position and consequence
- **Patient**: Individual cases with demographics
- **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)
**Relationships:**
- `Gene ← AFFECTS ← Mutation`
- `Patient → HAS_MUTATION → Mutation`
- `Patient → DIAGNOSED_WITH → CancerType`
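As an illustration of how the three relationship types combine, the sketch below builds one parameterized Cypher query that finds patients of a given cancer type who carry a mutation in a given gene. The properties `symbol`, `position`, and `consequence` appear elsewhere in this card; `patient_id` and `cancer_type_id` as node properties are assumptions, as is the `(cypher, params)` calling convention of `run_query`. Whether `DatabaseManager.execute_query` accepts parameters is not stated here.

```python
# Cypher that walks all three relationship types in the schema above.
# NOTE: p.patient_id and c.cancer_type_id are assumed property names.
PATIENTS_BY_GENE_AND_CANCER = """
MATCH (p:Patient)-[:HAS_MUTATION]->(m:Mutation)-[:AFFECTS]->(g:Gene {symbol: $gene}),
      (p)-[:DIAGNOSED_WITH]->(c:CancerType)
WHERE c.cancer_type_id = $cancer_type
RETURN p.patient_id AS patient, m.position AS position, m.consequence AS consequence
"""


def patients_with_gene_mutation(run_query, gene, cancer_type):
    """Run the traversal through any callable that takes (cypher, params)."""
    params = {"gene": gene, "cancer_type": cancer_type}
    return list(run_query(PATIENTS_BY_GENE_AND_CANCER, params))
```

Passing the values as parameters (`$gene`, `$cancer_type`) rather than formatting them into the query string avoids quoting problems and lets Neo4j reuse the query plan.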
### Sample Data Included
- **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
- **5 Mutations**: Cancer-associated variants
- **5 Patients**: Representative TCGA cases
- **4 Cancer Types**: BRCA, LUAD, COAD, GBM
## 🔧 Technology Stack
- **Backend**: FastAPI, Python 3.8+
- **Database**: Neo4j 5.13 (Graph Database)
- **API**: GraphQL (Strawberry), REST
- **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
- **Bioinformatics**: Biopython, BLAST+
- **Data Source**: GDC Portal API (TCGA/TARGET)
- **Infrastructure**: Docker, Docker Compose
- **Distributed Computing**: BOINC Framework
## 📚 Documentation
- [README.md](README.md) - Complete project overview
- [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
- [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
- [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview
## 🎓 Use Cases
1. **Cancer Research**: Analyze genomics data with distributed computing
2. **Education**: Learn cancer genetics and bioinformatics
3. **Data Visualization**: Explore gene-mutation-patient relationships
4. **Pipeline Development**: Test bioinformatics workflows
5. **Graph Analytics**: Query complex biological networks
## 🔬 Supported Cancer Projects
- **TCGA-BRCA**: Breast Cancer (1,098 cases)
- **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
- **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
- **TCGA-GBM**: Glioblastoma (617 cases)
- **TARGET-AML**: Acute Myeloid Leukemia (238 cases)
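Case counts like these can be checked against the public GDC REST API, and they may differ between GDC data releases. The sketch below assumes the `https://api.gdc.cancer.gov/cases` endpoint and its JSON `filters` syntax; it is not necessarily how this project's GDC integration is implemented, and the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_case_count_url(project_id):
    """Build a GDC query URL that matches all cases in one project."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    # Only the pagination total is needed, so request a single hit.
    params = {"filters": json.dumps(filters), "size": "1"}
    return GDC_CASES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_case_count(project_id, timeout=30):
    """Return the total number of matching cases (requires network access)."""
    url = build_case_count_url(project_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["data"]["pagination"]["total"]
```

For example, `fetch_case_count("TCGA-BRCA")` queries the live API for the current number of TCGA-BRCA cases.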
## 📈 Bioinformatics Pipeline
### FASTQ Processing
- Quality control and filtering
- Adapter trimming
- Statistics calculation
- QC report generation
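To make the quality-filtering step concrete, here is a minimal standard-library sketch. It is not the `FASTQProcessor` implementation: it assumes four-line FASTQ records, Phred+33 quality encoding, and a mean-quality criterion. The default thresholds mirror the `quality_threshold: 20` and `min_length: 50` values shown in `config.yml`.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    # Step through complete records only; a truncated trailing record is ignored.
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        yield header, seq, qual


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def quality_filter(lines, quality_threshold=20, min_length=50):
    """Keep reads that are long enough and whose mean Phred score meets the threshold."""
    return [
        (header, seq, qual)
        for header, seq, qual in parse_fastq(lines)
        if len(seq) >= min_length and mean_phred(qual) >= quality_threshold
    ]
```

Calling `quality_filter(open("input.fastq"))` returns the surviving reads as tuples. A production pipeline would more likely stream records and trim adapters before filtering.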
### BLAST Alignment
- BLASTN for nucleotide sequences
- BLASTP for protein sequences
- Hit filtering by identity/e-value
- Homology detection
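Hit filtering by identity and e-value can be sketched as follows. This assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns. The `max_evalue` default matches the `evalue: 0.001` setting in `config.yml`, while the 90% identity cutoff and the sample rows are hypothetical values chosen for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_blast_hits(handle, min_identity=90.0, max_evalue=0.001):
    """Return hits (as dicts) that pass the identity and e-value cutoffs."""
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        if row["qseqid"].startswith("#"):
            continue  # skip comment lines, which appear with -outfmt 7
        pident = float(row["pident"])
        evalue = float(row["evalue"])
        if pident >= min_identity and evalue <= max_evalue:
            hits.append({**row, "pident": pident, "evalue": evalue})
    return hits


# Two made-up hits: one strong match and one weak match.
SAMPLE = (
    "q1\ts1\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "q1\ts2\t75.0\t80\t20\t0\t1\t80\t5\t84\t0.5\t40\n"
)
hits = filter_blast_hits(io.StringIO(SAMPLE))
```

Only the first sample row survives, since the second fails both cutoffs.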
### Variant Calling
- VCF generation from alignments
- Quality filtering
- Cancer variant identification
- Tumor mutation burden (TMB) calculation
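Tumor mutation burden is commonly reported as the number of mutations per megabase of sequenced territory. The sketch below shows that arithmetic only; it is not the `VariantAnalyzer.calculate_mutation_burden` implementation, and the 30 Mb covered region in the example is a hypothetical placeholder rather than a value from this project.

```python
def tumor_mutation_burden(variant_count, covered_bases):
    """Mutations per megabase: variant count divided by covered territory in Mb."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return variant_count / (covered_bases / 1_000_000)


# Example: 150 passing variants over a hypothetical 30 Mb covered region.
tmb = tumor_mutation_burden(150, 30_000_000)  # 5.0 mutations/Mb
```

Which variants count toward the numerator (for example, only somatic, non-synonymous calls) and how the covered region is defined vary between studies, so both should be stated alongside any reported TMB.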
## 🌐 Access Points
- **Application**: http://localhost:5000
- **API Docs**: http://localhost:5000/docs (Swagger UI)
- **GraphQL**: http://localhost:5000/graphql
- **Neo4j Browser**: http://localhost:7474 (neo4j/cancer123)
## 🛠️ Configuration
Edit `config.yml` to customize:
```yaml
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"
gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]
pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
```
## 🤝 Contributing
Contributions are welcome! This project is open source under the MIT License.
### Development Setup
```bash
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
```
## 📄 License
MIT License - See [LICENSE](LICENSE) file
Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal
## 🙏 Acknowledgments
### Inspiration
- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
- [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)
### Data Sources
- [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
- The Cancer Genome Atlas (TCGA) Program
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
### Technologies
- Neo4j Graph Database
- BOINC Distributed Computing Project
- Biopython Community
- FastAPI Framework
## 👥 Authors
- **OpenPeer AI** - Core development and architecture
- **Riemann Computing Inc.** - Distributed computing integration
- **Bleunomics** - Bioinformatics pipeline and genomics expertise
- **Andrew Magdy Kamal** - Graph database design and visualization
## 📞 Support
- **Documentation**: See project documentation files
- **Issues**: Check logs in `logs/cancer_at_home.log`
- **Configuration**: Review `config.yml`
- **Health Check**: http://localhost:5000/api/health
## 🔮 Roadmap
### Planned Features
- Machine learning for mutation prediction
- Multi-omics data integration (RNA-seq, proteomics)
- Survival analysis and clinical outcomes
- Advanced graph algorithms (PageRank, community detection)
- Cloud deployment support (AWS, Azure, GCP)
- Mobile-responsive design
- User authentication and authorization
## 📊 Statistics
- **Lines of Code**: 5,000+
- **Modules**: 9 Python modules
- **API Endpoints**: 15+ (REST and GraphQL)
- **Documentation**: 2,500+ lines
- **Setup Time**: < 5 minutes
- **Sample Data**: 7 genes, 5 mutations, 5 patients
## 🎯 Citation
If you use Cancer@Home v2 in your research, please cite:
```bibtex
@software{cancer_at_home_v2,
  title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year = {2025},
  url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
```
## 🏷️ Tags
`cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`
---
**Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**
**For cancer research, by researchers, accessible to all.**