---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
---

# Cancer@Home v2

<div align="center">
<img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version">
<img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
<img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
<img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j">
</div>

## Overview

Cancer@Home v2 is a distributed computing platform for cancer genomics research. It combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** in a single, easy-to-use system.

Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform aims to make cancer genomics research accessible, distributed, and visual.

## Key Features

- **Interactive Web Dashboard** - Modern UI with real-time visualizations
- **Neo4j Graph Database** - Models gene-mutation-patient relationships
- **BOINC Integration** - Distributed computing for compute-intensive analyses
- **GraphQL API** - Flexible data querying
- **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
- **GDC Portal Integration** - Access to TCGA/TARGET cancer datasets
- **Quick Setup** - Up and running in under 5 minutes

## Architecture

```
┌────────────────────────────────────────────┐
│      Web Dashboard (D3.js + Chart.js)      │
├────────────────────────────────────────────┤
│      FastAPI Backend (REST + GraphQL)      │
├──────┬──────┬──────┬──────┬────────────────┤
│Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
│Graph │Client│ API  │  QC  │    Calling     │
└──────┴──────┴──────┴──────┴────────────────┘
```

## Installation

### Prerequisites

- Python 3.8+
- Docker Desktop
- 8 GB RAM (16 GB recommended)

### Quick Start

**Windows:**

```powershell
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
```

**Linux/Mac:**

```bash
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
```

Then open **http://localhost:5000** in your browser.

## Usage

### Web Dashboard

The interactive dashboard at http://localhost:5000 provides:

- **Dashboard Tab**: Overview statistics and mutation charts
- **Neo4j Visualization**: Interactive graph of cancer relationships
- **BOINC Tasks**: Submit and monitor distributed computing tasks
- **GDC Data**: Browse and download cancer datasets
- **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling

### GraphQL API

Query cancer data at http://localhost:5000/graphql.

**Example: Get mutations in TP53 gene**

```graphql
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
```

**Example: Get patient statistics**

```graphql
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
```
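
The queries above can be sent from any HTTP client. Below is a minimal standard-library sketch, assuming the endpoint follows the common GraphQL-over-HTTP convention (a JSON `POST` body with `query` and `variables` keys) and that the `gene` argument is typed `String!`; the helper `build_graphql_request` is hypothetical, not part of the project. The request is built but not sent:

```python
import json
import urllib.request
from typing import Optional

GRAPHQL_URL = "http://localhost:5000/graphql"

# Same TP53 query as above, with the gene symbol passed as a variable.
# Assumption: the schema types the `gene` argument as String!.
MUTATIONS_QUERY = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query: str, variables: Optional[dict] = None,
                          url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Hypothetical helper: wrap a GraphQL query in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(MUTATIONS_QUERY, {"gene": "TP53"})

# With the server running, send it and read the JSON response:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["data"]["mutations"])
```

Passing the gene symbol as a variable, rather than formatting it into the query text, avoids quoting problems and lets the same query be reused for other genes.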

### REST API

**Database Summary:**

```bash
curl http://localhost:5000/api/neo4j/summary
```

**Submit BOINC Task:**

```bash
curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
```

### Python API

**FASTQ Processing:**

```python
from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()
stats = processor.calculate_statistics("input.fastq")
filtered = processor.quality_filter("input.fastq")
```
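
`FASTQProcessor` is the project's own class, and its exact filtering rules are not documented here. To illustrate what a quality filter of this kind does, the sketch below reads four-line FASTQ records, decodes Phred+33 quality strings (the encoding used by Illumina 1.8+ and most current data), and keeps reads that meet the `min_length: 50` and `quality_threshold: 20` values from `config.yml`. Filtering on the *mean* read quality is an assumption; other tools trim or filter per base.

```python
import io
from statistics import mean
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip()
        yield FastqRecord(header, sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert an ASCII quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(record: FastqRecord, quality_threshold: int = 20,
                  min_length: int = 50) -> bool:
    """Keep reads that are long enough and whose mean quality meets the threshold."""
    if len(record.sequence) < min_length:
        return False
    scores = phred_scores(record.quality)
    return bool(scores) and mean(scores) >= quality_threshold


# Two toy reads: 'I' encodes Q40, '#' encodes Q2.
fastq = io.StringIO(
    "@read1\n" + "A" * 60 + "\n+\n" + "I" * 60 + "\n" +
    "@read2\n" + "C" * 60 + "\n+\n" + "#" * 60 + "\n"
)
kept = [record.header for record in read_fastq(fastq) if passes_filter(record)]
```

Here only `@read1` survives: both reads are 60 bases long, but the second read's mean quality is 2.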

**Variant Calling:**

```python
from backend.pipeline import VariantCaller, VariantAnalyzer

caller = VariantCaller()
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
variants = caller.filter_variants(vcf_file)

analyzer = VariantAnalyzer()
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
```
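
The criteria used by `VariantCaller.filter_variants` are not documented here. As an illustration of VCF-level quality filtering, the sketch below keeps header lines plus data lines whose `FILTER` column is `PASS` (or `.`, meaning no filters were applied) and whose `QUAL` is at least a minimum. The cutoff of 30 is an assumed example value, not a project default.

```python
from typing import Iterable, List


def filter_vcf_lines(lines: Iterable[str], min_qual: float = 30.0) -> List[str]:
    """Keep VCF header lines and data lines that pass the FILTER and QUAL checks."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information (##) and column header (#CHROM) lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual, filter_status = fields[5], fields[6]  # QUAL and FILTER columns
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low variant quality
        if filter_status not in ("PASS", "."):
            continue  # flagged by an upstream filter
        kept.append(line)
    return kept


# Toy records (not real variants): one passes, one has low QUAL, one is flagged.
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tC\tT\t60\tPASS\t.\n",
    "chr1\t200\t.\tG\tA\t12\tPASS\t.\n",
    "chr2\t300\t.\tC\tA\t55\tLowQual\t.\n",
]
passing = filter_vcf_lines(vcf)
```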

**Neo4j Queries:**

```python
from backend.neo4j import DatabaseManager

db = DatabaseManager()
try:
    query = """
    MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
    RETURN m.position, m.consequence
    """
    results = db.execute_query(query)
finally:
    db.close()  # close the database connection even if the query fails
```

## Data Model

### Neo4j Graph Schema

**Nodes:**

- **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
- **Mutation**: Genetic variants with position and consequence
- **Patient**: Individual cases with demographics
- **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)

**Relationships:**

- `(Gene)<-[:AFFECTS]-(Mutation)`
- `(Patient)-[:HAS_MUTATION]->(Mutation)`
- `(Patient)-[:DIAGNOSED_WITH]->(CancerType)`
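
To make the schema concrete without a running Neo4j instance, the sketch below stores nodes as plain Python objects and each relationship as a directed (start, type, end) triple, with `AFFECTS` pointing from `Mutation` to `Gene` as in the TP53 Cypher query under Python API. It then repeats that query's traversal in memory. The classes and identifiers are hypothetical placeholders, not the bundled sample data or the project's loader.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str  # Gene, Mutation, Patient, or CancerType
    key: str    # identifier such as a gene symbol or case ID
    props: Dict[str, object] = field(default_factory=dict)


@dataclass
class Graph:
    edges: List[Tuple[Node, str, Node]] = field(default_factory=list)

    def relate(self, start: Node, rel_type: str, end: Node) -> None:
        """Add a directed (start)-[rel_type]->(end) relationship."""
        self.edges.append((start, rel_type, end))

    def incoming(self, end: Node, rel_type: str) -> List[Node]:
        """Return the start nodes of all rel_type relationships ending at `end`."""
        return [s for s, r, e in self.edges if r == rel_type and e == end]


tp53 = Node("Gene", "TP53")
mutation = Node("Mutation", "MUT-1", {"position": 1000, "consequence": "missense_variant"})
patient = Node("Patient", "CASE-1")
brca = Node("CancerType", "BRCA")

graph = Graph()
graph.relate(mutation, "AFFECTS", tp53)
graph.relate(patient, "HAS_MUTATION", mutation)
graph.relate(patient, "DIAGNOSED_WITH", brca)

# In-memory equivalent of:
# MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation) RETURN m.position, m.consequence
rows = [(m.props["position"], m.props["consequence"])
        for m in graph.incoming(tp53, "AFFECTS") if m.label == "Mutation"]
```

A graph database performs the same kind of traversal, but with indexed relationship lookups instead of the linear scan used here.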

### Sample Data Included

- **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
- **5 Mutations**: Cancer-associated variants
- **5 Patients**: Representative TCGA cases
- **4 Cancer Types**: BRCA, LUAD, COAD, GBM

## Technology Stack

- **Backend**: FastAPI, Python 3.8+
- **Database**: Neo4j 5.13 (Graph Database)
- **API**: GraphQL (Strawberry), REST
- **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
- **Bioinformatics**: Biopython, BLAST+
- **Data Source**: GDC Portal API (TCGA/TARGET)
- **Infrastructure**: Docker, Docker Compose
- **Distributed Computing**: BOINC Framework

## Documentation

- [README.md](README.md) - Complete project overview
- [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
- [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
- [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview

## Use Cases

1. **Cancer Research**: Analyze genomics data with distributed computing
2. **Education**: Learn cancer genetics and bioinformatics
3. **Data Visualization**: Explore gene-mutation-patient relationships
4. **Pipeline Development**: Test bioinformatics workflows
5. **Graph Analytics**: Query complex biological networks

## Supported Cancer Projects

- **TCGA-BRCA**: Breast Cancer (1,098 cases)
- **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
- **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
- **TCGA-GBM**: Glioblastoma (617 cases)
- **TARGET-AML**: Acute Myeloid Leukemia (238 cases)
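
Case counts for these projects can be queried from the public GDC API. A sketch, assuming the documented `cases` endpoint and its JSON `filters` parameter: it builds a URL that filters on `project.project_id` with `size=0`, so the response should carry the total in `data.pagination.total` without listing individual cases. The helper name is hypothetical, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def gdc_case_count_url(project_id: str) -> str:
    """Hypothetical helper: URL asking the GDC API how many cases a project has."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {"filters": json.dumps(filters), "size": 0, "format": "json"}
    return f"{GDC_CASES_ENDPOINT}?{urlencode(params)}"


url = gdc_case_count_url("TCGA-BRCA")

# With network access:
# import urllib.request
# with urllib.request.urlopen(url) as response:
#     print(json.load(response)["data"]["pagination"]["total"])
```

Calling the helper with each project ID listed above gives the count for the GDC data release that is live at the time of the request.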

## Bioinformatics Pipeline

### FASTQ Processing

- Quality control and filtering
- Adapter trimming
- Statistics calculation
- QC report generation
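
Adapter trimming removes adapter sequence that the sequencer reads into when the DNA insert is shorter than the read. The minimal sketch below uses exact matching only (production trimmers also tolerate mismatches) and assumes the common Illumina adapter prefix `AGATCGGAAGAGC` as its default; the project's own adapter list and matching rules are not documented here. It cuts at a full adapter match anywhere in the read, or at a partial adapter prefix of at least `min_overlap` bases at the 3' end.

```python
from typing import Tuple

ILLUMINA_ADAPTER = "AGATCGGAAGAGC"  # assumed default adapter prefix


def trim_adapter(sequence: str, quality: str, adapter: str = ILLUMINA_ADAPTER,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Trim a 3' adapter from a read and its quality string (exact matches only)."""
    # Case 1: the full adapter occurs inside the read.
    pos = sequence.find(adapter)
    if pos != -1:
        return sequence[:pos], quality[:pos]
    # Case 2: the read ends with the start of the adapter; try longest overlaps first.
    for overlap in range(min(len(adapter) - 1, len(sequence)), min_overlap - 1, -1):
        if sequence.endswith(adapter[:overlap]):
            cut = len(sequence) - overlap
            return sequence[:cut], quality[:cut]
    return sequence, quality


read = "ACGTACGTAC" + ILLUMINA_ADAPTER + "TTTT"
trimmed_seq, trimmed_qual = trim_adapter(read, "I" * len(read))
```

The quality string is cut at the same position as the sequence so the two stay aligned for downstream quality filtering.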

### BLAST Alignment

- BLASTN for nucleotide sequences
- BLASTP for protein sequences
- Hit filtering by identity/e-value
- Homology detection
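
BLAST+ can report hits in tabular form with `-outfmt 6`, whose default columns are `qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`. The sketch below parses that format and filters hits by percent identity and e-value. The e-value cutoff mirrors `evalue: 0.001` in `config.yml`, while the 90% identity cutoff is an assumed example; the hits themselves are made up.

```python
import csv
import io
from typing import Iterable, List, NamedTuple, TextIO


class BlastHit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(handle: TextIO) -> List[BlastHit]:
    """Parse default 12-column BLAST+ tabular output (-outfmt 6)."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hits.append(BlastHit(query=row[0], subject=row[1], pident=float(row[2]),
                             length=int(row[3]), evalue=float(row[10]),
                             bitscore=float(row[11])))
    return hits


def filter_hits(hits: Iterable[BlastHit], min_identity: float = 90.0,
                max_evalue: float = 0.001) -> List[BlastHit]:
    """Keep hits at or above min_identity (%) and at or below max_evalue."""
    return [h for h in hits if h.pident >= min_identity and h.evalue <= max_evalue]


# Made-up hits: only the first passes both cutoffs.
example = io.StringIO(
    "read1\tgeneA\t98.50\t200\t3\t0\t1\t200\t101\t300\t1e-90\t350\n"
    "read1\tgeneB\t72.00\t180\t50\t2\t1\t180\t5\t184\t2e-20\t90.1\n"
    "read2\tgeneC\t95.00\t40\t2\t0\t1\t40\t10\t49\t0.05\t45.2\n"
)
kept = filter_hits(parse_outfmt6(example))
```

Tabular output of this shape comes from a BLAST+ run such as `blastn -query reads.fa -db refdb -outfmt 6 -evalue 0.001 -num_threads 4` (file and database names are placeholders).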

### Variant Calling

- VCF generation from alignments
- Quality filtering
- Cancer variant identification
- Tumor mutation burden (TMB) calculation
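
Tumor mutation burden is usually reported as somatic mutations per megabase (mut/Mb): the number of qualifying mutations divided by the size of the sequenced territory in megabases. Which variants count (for example, coding non-synonymous only) and which territory size is used differ between assays, and the project's `calculate_mutation_burden` may make other choices. A minimal sketch of the arithmetic, with made-up inputs:

```python
def tumor_mutation_burden(mutation_count: int, covered_bases: int) -> float:
    """Return somatic mutations per megabase of sequenced territory."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    megabases = covered_bases / 1_000_000
    return mutation_count / megabases


# 150 qualifying mutations over a 30 Mb target region -> 5.0 mut/Mb
tmb = tumor_mutation_burden(150, 30_000_000)
```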

## Access Points

- **Application**: http://localhost:5000
- **API Docs**: http://localhost:5000/docs (Swagger UI)
- **GraphQL**: http://localhost:5000/graphql
- **Neo4j Browser**: http://localhost:7474 (username `neo4j`, password `cancer123`)

## Configuration

Edit `config.yml` to customize:

```yaml
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"

gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]

pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
```

## Contributing

Contributions are welcome! This project is open source under the MIT License.

### Development Setup

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
```

## License

MIT License - See [LICENSE](LICENSE) file

Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal

## Acknowledgments

### Inspiration

- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
- [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)

### Data Sources

- [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
- The Cancer Genome Atlas (TCGA) Program
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)

### Technologies

- Neo4j Graph Database
- BOINC Distributed Computing Project
- Biopython Community
- FastAPI Framework

## Authors

- **OpenPeer AI** - Core development and architecture
- **Riemann Computing Inc.** - Distributed computing integration
- **Bleunomics** - Bioinformatics pipeline and genomics expertise
- **Andrew Magdy Kamal** - Graph database design and visualization

## Support

- **Documentation**: See the project documentation files
- **Logs**: Check `logs/cancer_at_home.log` when troubleshooting issues
- **Configuration**: Review `config.yml`
- **Health Check**: http://localhost:5000/api/health

## Roadmap

### Planned Features

- Machine learning for mutation prediction
- Multi-omics data integration (RNA-seq, proteomics)
- Survival analysis and clinical outcomes
- Advanced graph algorithms (PageRank, community detection)
- Cloud deployment support (AWS, Azure, GCP)
- Mobile-responsive design
- User authentication and authorization

## Statistics

- **Lines of Code**: 5,000+
- **Modules**: 9 Python modules
- **API Endpoints**: 15+ (REST and GraphQL)
- **Documentation**: 2,500+ lines
- **Setup Time**: Under 5 minutes
- **Sample Data**: 7 genes, 5 mutations, 5 patients

## Citation

If you use Cancer@Home v2 in your research, please cite:

```bibtex
@software{cancer_at_home_v2,
  title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year = {2025},
  url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
```

## Tags

`cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`

---

**Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**

**For cancer research, by researchers, accessible to all.**