license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
Cancer@Home v2
Overview
Cancer@Home v2 is a comprehensive distributed computing platform for cancer genomics research that combines BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization into a unified, easy-to-use system.
Inspired by Cancer@Home v1 and Andrew Kamal's Neo4j Dashboard, this platform makes cancer genomics research accessible, distributed, and visual.
Key Features
- Interactive Web Dashboard - Modern UI with real-time visualizations
- Neo4j Graph Database - Model complex gene-mutation-patient relationships
- BOINC Integration - Distributed computing for intensive analyses
- GraphQL API - Flexible data querying
- Bioinformatics Pipeline - FASTQ processing, BLAST alignment, variant calling
- GDC Portal Integration - Access TCGA/TARGET cancer datasets
- Quick Setup - Up and running in under 5 minutes
Architecture
┌─────────────────────────────────────────────┐
│      Web Dashboard (D3.js + Chart.js)       │
├─────────────────────────────────────────────┤
│      FastAPI Backend (REST + GraphQL)       │
├──────┬──────┬──────┬──────┬─────────────────┤
│Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant   │
│Graph │Client│ API  │ QC   │     Calling     │
└──────┴──────┴──────┴──────┴─────────────────┘
Installation
Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM (16GB recommended)
Quick Start
Windows:
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
Linux/Mac:
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
Then open: http://localhost:5000
Usage
Web Dashboard
Access the interactive dashboard at http://localhost:5000 with:
- Dashboard Tab: Overview statistics and mutation charts
- Neo4j Visualization: Interactive graph of cancer relationships
- BOINC Tasks: Submit and monitor distributed computing tasks
- GDC Data: Browse and download cancer datasets
- Pipeline Tools: Run FASTQ QC, BLAST, and variant calling
GraphQL API
Query cancer data at http://localhost:5000/graphql
Example: Get mutations in TP53 gene
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
Example: Get patient statistics
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
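The queries above can also be sent from a script. The following is a minimal sketch, assuming the endpoint accepts the usual GraphQL-over-HTTP convention of a POST request with a JSON body containing a `query` key (not confirmed by this README). `build_graphql_request` and `run_query` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:5000/graphql"  # endpoint listed in this README

TP53_QUERY = """
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query, url=GRAPHQL_URL):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=GRAPHQL_URL, timeout=10):
    """Send the query and return the decoded JSON reply (needs a running server)."""
    request = build_graphql_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request works offline; run_query(TP53_QUERY) needs the app running.
request = build_graphql_request(TP53_QUERY)
```

With the application running, `run_query(TP53_QUERY)` would return whatever JSON the server sends back; its exact shape is not documented here.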
REST API
Database Summary:
curl http://localhost:5000/api/neo4j/summary
Submit BOINC Task:
curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
Python API
FASTQ Processing:
from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()
# Summary statistics for the reads in the file
stats = processor.calculate_statistics("input.fastq")
# Apply quality filtering to the same reads
filtered = processor.quality_filter("input.fastq")
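To illustrate what read statistics and quality filtering involve, here is a standalone sketch in plain Python. It is not the `FASTQProcessor` implementation, whose internals are not shown in this README. It assumes four-line FASTQ records and Phred+33 quality encoding; the function names are hypothetical, and the default thresholds simply mirror the `quality_threshold` and `min_length` values from the configuration example.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


def summarize_reads(records):
    """Read count, total bases, and mean read length."""
    total_bases = sum(len(seq) for _, seq, _ in records)
    mean_length = total_bases / len(records) if records else 0.0
    return {"reads": len(records), "total_bases": total_bases, "mean_length": mean_length}


def filter_reads(records, quality_threshold=20, min_length=50):
    """Keep reads that are long enough and meet the mean-quality threshold."""
    return [
        (header, seq, qual)
        for header, seq, qual in records
        if len(seq) >= min_length and mean_phred(qual) >= quality_threshold
    ]


# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
records = list(read_fastq(sample))
stats = summarize_reads(records)
kept = filter_reads(records, quality_threshold=20, min_length=4)
```

In this toy input only `read1` survives filtering, because every base in `read2` has a Phred score of 0.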
Variant Calling:
from backend.pipeline import VariantCaller, VariantAnalyzer

# Call variants from an alignment against a reference, then filter the calls
caller = VariantCaller()
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
variants = caller.filter_variants(vcf_file)

# Flag cancer-associated variants and compute tumor mutation burden (TMB)
analyzer = VariantAnalyzer()
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
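Tumor mutation burden is commonly reported as mutations per megabase of sequenced territory. How `calculate_mutation_burden` defines it is not shown in this README, so the sketch below only illustrates that common convention; `tumor_mutation_burden` and its parameters are hypothetical.

```python
def tumor_mutation_burden(num_variants, covered_bases):
    """Mutations per megabase, given a variant count and the covered region size in bases."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return num_variants / (covered_bases / 1_000_000)


# Example: 300 variants over a 30 Mb target region.
example_tmb = tumor_mutation_burden(300, 30_000_000)  # 10.0 mutations/Mb
```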
Neo4j Queries:
from backend.neo4j import DatabaseManager

db = DatabaseManager()
# Find every mutation that affects TP53
query = """
MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
RETURN m.position, m.consequence
"""
results = db.execute_query(query)
db.close()
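The same query can be run without the project's `DatabaseManager` by using the official `neo4j` Python driver directly. This is a sketch under assumptions: it needs `pip install neo4j`, it reuses the URI and credentials listed elsewhere in this README, and it passes the gene symbol as a query parameter instead of embedding it in the Cypher string. `fetch_mutations` is a hypothetical helper.

```python
# Parameterized version of the TP53 query; $symbol is filled in by the driver.
MUTATIONS_FOR_GENE = (
    "MATCH (g:Gene {symbol: $symbol})<-[:AFFECTS]-(m:Mutation) "
    "RETURN m.position AS position, m.consequence AS consequence"
)


def fetch_mutations(symbol, uri="bolt://localhost:7687",
                    user="neo4j", password="cancer123"):
    """Return a list of {position, consequence} dicts for one gene symbol."""
    # Imported lazily so this module loads even where the driver is not installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(MUTATIONS_FOR_GENE, symbol=symbol)
            return [record.data() for record in result]
    finally:
        driver.close()
```

Calling `fetch_mutations("TP53")` requires the Neo4j container to be running. Parameterizing the symbol keeps the query text constant and avoids quoting problems when the value comes from user input.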
Data Model
Neo4j Graph Schema
Nodes:
- Gene: Genes with mutations (TP53, BRCA1, KRAS, etc.)
- Mutation: Genetic variants with position and consequence
- Patient: Individual cases with demographics
- CancerType: Cancer classifications (BRCA, LUAD, COAD, GBM)
Relationships:
Gene ← AFFECTS ← Mutation
Patient → HAS_MUTATION → Mutation
Patient → DIAGNOSED_WITH → CancerType
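As a rough illustration of how these relationships combine, the sketch below walks from patients to genes through mutations using plain Python tuples. The identifiers are made up for illustration, and the direction of `AFFECTS` (mutation to gene) is inferred from the Cypher example in this README.

```python
from collections import defaultdict

# Hypothetical edges; every identifier here is invented for illustration.
AFFECTS = [("MUT1", "TP53"), ("MUT2", "KRAS")]                    # (Mutation, Gene)
HAS_MUTATION = [("P1", "MUT1"), ("P2", "MUT1"), ("P2", "MUT2")]   # (Patient, Mutation)


def patients_by_gene(affects, has_mutation):
    """Map each gene to the set of patients carrying a mutation that affects it."""
    gene_of = dict(affects)  # assumes each mutation affects exactly one gene
    result = defaultdict(set)
    for patient, mutation in has_mutation:
        if mutation in gene_of:
            result[gene_of[mutation]].add(patient)
    return dict(result)


carriers = patients_by_gene(AFFECTS, HAS_MUTATION)
```

In a graph database the same two-hop traversal is expressed as a single pattern match, which is the main reason for modeling the data this way.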
Sample Data Included
- 7 Genes: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
- 5 Mutations: Cancer-associated variants
- 5 Patients: Representative TCGA cases
- 4 Cancer Types: BRCA, LUAD, COAD, GBM
Technology Stack
- Backend: FastAPI, Python 3.8+
- Database: Neo4j 5.13 (Graph Database)
- API: GraphQL (Strawberry), REST
- Frontend: HTML5, CSS3, JavaScript, D3.js, Chart.js
- Bioinformatics: Biopython, BLAST+
- Data Source: GDC Portal API (TCGA/TARGET)
- Infrastructure: Docker, Docker Compose
- Distributed Computing: BOINC Framework
Documentation
- README.md - Complete project overview
- QUICKSTART.md - 5-minute setup guide
- USER_GUIDE.md - Detailed usage documentation
- GRAPHQL_EXAMPLES.md - Query examples
- ARCHITECTURE.md - System architecture
- PROJECT_SUMMARY.md - Feature overview
Use Cases
- Cancer Research: Analyze genomics data with distributed computing
- Education: Learn cancer genetics and bioinformatics
- Data Visualization: Explore gene-mutation-patient relationships
- Pipeline Development: Test bioinformatics workflows
- Graph Analytics: Query complex biological networks
Supported Cancer Projects
- TCGA-BRCA: Breast Cancer (1,098 cases)
- TCGA-LUAD: Lung Adenocarcinoma (585 cases)
- TCGA-COAD: Colon Adenocarcinoma (461 cases)
- TCGA-GBM: Glioblastoma (617 cases)
- TARGET-AML: Acute Myeloid Leukemia (238 cases)
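Cases for any of these projects can be looked up through the public GDC REST API. The sketch below builds a request URL for the `cases` endpoint, filtered by project. It talks to the GDC API directly and does not reflect how the platform's own GDC integration is implemented; `build_cases_url` and `fetch_cases` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_cases_url(project_id, fields=("submitter_id",), size=10):
    """Build a GDC cases query URL restricted to one project."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return GDC_CASES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_cases(project_id, timeout=30):
    """Fetch and decode the JSON reply (requires network access)."""
    with urllib.request.urlopen(build_cases_url(project_id), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


brca_url = build_cases_url("TCGA-BRCA")
```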
Bioinformatics Pipeline
FASTQ Processing
- Quality control and filtering
- Adapter trimming
- Statistics calculation
- QC report generation
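Adapter trimming can be sketched in a few lines. This simplified version removes a 3' adapter only when the full adapter sequence appears in the read; real trimmers also handle partial adapters at the read end and mismatches. `trim_adapter` is a hypothetical function, not the pipeline's own, and the adapter used in the example is the common Illumina prefix `AGATCGGAAGAGC`.

```python
def trim_adapter(sequence, quality, adapter):
    """Cut the read (and its quality string) at the first exact adapter match."""
    index = sequence.find(adapter)
    if index == -1:
        return sequence, quality  # adapter not present, read unchanged
    return sequence[:index], quality[:index]


# An 8-base insert followed by the adapter sequence.
read = "ACGTACGT" + "AGATCGGAAGAGC"
trimmed_seq, trimmed_qual = trim_adapter(read, "I" * len(read), "AGATCGGAAGAGC")
```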
BLAST Alignment
- BLASTN for nucleotide sequences
- BLASTP for protein sequences
- Hit filtering by identity/e-value
- Homology detection
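Hit filtering by identity and e-value can be illustrated on BLAST+ tabular output (`-outfmt 6`), where column 3 is percent identity and column 11 is the e-value. This is a sketch, not the pipeline's implementation: `filter_blast_hits` is hypothetical, the 90% identity cutoff is an arbitrary example, and the e-value cutoff mirrors the `evalue` setting in the configuration example.

```python
def filter_blast_hits(lines, min_identity=90.0, max_evalue=0.001):
    """Keep tabular BLAST hits that meet identity and e-value cutoffs."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        cols = line.split("\t")
        identity, evalue = float(cols[2]), float(cols[10])
        if identity >= min_identity and evalue <= max_evalue:
            hits.append({
                "query": cols[0],
                "subject": cols[1],
                "identity": identity,
                "evalue": evalue,
            })
    return hits


# Two made-up hits in the 12-column outfmt 6 layout.
example_lines = [
    "q1\tTP53_ref\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t350",
    "q2\tKRAS_ref\t75.0\t80\t20\t0\t1\t80\t1\t80\t0.5\t40",
]
good_hits = filter_blast_hits(example_lines)
```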
Variant Calling
- VCF generation from alignments
- Quality filtering
- Cancer variant identification
- Tumor mutation burden (TMB) calculation
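Quality filtering of variant calls can be sketched against the standard VCF column layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function below is a hypothetical illustration, not the project's `filter_variants`; the QUAL cutoff of 30 and the example records are invented.

```python
def filter_vcf_lines(lines, min_qual=30.0):
    """Keep VCF records whose QUAL is present and at least min_qual."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # header and metadata lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        qual = fields[5]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality
        kept.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "ref": fields[3],
            "alt": fields[4],
            "qual": float(qual),
        })
    return kept


# Made-up records; positions are placeholders, not real variant coordinates.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr17\t1000\t.\tC\tT\t55.0\t.\t.",
    "chr12\t2000\t.\tC\tA\t12.0\t.\t.",
]
passing = filter_vcf_lines(example_vcf)
```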
Access Points
- Application: http://localhost:5000
- API Docs: http://localhost:5000/docs (Swagger UI)
- GraphQL: http://localhost:5000/graphql
- Neo4j Browser: http://localhost:7474 (neo4j/cancer123)
Configuration
Edit config.yml to customize:
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"
gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]
pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
Contributing
Contributions are welcome! This project is open source under the MIT License.
Development Setup
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
License
MIT License - See LICENSE file
Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal
Acknowledgments
Inspiration
- Cancer@Home v1 - HeroX DCx Challenge
- Andrew Kamal's Neo4j Cancer Visualization
Data Sources
- Genomic Data Commons (GDC) Portal
- The Cancer Genome Atlas (TCGA) Program
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
Technologies
- Neo4j Graph Database
- BOINC Distributed Computing Project
- Biopython Community
- FastAPI Framework
Authors
- OpenPeer AI - Core development and architecture
- Riemann Computing Inc. - Distributed computing integration
- Bleunomics - Bioinformatics pipeline and genomics expertise
- Andrew Magdy Kamal - Graph database design and visualization
Support
- Documentation: See project documentation files
- Issues: Check logs in logs/cancer_at_home.log
- Configuration: Review config.yml
- Health Check: http://localhost:5000/api/health
Roadmap
Planned Features
- Machine learning for mutation prediction
- Multi-omics data integration (RNA-seq, proteomics)
- Survival analysis and clinical outcomes
- Advanced graph algorithms (PageRank, community detection)
- Cloud deployment support (AWS, Azure, GCP)
- Mobile-responsive design
- User authentication and authorization
Statistics
- Lines of Code: 5,000+
- Modules: 9 Python modules
- API Endpoints: 15+ REST + GraphQL
- Documentation: 2,500+ lines
- Setup Time: < 5 minutes
- Sample Data: 7 genes, 5 mutations, 5 patients
Citation
If you use Cancer@Home v2 in your research, please cite:
@software{cancer_at_home_v2,
  title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year = {2025},
  url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
Tags
cancer-genomics bioinformatics neo4j graph-database distributed-computing boinc fastq blast variant-calling gdc-portal tcga target graphql fastapi python docker healthcare precision-medicine computational-biology
Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal
For cancer research, by researchers, accessible to all.