---
license: mit
tags:
  - cancer-genomics
  - bioinformatics
  - graph-database
  - neo4j
  - distributed-computing
  - boinc
  - healthcare
  - genomics
  - fastq
  - blast
  - variant-calling
  - gdc-portal
  - tcga
library_name: cancer-at-home-v2
pipeline_tag: other
---

Cancer@Home v2


🧬 Overview

Cancer@Home v2 is a distributed computing platform for cancer genomics research. It combines BOINC distributed computing, analysis of cancer data from the NCI Genomic Data Commons (GDC), sequence processing (FASTQ quality control and BLAST alignment), and Neo4j graph visualization in a single, easy-to-use system.

Inspired by Cancer@Home v1 and Andrew Kamal's Neo4j Dashboard, this platform makes cancer genomics research accessible, distributed, and visual.

🎯 Key Features

  • 🌐 Interactive Web Dashboard - Modern UI with real-time visualizations
  • 🔍 Neo4j Graph Database - Model complex gene-mutation-patient relationships
  • ⚡ BOINC Integration - Distributed computing for intensive analyses
  • 📊 GraphQL API - Flexible data querying
  • 🧪 Bioinformatics Pipeline - FASTQ processing, BLAST alignment, variant calling
  • 📚 GDC Portal Integration - Access TCGA/TARGET cancer datasets
  • 🚀 Quick Setup - Running in under 5 minutes

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚     Web Dashboard (D3.js + Chart.js)        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚     FastAPI Backend (REST + GraphQL)        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚Neo4j โ”‚BOINC โ”‚ GDC  โ”‚FASTQ โ”‚ BLAST/Variant  โ”‚
โ”‚Graph โ”‚Clientโ”‚ API  โ”‚  QC  โ”‚    Calling     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

📦 Installation

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • 8GB RAM (16GB recommended)

Quick Start

Windows:

```powershell
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
```

Linux/Mac:

```bash
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
```

Then open: http://localhost:5000

🚀 Usage

Web Dashboard

Access the interactive dashboard at http://localhost:5000 with:

  • Dashboard Tab: Overview statistics and mutation charts
  • Neo4j Visualization: Interactive graph of cancer relationships
  • BOINC Tasks: Submit and monitor distributed computing tasks
  • GDC Data: Browse and download cancer datasets
  • Pipeline Tools: Run FASTQ QC, BLAST, and variant calling

GraphQL API

Query cancer data at http://localhost:5000/graphql

Example: Get mutations in TP53 gene

```graphql
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
```

Example: Get summary statistics for a cancer type

```graphql
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
```
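To call the GraphQL endpoint from Python without extra dependencies, the TP53 query can be sent as a standard GraphQL-over-HTTP POST request with the gene symbol passed as a variable. This is a minimal sketch that uses only the standard library. `build_graphql_request`, `run_query`, and the `String!` type assumed for the `gene` argument are illustrative and are not part of the documented API.

```python
import json
import urllib.request
from typing import Any, Dict

# Endpoint as documented above; the helper names below are hypothetical.
GRAPHQL_URL = "http://localhost:5000/graphql"

# The TP53 example rewritten to take the gene symbol as a variable.
# Assumption: the schema's `gene` argument accepts a String.
MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query: str, variables: Dict[str, Any],
                          url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Wrap a query and its variables in a GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str, variables: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """Send the query and return its "data" object, raising if the server reports errors."""
    request = build_graphql_request(query, variables)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

With the server running, `run_query(MUTATIONS_BY_GENE, {"gene": "TP53"})["mutations"]` should return the same rows as the query above, assuming the live schema matches it.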

REST API

Database Summary:

```bash
curl http://localhost:5000/api/neo4j/summary
```

Submit BOINC Task:

```bash
curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
```
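The same two REST calls can be made from Python with the standard library. This sketch relies only on the URLs and JSON fields shown in the curl examples. The helper names are hypothetical. The response format is not documented here, so the decoded JSON is returned unchanged.

```python
import json
import urllib.request
from typing import Any

# Base URL from the curl examples above; the helper names below are hypothetical.
API_BASE = "http://localhost:5000/api"


def get_json(path: str, timeout: float = 10.0) -> Any:
    """GET a JSON endpoint, e.g. get_json("/neo4j/summary")."""
    with urllib.request.urlopen(API_BASE + path, timeout=timeout) as response:
        return json.load(response)


def build_submit_request(workunit_type: str, input_file: str) -> urllib.request.Request:
    """Build the same POST /boinc/submit call as the curl example."""
    body = json.dumps({"workunit_type": workunit_type, "input_file": input_file}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/boinc/submit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(workunit_type: str, input_file: str, timeout: float = 10.0) -> Any:
    """Send the submission and return the decoded JSON response as-is."""
    request = build_submit_request(workunit_type, input_file)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```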

Python API

FASTQ Processing:

```python
from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()
stats = processor.calculate_statistics("input.fastq")
filtered = processor.quality_filter("input.fastq")
```
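The internals of `FASTQProcessor` are not described here. The following standalone sketch shows what FASTQ statistics and quality filtering typically involve: parsing four-line records, decoding Phred+33 quality characters, and keeping reads that meet a length and mean-quality cutoff. The defaults (quality 20, minimum length 50) come from the `pipeline.fastq` section of config.yml. Filtering on *mean* quality is an assumption, and the real processor may use a different rule. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (an open file or a list of lines)."""
    stripped = (line.rstrip("\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        try:
            sequence, plus, quality = next(stripped), next(stripped), next(stripped)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header, sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a Phred+33 quality string (the Sanger / Illumina 1.8+ encoding)."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(record: FastqRecord, quality_threshold: int = 20, min_length: int = 50) -> bool:
    """Hypothetical rule: keep reads that are long enough and whose mean quality reaches the threshold."""
    scores = phred_scores(record.quality)
    return len(record.sequence) >= min_length and bool(scores) and mean(scores) >= quality_threshold


def summarize(records: List[FastqRecord]) -> Dict[str, float]:
    """Read count, mean length and GC fraction: the usual first lines of a QC report."""
    lengths = [len(r.sequence) for r in records]
    bases = "".join(r.sequence for r in records).upper()
    gc = bases.count("G") + bases.count("C")
    return {
        "reads": len(records),
        "mean_length": mean(lengths) if lengths else 0.0,
        "gc_fraction": gc / len(bases) if bases else 0.0,
    }
```

For a file on disk, `with open("input.fastq") as fh: kept = [r for r in read_fastq(fh) if passes_filter(r)]` applies the same filter.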

Variant Calling:

```python
from backend.pipeline import VariantCaller, VariantAnalyzer

caller = VariantCaller()
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
variants = caller.filter_variants(vcf_file)

analyzer = VariantAnalyzer()
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
```
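The following sketch illustrates what a variant filtering step can look like. It reads the fixed columns of a plain-text VCF and keeps records whose FILTER column is `PASS` (or unset) and whose QUAL score reaches a cutoff. It is not `VariantCaller.filter_variants`. The 30.0 QUAL threshold and the function names are assumptions chosen for the example.

```python
from typing import Iterable, Iterator, List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    filter_status: str


def parse_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Parse the fixed VCF columns, skipping "##" meta lines and the "#CHROM" header."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual, filter_status = line.rstrip("\n").split("\t")[:7]
        # A missing QUAL (".") is treated as 0 so such records fail any positive threshold.
        yield VcfRecord(chrom, int(pos), ref, alt, 0.0 if qual == "." else float(qual), filter_status)


def filter_vcf_records(records: Iterable[VcfRecord], min_qual: float = 30.0) -> List[VcfRecord]:
    """Hypothetical filter: keep PASS (or unset ".") records that reach min_qual."""
    return [r for r in records if r.filter_status in ("PASS", ".") and r.qual >= min_qual]
```

Compressed `.vcf.gz` output can be read the same way by opening it with `gzip.open(path, "rt")`.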

Neo4j Queries:

```python
from backend.neo4j import DatabaseManager

db = DatabaseManager()
query = """
MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
RETURN m.position, m.consequence
"""
try:
    results = db.execute_query(query)
finally:
    db.close()  # release the connection even if the query raises
```

📊 Data Model

Neo4j Graph Schema

Nodes:

  • Gene: Genes with mutations (TP53, BRCA1, KRAS, etc.)
  • Mutation: Genetic variants with position and consequence
  • Patient: Individual cases with demographics
  • CancerType: Cancer classifications (BRCA, LUAD, COAD, GBM)

Relationships:

  • Gene ← AFFECTS ← Mutation
  • Patient → HAS_MUTATION → Mutation
  • Patient → DIAGNOSED_WITH → CancerType
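One way to load records into this schema is an idempotent Cypher `MERGE` statement, which creates each node and relationship only if it does not already exist. In the sketch below, the labels and relationship types follow the schema above. The property names `symbol`, `position`, `consequence`, and `mutation_id` match names used elsewhere in this card. The `case_id` property, the `cancer_type_id` node property, and the `link_patient_mutation` helper are hypothetical. The returned pair could be passed to the official Neo4j Python driver, for example `session.run(query, params)`.

```python
from typing import Any, Dict, Tuple

# Hypothetical loader query: labels and relationship types follow the schema above;
# case_id and the cancer_type_id node property are assumed names.
LINK_PATIENT_MUTATION = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {mutation_id: $mutation_id})
  SET m.position = $position, m.consequence = $consequence
MERGE (m)-[:AFFECTS]->(g)
MERGE (p:Patient {case_id: $case_id})
MERGE (p)-[:HAS_MUTATION]->(m)
MERGE (c:CancerType {cancer_type_id: $cancer_type_id})
MERGE (p)-[:DIAGNOSED_WITH]->(c)
"""

REQUIRED_FIELDS = ("gene", "mutation_id", "position", "consequence", "case_id", "cancer_type_id")


def link_patient_mutation(record: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (query, parameters) for one patient-mutation observation."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    # Pass values as query parameters rather than formatting them into the Cypher text.
    return LINK_PATIENT_MUTATION, {name: record[name] for name in REQUIRED_FIELDS}
```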

Sample Data Included

  • 7 Genes: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
  • 5 Mutations: Cancer-associated variants
  • 5 Patients: Representative TCGA cases
  • 4 Cancer Types: BRCA, LUAD, COAD, GBM

🔧 Technology Stack

  • Backend: FastAPI, Python 3.8+
  • Database: Neo4j 5.13 (Graph Database)
  • API: GraphQL (Strawberry), REST
  • Frontend: HTML5, CSS3, JavaScript, D3.js, Chart.js
  • Bioinformatics: Biopython, BLAST+
  • Data Source: GDC Portal API (TCGA/TARGET)
  • Infrastructure: Docker, Docker Compose
  • Distributed Computing: BOINC Framework

📚 Documentation

🎓 Use Cases

  1. Cancer Research: Analyze genomics data with distributed computing
  2. Education: Learn cancer genetics and bioinformatics
  3. Data Visualization: Explore gene-mutation-patient relationships
  4. Pipeline Development: Test bioinformatics workflows
  5. Graph Analytics: Query complex biological networks

🔬 Supported Cancer Projects

  • TCGA-BRCA: Breast Cancer (1,098 cases)
  • TCGA-LUAD: Lung Adenocarcinoma (585 cases)
  • TCGA-COAD: Colon Adenocarcinoma (461 cases)
  • TCGA-GBM: Glioblastoma (617 cases)
  • TARGET-AML: Acute Myeloid Leukemia (238 cases)
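Case counts for these projects can be checked against the public GDC API. The sketch below asks the `/cases` endpoint for zero hits filtered on `project.project_id` and reads the total from the pagination block. It is a minimal illustration that uses only the standard library, and the helper names are hypothetical. GDC data releases change over time, so the numbers returned may differ from the list above.

```python
import json
import urllib.parse
import urllib.request

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def case_count_url(project_id: str) -> str:
    """Build a /cases query that returns no hits, only the pagination total (hypothetical helper)."""
    filters = {"op": "in", "content": {"field": "project.project_id", "value": [project_id]}}
    query = urllib.parse.urlencode({"filters": json.dumps(filters), "size": 0, "format": "json"})
    return f"{GDC_CASES_ENDPOINT}?{query}"


def fetch_case_count(project_id: str, timeout: float = 30.0) -> int:
    """Query the live GDC API and return the number of cases in the project."""
    with urllib.request.urlopen(case_count_url(project_id), timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["pagination"]["total"]
```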

📈 Bioinformatics Pipeline

FASTQ Processing

  • Quality control and filtering
  • Adapter trimming
  • Statistics calculation
  • QC report generation
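Adapter trimming usually means cutting a read where sequencing ran into the adapter. The sketch below handles the two common cases with exact matching: the full adapter inside the read, or a partial adapter prefix at the 3' end. The default sequence is the `AGATCGGAAGAGC` prefix of the Illumina universal adapter, used here only as an example. The function name and the minimum overlap of 3 are assumptions. Dedicated trimmers such as cutadapt also tolerate sequencing errors in the adapter.

```python
from typing import Tuple


def trim_adapter(sequence: str, quality: str, adapter: str = "AGATCGGAAGAGC",
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Hypothetical exact-match 3' adapter trimmer; returns the trimmed (sequence, quality)."""
    hit = sequence.find(adapter)
    if hit != -1:
        # The full adapter occurs in the read: drop it and everything after it.
        return sequence[:hit], quality[:hit]
    # The read may end partway into the adapter: try the longest adapter prefix first.
    for overlap in range(min(len(adapter) - 1, len(sequence)), min_overlap - 1, -1):
        if sequence.endswith(adapter[:overlap]):
            cut = len(sequence) - overlap
            return sequence[:cut], quality[:cut]
    return sequence, quality
```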

BLAST Alignment

  • BLASTN for nucleotide sequences
  • BLASTP for protein sequences
  • Hit filtering by identity/e-value
  • Homology detection
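Hit filtering is easiest on BLAST+ tabular output (`-outfmt 6`). Its twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. A run such as `blastn -query reads.fa -db reference -outfmt 6 -evalue 0.001 -num_threads 4` would produce this format, using the e-value and thread settings from config.yml. The sketch below parses those lines, filters by identity and e-value, and keeps the best-scoring hit per query. The 90% identity cutoff, the `reads.fa` and `reference` names, and the function names are assumptions.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class BlastHit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Parse the 12 default columns of BLAST+ -outfmt 6 output."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield BlastHit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))


def filter_hits(hits: Iterable[BlastHit], min_identity: float = 90.0,
                max_evalue: float = 0.001) -> List[BlastHit]:
    """Keep hits at or above min_identity (%) and at or below max_evalue (hypothetical defaults)."""
    return [h for h in hits if h.pident >= min_identity and h.evalue <= max_evalue]


def best_hit_per_query(hits: Iterable[BlastHit]) -> Dict[str, BlastHit]:
    """Highest-bitscore hit for each query sequence."""
    best: Dict[str, BlastHit] = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best
```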

Variant Calling

  • VCF generation from alignments
  • Quality filtering
  • Cancer variant identification
  • Tumor mutation burden (TMB) calculation
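TMB is commonly reported as the number of qualifying somatic mutations per megabase of sequence that was actually assessed. A minimal sketch of that arithmetic follows. Pipelines differ in which consequence types they count and in how they measure the covered territory, so both are parameters here. The function name and the example consequence set are hypothetical.

```python
from typing import Iterable, Optional, Set

# Hypothetical example set of Sequence Ontology consequence terms; pipelines differ.
EXAMPLE_COUNTED = {"missense_variant", "stop_gained", "frameshift_variant",
                   "splice_acceptor_variant", "splice_donor_variant"}


def tumor_mutation_burden(consequences: Iterable[str], covered_bases: int,
                          counted: Optional[Set[str]] = None) -> float:
    """Mutations per megabase: qualifying variants / (covered bases / 1,000,000)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    # counted=None counts every variant; otherwise only the listed consequence types count.
    n_mutations = sum(1 for c in consequences if counted is None or c in counted)
    return n_mutations / (covered_bases / 1_000_000)
```

For example, 30 qualifying mutations over 30,000,000 covered bases gives 1.0 mutations/Mb.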

🌐 Access Points

🛠️ Configuration

Edit config.yml to customize:

```yaml
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"

gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]

pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
```
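Loading these settings into typed objects catches typos and out-of-range values at startup. The sketch below assumes the YAML has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`). The dataclass and function names are hypothetical, and the defaults mirror the excerpt above. The Neo4j password is deliberately left out of the typed object.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FastqSettings:
    quality_threshold: int = 20
    min_length: int = 50


@dataclass
class BlastSettings:
    evalue: float = 0.001
    num_threads: int = 4


@dataclass
class Settings:
    # Hypothetical typed view of config.yml; defaults mirror the excerpt above.
    neo4j_uri: str = "bolt://localhost:7687"
    gdc_download_dir: str = "./data/gdc"
    gdc_projects: List[str] = field(default_factory=lambda: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"])
    fastq: FastqSettings = field(default_factory=FastqSettings)
    blast: BlastSettings = field(default_factory=BlastSettings)


def settings_from_mapping(cfg: Dict[str, Any]) -> Settings:
    """Overlay a parsed config.yml mapping on the defaults and sanity-check the values."""
    defaults = Settings()
    neo4j = cfg.get("neo4j") or {}
    gdc = cfg.get("gdc") or {}
    pipeline = cfg.get("pipeline") or {}
    settings = Settings(
        neo4j_uri=neo4j.get("uri", defaults.neo4j_uri),
        gdc_download_dir=gdc.get("download_dir", defaults.gdc_download_dir),
        gdc_projects=list(gdc.get("projects", defaults.gdc_projects)),
        # Unknown keys (e.g. a typo such as "min_lenght") raise TypeError here.
        fastq=FastqSettings(**(pipeline.get("fastq") or {})),
        blast=BlastSettings(**(pipeline.get("blast") or {})),
    )
    if settings.fastq.quality_threshold < 0 or settings.fastq.min_length < 1:
        raise ValueError("pipeline.fastq thresholds are out of range")
    if settings.blast.evalue <= 0 or settings.blast.num_threads < 1:
        raise ValueError("pipeline.blast settings are out of range")
    return settings
```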

🤝 Contributing

Contributions are welcome! This project is open source under the MIT License.

Development Setup

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
```

📄 License

MIT License. See the LICENSE file.

Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal

🙏 Acknowledgments

Inspiration

Data Sources

Technologies

  • Neo4j Graph Database
  • BOINC Distributed Computing Project
  • Biopython Community
  • FastAPI Framework

👥 Authors

  • OpenPeer AI - Core development and architecture
  • Riemann Computing Inc. - Distributed computing integration
  • Bleunomics - Bioinformatics pipeline and genomics expertise
  • Andrew Magdy Kamal - Graph database design and visualization

📞 Support

  • Documentation: See project documentation files
  • Issues: check the log file logs/cancer_at_home.log
  • Configuration: Review config.yml
  • Health Check: http://localhost:5000/api/health

🔮 Roadmap

Planned Features

  • Machine learning for mutation prediction
  • Multi-omics data integration (RNA-seq, proteomics)
  • Survival analysis and clinical outcomes
  • Advanced graph algorithms (PageRank, community detection)
  • Cloud deployment support (AWS, Azure, GCP)
  • Mobile-responsive design
  • User authentication and authorization

📊 Statistics

  • Lines of Code: 5,000+
  • Modules: 9 Python modules
  • API Endpoints: 15+ REST + GraphQL
  • Documentation: 2,500+ lines
  • Setup Time: < 5 minutes
  • Sample Data: 7 genes, 5 mutations, 5 patients

🎯 Citation

If you use Cancer@Home v2 in your research, please cite:

```bibtex
@software{cancer_at_home_v2,
  title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year = {2025},
  url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
```

🏷️ Tags

cancer-genomics bioinformatics neo4j graph-database distributed-computing boinc fastq blast variant-calling gdc-portal tcga target graphql fastapi python docker healthcare precision-medicine computational-biology


Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal

For cancer research, by researchers, accessible to all.