Mentors4EDU committed on
Commit 087e68e · verified · 1 Parent(s): 949080e

Update MODEL_CARD.md

Files changed (1)
  1. MODEL_CARD.md +373 -373
MODEL_CARD.md CHANGED
@@ -1,373 +1,373 @@
1
- ---
2
- license: mit
3
- tags:
4
- - cancer-genomics
5
- - bioinformatics
6
- - graph-database
7
- - neo4j
8
- - distributed-computing
9
- - boinc
10
- - healthcare
11
- - genomics
12
- - fastq
13
- - blast
14
- - variant-calling
15
- - gdc-portal
16
- - tcga
17
- library_name: fastapi
18
- pipeline_tag: other
19
- ---
20
-
21
- # Cancer@Home v2
22
-
23
- <div align="center">
24
- <img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version">
25
- <img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
26
- <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
27
- <img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j">
28
- </div>
29
-
30
- ## 🧬 Overview
31
-
32
- Cancer@Home v2 is a comprehensive distributed computing platform for cancer genomics research that combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** into a unified, easy-to-use system.
33
-
34
- Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform makes cancer genomics research accessible, distributed, and visual.
35
-
36
- ## 🎯 Key Features
37
-
38
- - 🌐 **Interactive Web Dashboard** - Modern UI with real-time visualizations
39
- - 🔍 **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
40
- - ⚡ **BOINC Integration** - Distributed computing for intensive analyses
41
- - 📊 **GraphQL API** - Flexible data querying
42
- - 🧪 **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
43
- - 📚 **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
44
- - 🚀 **Quick Setup** - Running in under 5 minutes
45
-
46
- ## 🏗️ Architecture
47
-
48
- ```
49
- ┌────────────────────────────────────────────┐
50
- │      Web Dashboard (D3.js + Chart.js)      │
51
- ├────────────────────────────────────────────┤
52
- │      FastAPI Backend (REST + GraphQL)      │
53
- ├──────┬──────┬──────┬──────┬────────────────┤
54
- │Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
55
- │Graph │Client│ API  │  QC  │ Calling        │
56
- └──────┴──────┴──────┴──────┴────────────────┘
57
- ```
58
-
59
- ## 📦 Installation
60
-
61
- ### Prerequisites
62
- - Python 3.8+
63
- - Docker Desktop
64
- - 8GB RAM (16GB recommended)
65
-
66
- ### Quick Start
67
-
68
- **Windows:**
69
- ```powershell
70
- git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
71
- cd CancerAtHomeV2
72
- .\setup.ps1
73
- python run.py
74
- ```
75
-
76
- **Linux/Mac:**
77
- ```bash
78
- git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
79
- cd CancerAtHomeV2
80
- chmod +x setup.sh
81
- ./setup.sh
82
- python run.py
83
- ```
84
-
85
- Then open: **http://localhost:5000**
86
-
87
- ## 🚀 Usage
88
-
89
- ### Web Dashboard
90
- Access the interactive dashboard at http://localhost:5000 with:
91
- - **Dashboard Tab**: Overview statistics and mutation charts
92
- - **Neo4j Visualization**: Interactive graph of cancer relationships
93
- - **BOINC Tasks**: Submit and monitor distributed computing tasks
94
- - **GDC Data**: Browse and download cancer datasets
95
- - **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling
96
-
97
- ### GraphQL API
98
-
99
- Query cancer data at http://localhost:5000/graphql
100
-
101
- **Example: Get mutations in TP53 gene**
102
- ```graphql
103
- query {
104
- mutations(gene: "TP53") {
105
- mutation_id
106
- chromosome
107
- position
108
- consequence
109
- }
110
- }
111
- ```
112
-
113
- **Example: Get patient statistics**
114
- ```graphql
115
- query {
116
- cancerStatistics(cancer_type_id: "BRCA") {
117
- total_patients
118
- total_mutations
119
- avg_mutations_per_patient
120
- }
121
- }
122
- ```
123
-
124
- ### REST API
125
-
126
- **Database Summary:**
127
- ```bash
128
- curl http://localhost:5000/api/neo4j/summary
129
- ```
130
-
131
- **Submit BOINC Task:**
132
- ```bash
133
- curl -X POST http://localhost:5000/api/boinc/submit \
134
- -H "Content-Type: application/json" \
135
- -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
136
- ```
137
-
138
- ### Python API
139
-
140
- **FASTQ Processing:**
141
- ```python
142
- from backend.pipeline import FASTQProcessor
143
-
144
- processor = FASTQProcessor()
145
- stats = processor.calculate_statistics("input.fastq")
146
- filtered = processor.quality_filter("input.fastq")
147
- ```
148
-
149
- **Variant Calling:**
150
- ```python
151
- from backend.pipeline import VariantCaller, VariantAnalyzer
152
-
153
- caller = VariantCaller()
154
- vcf_file = caller.call_variants("alignment.bam", "reference.fa")
155
- variants = caller.filter_variants(vcf_file)
156
-
157
- analyzer = VariantAnalyzer()
158
- cancer_variants = analyzer.identify_cancer_variants(variants)
159
- tmb = analyzer.calculate_mutation_burden(variants)
160
- ```
161
-
162
- **Neo4j Queries:**
163
- ```python
164
- from backend.neo4j import DatabaseManager
165
-
166
- db = DatabaseManager()
167
- query = """
168
- MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
169
- RETURN m.position, m.consequence
170
- """
171
- results = db.execute_query(query)
172
- db.close()
173
- ```
174
-
175
- ## 📊 Data Model
176
-
177
- ### Neo4j Graph Schema
178
-
179
- **Nodes:**
180
- - **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
181
- - **Mutation**: Genetic variants with position and consequence
182
- - **Patient**: Individual cases with demographics
183
- - **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)
184
-
185
- **Relationships:**
186
- - `Gene ← AFFECTS ← Mutation`
187
- - `Patient → HAS_MUTATION → Mutation`
188
- - `Patient → DIAGNOSED_WITH → CancerType`
189
-
190
- ### Sample Data Included
191
-
192
- - **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
193
- - **5 Mutations**: Cancer-associated variants
194
- - **5 Patients**: Representative TCGA cases
195
- - **4 Cancer Types**: BRCA, LUAD, COAD, GBM
196
-
197
- ## 🔧 Technology Stack
198
-
199
- - **Backend**: FastAPI, Python 3.8+
200
- - **Database**: Neo4j 5.13 (Graph Database)
201
- - **API**: GraphQL (Strawberry), REST
202
- - **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
203
- - **Bioinformatics**: Biopython, BLAST+
204
- - **Data Source**: GDC Portal API (TCGA/TARGET)
205
- - **Infrastructure**: Docker, Docker Compose
206
- - **Distributed Computing**: BOINC Framework
207
-
208
- ## 📚 Documentation
209
-
210
- - [README.md](README.md) - Complete project overview
211
- - [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
212
- - [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
213
- - [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
214
- - [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
215
- - [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview
216
-
217
- ## 🎓 Use Cases
218
-
219
- 1. **Cancer Research**: Analyze genomics data with distributed computing
220
- 2. **Education**: Learn cancer genetics and bioinformatics
221
- 3. **Data Visualization**: Explore gene-mutation-patient relationships
222
- 4. **Pipeline Development**: Test bioinformatics workflows
223
- 5. **Graph Analytics**: Query complex biological networks
224
-
225
- ## 🔬 Supported Cancer Projects
226
-
227
- - **TCGA-BRCA**: Breast Cancer (1,098 cases)
228
- - **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
229
- - **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
230
- - **TCGA-GBM**: Glioblastoma (617 cases)
231
- - **TARGET-AML**: Acute Myeloid Leukemia (238 cases)
232
-
233
- ## 📈 Bioinformatics Pipeline
234
-
235
- ### FASTQ Processing
236
- - Quality control and filtering
237
- - Adapter trimming
238
- - Statistics calculation
239
- - QC report generation
240
-
241
- ### BLAST Alignment
242
- - BLASTN for nucleotide sequences
243
- - BLASTP for protein sequences
244
- - Hit filtering by identity/e-value
245
- - Homology detection
246
-
247
- ### Variant Calling
248
- - VCF generation from alignments
249
- - Quality filtering
250
- - Cancer variant identification
251
- - Tumor mutation burden (TMB) calculation
252
-
253
- ## 🌐 Access Points
254
-
255
- - **Application**: http://localhost:5000
256
- - **API Docs**: http://localhost:5000/docs (Swagger UI)
257
- - **GraphQL**: http://localhost:5000/graphql
258
- - **Neo4j Browser**: http://localhost:7474 (neo4j/cancer123)
259
-
260
- ## 🛠️ Configuration
261
-
262
- Edit `config.yml` to customize:
263
-
264
- ```yaml
265
- neo4j:
266
-   uri: "bolt://localhost:7687"
267
-   password: "cancer123"
268
-
269
- gdc:
270
-   download_dir: "./data/gdc"
271
-   projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]
272
-
273
- pipeline:
274
-   fastq:
275
-     quality_threshold: 20
276
-     min_length: 50
277
-   blast:
278
-     evalue: 0.001
279
-     num_threads: 4
280
- ```
281
-
282
- ## 🤝 Contributing
283
-
284
- Contributions are welcome! This project is open source under the MIT License.
285
-
286
- ### Development Setup
287
- ```bash
288
- python -m venv venv
289
- source venv/bin/activate # or venv\Scripts\activate on Windows
290
- pip install -r requirements.txt
291
- pytest test_cancer_at_home.py
292
- ```
293
-
294
- ## 📄 License
295
-
296
- MIT License - See [LICENSE](LICENSE) file
297
-
298
- Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal
299
-
300
- ## 🙏 Acknowledgments
301
-
302
- ### Inspiration
303
- - [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
304
- - [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)
305
-
306
- ### Data Sources
307
- - [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
308
- - The Cancer Genome Atlas (TCGA) Program
309
- - Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
310
-
311
- ### Technologies
312
- - Neo4j Graph Database
313
- - BOINC Distributed Computing Project
314
- - Biopython Community
315
- - FastAPI Framework
316
-
317
- ## 👥 Authors
318
-
319
- - **OpenPeer AI** - Core development and architecture
320
- - **Riemann Computing Inc.** - Distributed computing integration
321
- - **Bleunomics** - Bioinformatics pipeline and genomics expertise
322
- - **Andrew Magdy Kamal** - Graph database design and visualization
323
-
324
- ## 📞 Support
325
-
326
- - **Documentation**: See project documentation files
327
- - **Issues**: Check logs in `logs/cancer_at_home.log`
328
- - **Configuration**: Review `config.yml`
329
- - **Health Check**: http://localhost:5000/api/health
330
-
331
- ## 🔮 Roadmap
332
-
333
- ### Planned Features
334
- - Machine learning for mutation prediction
335
- - Multi-omics data integration (RNA-seq, proteomics)
336
- - Survival analysis and clinical outcomes
337
- - Advanced graph algorithms (PageRank, community detection)
338
- - Cloud deployment support (AWS, Azure, GCP)
339
- - Mobile-responsive design
340
- - User authentication and authorization
341
-
342
- ## 📊 Statistics
343
-
344
- - **Lines of Code**: ~5,000+
345
- - **Modules**: 9 Python modules
346
- - **API Endpoints**: 15+ REST + GraphQL
347
- - **Documentation**: 2,500+ lines
348
- - **Setup Time**: < 5 minutes
349
- - **Sample Data**: 7 genes, 5 mutations, 5 patients
350
-
351
- ## 🎯 Citation
352
-
353
- If you use Cancer@Home v2 in your research, please cite:
354
-
355
- ```bibtex
356
- @software{cancer_at_home_v2,
357
- title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
358
- author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
359
- year = {2025},
360
- url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
361
- license = {MIT}
362
- }
363
- ```
364
-
365
- ## 🏷️ Tags
366
-
367
- `cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`
368
-
369
- ---
370
-
371
- **Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**
372
-
373
- **For cancer research, by researchers, accessible to all.**
 
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - cancer-genomics
5
+ - bioinformatics
6
+ - graph-database
7
+ - neo4j
8
+ - distributed-computing
9
+ - boinc
10
+ - healthcare
11
+ - genomics
12
+ - fastq
13
+ - blast
14
+ - variant-calling
15
+ - gdc-portal
16
+ - tcga
17
+ library_name: cancer-at-home-v2
18
+ pipeline_tag: other
19
+ ---
20
+
21
+ # Cancer@Home v2
22
+
23
+ <div align="center">
24
+ <img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version">
25
+ <img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
26
+ <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
27
+ <img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j">
28
+ </div>
29
+
30
+ ## 🧬 Overview
31
+
32
+ Cancer@Home v2 is a comprehensive distributed computing platform for cancer genomics research that combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** into a unified, easy-to-use system.
33
+
34
+ Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform makes cancer genomics research accessible, distributed, and visual.
35
+
36
+ ## 🎯 Key Features
37
+
38
+ - 🌐 **Interactive Web Dashboard** - Modern UI with real-time visualizations
39
+ - 🔍 **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
40
+ - ⚡ **BOINC Integration** - Distributed computing for intensive analyses
41
+ - 📊 **GraphQL API** - Flexible data querying
42
+ - 🧪 **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
43
+ - 📚 **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
44
+ - 🚀 **Quick Setup** - Running in under 5 minutes
45
+
46
+ ## 🏗️ Architecture
47
+
48
+ ```
49
+ ┌────────────────────────────────────────────┐
50
+ │      Web Dashboard (D3.js + Chart.js)      │
51
+ ├────────────────────────────────────────────┤
52
+ │      FastAPI Backend (REST + GraphQL)      │
53
+ ├──────┬──────┬──────┬──────┬────────────────┤
54
+ │Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
55
+ │Graph │Client│ API  │  QC  │ Calling        │
56
+ └──────┴──────┴──────┴──────┴────────────────┘
57
+ ```
58
+
59
+ ## 📦 Installation
60
+
61
+ ### Prerequisites
62
+ - Python 3.8+
63
+ - Docker Desktop
64
+ - 8GB RAM (16GB recommended)
65
+
66
+ ### Quick Start
67
+
68
+ **Windows:**
69
+ ```powershell
70
+ git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
71
+ cd CancerAtHomeV2
72
+ .\setup.ps1
73
+ python run.py
74
+ ```
75
+
76
+ **Linux/Mac:**
77
+ ```bash
78
+ git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
79
+ cd CancerAtHomeV2
80
+ chmod +x setup.sh
81
+ ./setup.sh
82
+ python run.py
83
+ ```
84
+
85
+ Then open: **http://localhost:5000**
86
+
87
+ ## 🚀 Usage
88
+
89
+ ### Web Dashboard
90
+ Access the interactive dashboard at http://localhost:5000 with:
91
+ - **Dashboard Tab**: Overview statistics and mutation charts
92
+ - **Neo4j Visualization**: Interactive graph of cancer relationships
93
+ - **BOINC Tasks**: Submit and monitor distributed computing tasks
94
+ - **GDC Data**: Browse and download cancer datasets
95
+ - **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling
96
+
97
+ ### GraphQL API
98
+
99
+ Query cancer data at http://localhost:5000/graphql
100
+
101
+ **Example: Get mutations in TP53 gene**
102
+ ```graphql
103
+ query {
104
+ mutations(gene: "TP53") {
105
+ mutation_id
106
+ chromosome
107
+ position
108
+ consequence
109
+ }
110
+ }
111
+ ```
112
+
113
+ **Example: Get patient statistics**
114
+ ```graphql
115
+ query {
116
+ cancerStatistics(cancer_type_id: "BRCA") {
117
+ total_patients
118
+ total_mutations
119
+ avg_mutations_per_patient
120
+ }
121
+ }
122
+ ```
123
+
124
+ ### REST API
125
+
126
+ **Database Summary:**
127
+ ```bash
128
+ curl http://localhost:5000/api/neo4j/summary
129
+ ```
130
+
131
+ **Submit BOINC Task:**
132
+ ```bash
133
+ curl -X POST http://localhost:5000/api/boinc/submit \
134
+ -H "Content-Type: application/json" \
135
+ -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
136
+ ```
137
+
138
+ ### Python API
139
+
140
+ **FASTQ Processing:**
141
+ ```python
142
+ from backend.pipeline import FASTQProcessor
143
+
144
+ processor = FASTQProcessor()
145
+ stats = processor.calculate_statistics("input.fastq")
146
+ filtered = processor.quality_filter("input.fastq")
147
+ ```
148
+
149
+ **Variant Calling:**
150
+ ```python
151
+ from backend.pipeline import VariantCaller, VariantAnalyzer
152
+
153
+ caller = VariantCaller()
154
+ vcf_file = caller.call_variants("alignment.bam", "reference.fa")
155
+ variants = caller.filter_variants(vcf_file)
156
+
157
+ analyzer = VariantAnalyzer()
158
+ cancer_variants = analyzer.identify_cancer_variants(variants)
159
+ tmb = analyzer.calculate_mutation_burden(variants)
160
+ ```
161
+
162
+ **Neo4j Queries:**
163
+ ```python
164
+ from backend.neo4j import DatabaseManager
165
+
166
+ db = DatabaseManager()
167
+ query = """
168
+ MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
169
+ RETURN m.position, m.consequence
170
+ """
171
+ results = db.execute_query(query)
172
+ db.close()
173
+ ```
174
+
175
+ ## 📊 Data Model
176
+
177
+ ### Neo4j Graph Schema
178
+
179
+ **Nodes:**
180
+ - **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
181
+ - **Mutation**: Genetic variants with position and consequence
182
+ - **Patient**: Individual cases with demographics
183
+ - **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)
184
+
185
+ **Relationships:**
186
+ - `Gene ← AFFECTS ← Mutation`
187
+ - `Patient → HAS_MUTATION → Mutation`
188
+ - `Patient → DIAGNOSED_WITH → CancerType`
189
+
190
+ ### Sample Data Included
191
+
192
+ - **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
193
+ - **5 Mutations**: Cancer-associated variants
194
+ - **5 Patients**: Representative TCGA cases
195
+ - **4 Cancer Types**: BRCA, LUAD, COAD, GBM
196
+
197
+ ## 🔧 Technology Stack
198
+
199
+ - **Backend**: FastAPI, Python 3.8+
200
+ - **Database**: Neo4j 5.13 (Graph Database)
201
+ - **API**: GraphQL (Strawberry), REST
202
+ - **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
203
+ - **Bioinformatics**: Biopython, BLAST+
204
+ - **Data Source**: GDC Portal API (TCGA/TARGET)
205
+ - **Infrastructure**: Docker, Docker Compose
206
+ - **Distributed Computing**: BOINC Framework
207
+
208
+ ## 📚 Documentation
209
+
210
+ - [README.md](README.md) - Complete project overview
211
+ - [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
212
+ - [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
213
+ - [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
214
+ - [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
215
+ - [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview
216
+
217
+ ## 🎓 Use Cases
218
+
219
+ 1. **Cancer Research**: Analyze genomics data with distributed computing
220
+ 2. **Education**: Learn cancer genetics and bioinformatics
221
+ 3. **Data Visualization**: Explore gene-mutation-patient relationships
222
+ 4. **Pipeline Development**: Test bioinformatics workflows
223
+ 5. **Graph Analytics**: Query complex biological networks
224
+
225
+ ## 🔬 Supported Cancer Projects
226
+
227
+ - **TCGA-BRCA**: Breast Cancer (1,098 cases)
228
+ - **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
229
+ - **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
230
+ - **TCGA-GBM**: Glioblastoma (617 cases)
231
+ - **TARGET-AML**: Acute Myeloid Leukemia (238 cases)
232
+
233
+ ## 📈 Bioinformatics Pipeline
234
+
235
+ ### FASTQ Processing
236
+ - Quality control and filtering
237
+ - Adapter trimming
238
+ - Statistics calculation
239
+ - QC report generation
240
+
241
+ ### BLAST Alignment
242
+ - BLASTN for nucleotide sequences
243
+ - BLASTP for protein sequences
244
+ - Hit filtering by identity/e-value
245
+ - Homology detection
246
+
247
+ ### Variant Calling
248
+ - VCF generation from alignments
249
+ - Quality filtering
250
+ - Cancer variant identification
251
+ - Tumor mutation burden (TMB) calculation
252
+
253
+ ## 🌐 Access Points
254
+
255
+ - **Application**: http://localhost:5000
256
+ - **API Docs**: http://localhost:5000/docs (Swagger UI)
257
+ - **GraphQL**: http://localhost:5000/graphql
258
+ - **Neo4j Browser**: http://localhost:7474 (neo4j/cancer123)
259
+
260
+ ## 🛠️ Configuration
261
+
262
+ Edit `config.yml` to customize:
263
+
264
+ ```yaml
265
+ neo4j:
266
+   uri: "bolt://localhost:7687"
267
+   password: "cancer123"
268
+
269
+ gdc:
270
+   download_dir: "./data/gdc"
271
+   projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]
272
+
273
+ pipeline:
274
+   fastq:
275
+     quality_threshold: 20
276
+     min_length: 50
277
+   blast:
278
+     evalue: 0.001
279
+     num_threads: 4
280
+ ```
281
+
282
+ ## 🤝 Contributing
283
+
284
+ Contributions are welcome! This project is open source under the MIT License.
285
+
286
+ ### Development Setup
287
+ ```bash
288
+ python -m venv venv
289
+ source venv/bin/activate # or venv\Scripts\activate on Windows
290
+ pip install -r requirements.txt
291
+ pytest test_cancer_at_home.py
292
+ ```
293
+
294
+ ## 📄 License
295
+
296
+ MIT License - See [LICENSE](LICENSE) file
297
+
298
+ Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal
299
+
300
+ ## 🙏 Acknowledgments
301
+
302
+ ### Inspiration
303
+ - [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
304
+ - [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)
305
+
306
+ ### Data Sources
307
+ - [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
308
+ - The Cancer Genome Atlas (TCGA) Program
309
+ - Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
310
+
311
+ ### Technologies
312
+ - Neo4j Graph Database
313
+ - BOINC Distributed Computing Project
314
+ - Biopython Community
315
+ - FastAPI Framework
316
+
317
+ ## 👥 Authors
318
+
319
+ - **OpenPeer AI** - Core development and architecture
320
+ - **Riemann Computing Inc.** - Distributed computing integration
321
+ - **Bleunomics** - Bioinformatics pipeline and genomics expertise
322
+ - **Andrew Magdy Kamal** - Graph database design and visualization
323
+
324
+ ## 📞 Support
325
+
326
+ - **Documentation**: See project documentation files
327
+ - **Issues**: Check logs in `logs/cancer_at_home.log`
328
+ - **Configuration**: Review `config.yml`
329
+ - **Health Check**: http://localhost:5000/api/health
330
+
331
+ ## 🔮 Roadmap
332
+
333
+ ### Planned Features
334
+ - Machine learning for mutation prediction
335
+ - Multi-omics data integration (RNA-seq, proteomics)
336
+ - Survival analysis and clinical outcomes
337
+ - Advanced graph algorithms (PageRank, community detection)
338
+ - Cloud deployment support (AWS, Azure, GCP)
339
+ - Mobile-responsive design
340
+ - User authentication and authorization
341
+
342
+ ## 📊 Statistics
343
+
344
+ - **Lines of Code**: ~5,000+
345
+ - **Modules**: 9 Python modules
346
+ - **API Endpoints**: 15+ REST + GraphQL
347
+ - **Documentation**: 2,500+ lines
348
+ - **Setup Time**: < 5 minutes
349
+ - **Sample Data**: 7 genes, 5 mutations, 5 patients
350
+
351
+ ## 🎯 Citation
352
+
353
+ If you use Cancer@Home v2 in your research, please cite:
354
+
355
+ ```bibtex
356
+ @software{cancer_at_home_v2,
357
+ title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
358
+ author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
359
+ year = {2025},
360
+ url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
361
+ license = {MIT}
362
+ }
363
+ ```
364
+
365
+ ## 🏷️ Tags
366
+
367
+ `cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`
368
+
369
+ ---
370
+
371
+ **Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**
372
+
373
+ **For cancer research, by researchers, accessible to all.**
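
Reviewer note: of the 373 lines this diff rewrites, the only visible content change is line 17, where `library_name: fastapi` becomes `library_name: cancer-at-home-v2`. Every other removed line reappears unchanged on the added side. That pattern often comes from a line-ending or trailing-whitespace rewrite, but that cause is an assumption here; the commit does not state it. Below is a minimal Python sketch for checking this locally, assuming both revisions of `MODEL_CARD.md` have been read into strings. The function name `substantive_changes` and the sample text are hypothetical and not part of the repository.

```python
import difflib
from typing import List, Tuple


def substantive_changes(old_text: str, new_text: str) -> List[Tuple[int, List[str], List[str]]]:
    """Return (old_start_line, old_lines, new_lines) for every differing chunk.

    Line endings and trailing whitespace are normalized first, so a CRLF -> LF
    rewrite on its own produces no reported changes.
    """
    # splitlines() drops \n, \r\n and \r; rstrip() drops trailing blanks.
    old_lines = [line.rstrip() for line in old_text.splitlines()]
    new_lines = [line.rstrip() for line in new_text.splitlines()]

    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # i1 is 0-based; report a 1-based line number like the diff view.
            changes.append((i1 + 1, old_lines[i1:i2], new_lines[j1:j2]))
    return changes


# Hypothetical sample: a CRLF front matter rewritten with LF endings,
# plus the one metadata change visible in this commit.
old_card = "---\r\nlicense: mit\r\nlibrary_name: fastapi\r\npipeline_tag: other\r\n---\r\n"
new_card = "---\nlicense: mit\nlibrary_name: cancer-at-home-v2\npipeline_tag: other\n---\n"

changes = substantive_changes(old_card, new_card)
print(changes)
# [(3, ['library_name: fastapi'], ['library_name: cancer-at-home-v2'])]
```

If the function reports only the `library_name` chunk for the real files, the rest of the +373/-373 count is formatting noise and can be skipped during review.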