Mentors4EDU committed on
Commit 949080e · verified · 1 Parent(s): 7a92197

Update README.md

Files changed (1)
  1. README.md +181 -172
README.md CHANGED
@@ -1,172 +1,181 @@
- # Cancer@Home v2
-
- A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.
-
- ## 🚀 Quick Start (5 minutes)
-
- ### Prerequisites
- - Python 3.8+
- - Docker Desktop
- - 8GB RAM minimum
-
- ### Installation
-
- 1. **Clone and setup**
- ```bash
- cd CancerAtHome2
- python -m venv venv
- venv\Scripts\activate # Windows
- pip install -r requirements.txt
- ```
-
- 2. **Start Neo4j Database**
- ```bash
- docker-compose up -d
- ```
-
- 3. **Run the application**
- ```bash
- python run.py
- ```
-
- 4. **Open your browser**
- - Application: http://localhost:5000
- - Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
-
- ## 🎯 Features
-
- ### 1. **Distributed Computing (BOINC Integration)**
- - Submit cancer research computational tasks
- - Monitor distributed workload processing
- - Real-time task status tracking
-
- ### 2. **GDC Data Integration**
- - Download cancer genomics data from GDC Portal
- - Support for various cancer types (TCGA, TARGET projects)
- - Automatic data parsing and normalization
-
- ### 3. **Sequence Analysis Pipeline**
- - FASTQ file processing
- - BLAST sequence alignment
- - Variant calling and annotation
-
- ### 4. **Neo4j Graph Database**
- - Graph-based cancer data modeling
- - Relationships: Gene → Mutation → Patient → Cancer Type
- - Interactive graph visualization
-
- ### 5. **GraphQL API**
- - Query cancer data flexibly
- - Filter by gene, mutation, patient cohort
- - Aggregate statistics
-
- ### 6. **Interactive Dashboard**
- - Real-time data visualization
- - Network graphs for gene interactions
- - Mutation frequency charts
- - Patient cohort analysis
-
- ## 📊 Architecture
-
- ```
- Cancer@Home v2
- │
- ├── Frontend (React + D3.js)
- │   ├── Dashboard
- │   ├── Neo4j Visualization
- │   └── Task Monitor
- │
- ├── Backend (FastAPI)
- │   ├── REST API
- │   ├── GraphQL Endpoint
- │   └── WebSocket (real-time updates)
- │
- ├── Data Layer
- │   ├── Neo4j (Graph Database)
- │   ├── BOINC Client
- │   └── GDC API Client
- │
- └── Analysis Pipeline
-     ├── FASTQ Parser
-     ├── BLAST Wrapper
-     └── Variant Annotator
- ```
-
- ## 🗂️ Project Structure
-
- ```
- CancerAtHome2/
- ├── backend/
- │   ├── api/              # FastAPI routes
- │   ├── boinc/            # BOINC integration
- │   ├── gdc/              # GDC data fetcher
- │   ├── neo4j/            # Neo4j database layer
- │   ├── pipeline/         # Bioinformatics pipeline
- │   └── graphql/          # GraphQL schema
- ├── frontend/
- │   ├── public/
- │   └── src/
- │       ├── components/   # React components
- │       ├── views/        # Page views
- │       └── api/          # API client
- ├── data/                 # Downloaded datasets
- ├── docker-compose.yml    # Neo4j container
- ├── requirements.txt      # Python dependencies
- └── run.py                # Main entry point
- ```
-
- ## 🧬 Data Flow
-
- 1. **Data Ingestion**: Download cancer genomics data from GDC Portal
- 2. **Processing**: Run FASTQ/BLAST analysis on distributed BOINC network
- 3. **Storage**: Store results in Neo4j graph database
- 4. **Visualization**: Query and visualize via web dashboard
-
- ## 🔧 Configuration
-
- Edit `config.yml` to customize:
- - Neo4j connection settings
- - GDC API parameters
- - BOINC project URL
- - Analysis pipeline options
-
- ## 📖 Usage Examples
-
- ### Query Mutations by Gene
- ```graphql
- query {
-   mutations(gene: "TP53") {
-     id
-     position
-     consequence
-     patients {
-       cancerType
-       stage
-     }
-   }
- }
- ```
-
- ### Submit Analysis Task
- ```python
- from backend.boinc import BOINCClient
-
- client = BOINCClient()
- task_id = client.submit_task(
-     workunit_type="variant_calling",
-     input_file="sample.fastq"
- )
- ```
-
- ## 🤝 Inspired By
-
- - [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- - [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling
-
- ## 📄 License
-
- MIT License
-
- ## 🛟 Support
-
- For issues or questions, please open a GitHub issue.
+ ---
+ license: mit
+ language:
+ - en
+ metrics:
+ - accuracy
+ - bleu
+ - bleurt
+ ---
+ # Cancer@Home v2
+
+ A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.
+
+ ## 🚀 Quick Start (5 minutes)
+
+ ### Prerequisites
+ - Python 3.8+
+ - Docker Desktop
+ - 8GB RAM minimum
+
+ ### Installation
+
+ 1. **Clone and setup**
+ ```bash
+ cd CancerAtHome2
+ python -m venv venv
+ venv\Scripts\activate # Windows
+ pip install -r requirements.txt
+ ```
+
+ 2. **Start Neo4j Database**
+ ```bash
+ docker-compose up -d
+ ```
+
+ 3. **Run the application**
+ ```bash
+ python run.py
+ ```
+
+ 4. **Open your browser**
+ - Application: http://localhost:5000
+ - Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
+
+ ## 🎯 Features
+
+ ### 1. **Distributed Computing (BOINC Integration)**
+ - Submit cancer research computational tasks
+ - Monitor distributed workload processing
+ - Real-time task status tracking
+
+ ### 2. **GDC Data Integration**
+ - Download cancer genomics data from GDC Portal
+ - Support for various cancer types (TCGA, TARGET projects)
+ - Automatic data parsing and normalization
+
+ ### 3. **Sequence Analysis Pipeline**
+ - FASTQ file processing
+ - BLAST sequence alignment
+ - Variant calling and annotation
+
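Review note: as a rough illustration of the "FASTQ file processing" item listed above, a minimal reader for four-line FASTQ records might look like the sketch below. It is not taken from this commit or repository; `FastqRecord` and `parse_fastq` are hypothetical names, and the sketch assumes unwrapped records with Phred+33 quality encoding.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FastqRecord:
    """One read: identifier, bases, and per-base Phred quality scores."""
    read_id: str
    sequence: str
    qualities: List[int]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (hypothetical helper).

    Assumes unwrapped records and Phred+33 quality encoding.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it, None)
        separator = next(it, None)
        quality = next(it, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(
            read_id=header[1:].split(" ", 1)[0],  # drop the optional description
            sequence=sequence,
            qualities=[ord(ch) - 33 for ch in quality],  # Phred+33 offset
        )
```

Because `parse_fastq` accepts any iterable of lines, it can be fed an open file handle directly, for example `with open("sample.fastq") as fh: reads = list(parse_fastq(fh))`.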
+ ### 4. **Neo4j Graph Database**
+ - Graph-based cancer data modeling
+ - Relationships: Gene → Mutation → Patient → Cancer Type
+ - Interactive graph visualization
+
+ ### 5. **GraphQL API**
+ - Query cancer data flexibly
+ - Filter by gene, mutation, patient cohort
+ - Aggregate statistics
+
+ ### 6. **Interactive Dashboard**
+ - Real-time data visualization
+ - Network graphs for gene interactions
+ - Mutation frequency charts
+ - Patient cohort analysis
+
+ ## 📊 Architecture
+
+ ```
+ Cancer@Home v2
+ │
+ ├── Frontend (React + D3.js)
+ │   ├── Dashboard
+ │   ├── Neo4j Visualization
+ │   └── Task Monitor
+ │
+ ├── Backend (FastAPI)
+ │   ├── REST API
+ │   ├── GraphQL Endpoint
+ │   └── WebSocket (real-time updates)
+ │
+ ├── Data Layer
+ │   ├── Neo4j (Graph Database)
+ │   ├── BOINC Client
+ │   └── GDC API Client
+ │
+ └── Analysis Pipeline
+     ├── FASTQ Parser
+     ├── BLAST Wrapper
+     └── Variant Annotator
+ ```
+
+ ## 🗂️ Project Structure
+
+ ```
+ CancerAtHome2/
+ ├── backend/
+ │   ├── api/              # FastAPI routes
+ │   ├── boinc/            # BOINC integration
+ │   ├── gdc/              # GDC data fetcher
+ │   ├── neo4j/            # Neo4j database layer
+ │   ├── pipeline/         # Bioinformatics pipeline
+ │   └── graphql/          # GraphQL schema
+ ├── frontend/
+ │   ├── public/
+ │   └── src/
+ │       ├── components/   # React components
+ │       ├── views/        # Page views
+ │       └── api/          # API client
+ ├── data/                 # Downloaded datasets
+ ├── docker-compose.yml    # Neo4j container
+ ├── requirements.txt      # Python dependencies
+ └── run.py                # Main entry point
+ ```
+
+ ## 🧬 Data Flow
+
+ 1. **Data Ingestion**: Download cancer genomics data from GDC Portal
+ 2. **Processing**: Run FASTQ/BLAST analysis on distributed BOINC network
+ 3. **Storage**: Store results in Neo4j graph database
+ 4. **Visualization**: Query and visualize via web dashboard
+
+ ## 🔧 Configuration
+
+ Edit `config.yml` to customize:
+ - Neo4j connection settings
+ - GDC API parameters
+ - BOINC project URL
+ - Analysis pipeline options
+
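Review note: the README does not show the layout of `config.yml`, so the sketch below only illustrates one way the four groups of settings listed above could be modeled and overlaid on defaults once the file has been parsed into a mapping (for instance with PyYAML's `yaml.safe_load`). Every key and class name here (`Neo4jSettings`, `Settings`, `settings_from_dict`, `gdc_api_url`, `boinc_project_url`, `pipeline`) is an assumption, not the project's actual schema. The Neo4j credentials come from the Quick Start; the Bolt URI is Neo4j's default port and may differ in the project's `docker-compose.yml`.

```python
from dataclasses import dataclass, field


@dataclass
class Neo4jSettings:
    uri: str = "bolt://localhost:7687"  # assumption: Neo4j's default Bolt port
    user: str = "neo4j"                 # from the Quick Start
    password: str = "cancer123"         # from the Quick Start


@dataclass
class Settings:
    """Hypothetical shape for the settings the README says are configurable."""
    neo4j: Neo4jSettings = field(default_factory=Neo4jSettings)
    gdc_api_url: str = "https://api.gdc.cancer.gov"  # public GDC API base URL
    boinc_project_url: str = ""         # not stated in the README, left empty
    pipeline: dict = field(default_factory=dict)     # free-form pipeline options


def settings_from_dict(raw: dict) -> Settings:
    """Overlay a parsed config mapping on the defaults.

    Unknown keys raise TypeError, so typos in the file surface early.
    """
    raw = dict(raw)  # copy so the caller's mapping is left untouched
    neo4j = Neo4jSettings(**(raw.pop("neo4j", None) or {}))
    return Settings(neo4j=neo4j, **raw)
```

Keeping defaults in code and treating the file as an overlay means a missing or partial `config.yml` still yields a usable configuration.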
+ ## 📖 Usage Examples
+
+ ### Query Mutations by Gene
+ ```graphql
+ query {
+   mutations(gene: "TP53") {
+     id
+     position
+     consequence
+     patients {
+       cancerType
+       stage
+     }
+   }
+ }
+ ```
+
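Review note: the README shows the query itself but not how to send it. A minimal sketch using only the Python standard library follows. The `/graphql` path, the `String!` variable type, and the helper names `build_request` and `fetch_mutations` are assumptions; the host and port come from the Quick Start.

```python
import json
import urllib.request

# Assumption: the FastAPI backend serves GraphQL at /graphql on port 5000.
GRAPHQL_URL = "http://localhost:5000/graphql"

# Same selection as the README query, with the gene passed as a variable.
# The String! type is a guess at the schema.
MUTATIONS_QUERY = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_request(gene: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the query and its variables."""
    payload = json.dumps({"query": MUTATIONS_QUERY, "variables": {"gene": gene}})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_mutations(gene: str) -> list:
    """Send the request and return the `mutations` list from the response."""
    with urllib.request.urlopen(build_request(gene), timeout=10) as response:
        body = json.load(response)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]["mutations"]
```

With the application running, `fetch_mutations("TP53")` would return the list of mutation objects, provided the schema matches the assumptions above.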
+ ### Submit Analysis Task
+ ```python
+ from backend.boinc import BOINCClient
+
+ client = BOINCClient()
+ task_id = client.submit_task(
+     workunit_type="variant_calling",
+     input_file="sample.fastq"
+ )
+ ```
+
+ ## 🤝 Inspired By
+
+ - [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
+ - [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling
+
+ ## 📄 License
+
+ MIT License
+
+ ## 🛟 Support
+
+ For issues or questions, please open a Huggingface or GitHub issue.