Yang Chung
committed on
Commit
·
5fa85d7
1
Parent(s):
97b35c5
Update with illicit numbers
- README.md +9 -11
- index.html +13 -15
README.md
CHANGED
|
@@ -28,36 +28,36 @@ Comprehensive evaluation datasets for testing AI model safety mechanisms
|
|
| 28 |
|
| 29 |
| Metric | Value |
|
| 30 |
|--------|-------|
|
| 31 |
-
| **Total Conversations** |
|
| 32 |
-
| **Total Turns** |
|
| 33 |
| **Dataset Types** | 3 complementary methodologies |
|
| 34 |
-
| **Sample Data Available** |
|
| 35 |
|
| 36 |
## Full Dataset Statistics
|
| 37 |
|
| 38 |
| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|
| 39 |
|---------|--------------|-------|----------------|--------|
|
| 40 |
-
| **Psychology multi-turn** |
|
| 41 |
-
| **Illicit (bioweapon) multi-turn** |
|
| 42 |
-
| **Illicit (chemical, general) multi-turn** |
|
| 43 |
|
| 44 |
## Access Datasets on Hugging Face
|
| 45 |
|
| 46 |
### Psychology Multi-turn Conversations
|
| 47 |
Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.
|
| 48 |
-
**Sample:**
|
| 49 |
|
| 50 |
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations)**
|
| 51 |
|
| 52 |
### Illicit (bioweapon) Multi-turn Conversations
|
| 53 |
Bio-safety harmfulness such as bioweapons, pathogens, etc.
|
| 54 |
-
**Sample:**
|
| 55 |
|
| 56 |
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1)**
|
| 57 |
|
| 58 |
### Illicit (chemical, general) Multi-turn Conversations
|
| 59 |
Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.
|
| 60 |
-
**Sample:**
|
| 61 |
|
| 62 |
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2)**
|
| 63 |
|
|
@@ -83,13 +83,11 @@ All datasets are derived from high-quality, validated conversations with strict
|
|
| 83 |
|
| 84 |
### Base Criteria
|
| 85 |
- Text-based objectives (no code execution templates)
|
| 86 |
-
- NeurIPS evaluation metadata present
|
| 87 |
- Verdict: `success` (harmful requests successfully fulfilled)
|
| 88 |
- Multi-turn conversations with prompt-response pairs
|
| 89 |
|
| 90 |
### Psychology-Specific Criteria
|
| 91 |
- Organic conversations (`organicity = true`)
|
| 92 |
-
- No disclaimer in responses
|
| 93 |
- Successfully elicited harmful psychology-related content
|
| 94 |
|
| 95 |
### Illicit-Specific Criteria
|
|
|
|
| 28 |
|
| 29 |
| Metric | Value |
|
| 30 |
|--------|-------|
|
| 31 |
+
| **Total Conversations** | 849+ |
|
| 32 |
+
| **Total Turns** | 6,694+ |
|
| 33 |
| **Dataset Types** | 3 complementary methodologies |
|
| 34 |
+
| **Sample Data Available** | 15 conversations |
|
| 35 |
|
| 36 |
## Full Dataset Statistics
|
| 37 |
|
| 38 |
| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|
| 39 |
|---------|--------------|-------|----------------|--------|
|
| 40 |
+
| **Psychology multi-turn** | 184+ | 1,964+ | 10.3 | Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc. |
|
| 41 |
+
| **Illicit (bioweapon) multi-turn** | 84+ | 822+ | 9.8 | Bio-safety harmfulness such as bioweapons, pathogens, etc. |
|
| 42 |
+
| **Illicit (chemical, general) multi-turn** | 581+ | 3,908+ | 6.7 | Non-bio safety harmfulness such as chemical weapons, cyber threats, etc. |
|
| 43 |
|
| 44 |
## Access Datasets on Hugging Face
|
| 45 |
|
| 46 |
### Psychology Multi-turn Conversations
|
| 47 |
Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.
|
| 48 |
+
**Sample:** 5 conversations
|
| 49 |
|
| 50 |
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations)**
|
| 51 |
|
| 52 |
### Illicit (bioweapon) Multi-turn Conversations
|
| 53 |
Bio-safety harmfulness such as bioweapons, pathogens, etc.
|
| 54 |
+
**Sample:** 5 conversations
|
| 55 |
|
| 56 |
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1)**
|
| 57 |
|
| 58 |
### Illicit (chemical, general) Multi-turn Conversations
|
| 59 |
Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.
|
| 60 |
+
**Sample:** 5 conversations
|
| 61 |
|
| 62 |
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2)**
|
| 63 |
|
|
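The headline figures in the updated README.md can be cross-checked against the per-dataset rows. The sketch below is only an illustration: the dictionary layout and the function names `totals` and `avg_turns` are assumptions, and the numbers are the lower bounds (the values shown with a trailing `+`) copied from the table.

```python
# Per-dataset figures copied from the updated README table.
# The "+" suffix in the source marks these as lower bounds, so the
# results below are floors rather than exact totals.
DATASETS = {
    "Psychology multi-turn": {"conversations": 184, "turns": 1964},
    "Illicit (bioweapon) multi-turn": {"conversations": 84, "turns": 822},
    "Illicit (chemical, general) multi-turn": {"conversations": 581, "turns": 3908},
}


def totals(datasets):
    """Sum conversations and turns across all datasets."""
    conversations = sum(d["conversations"] for d in datasets.values())
    turns = sum(d["turns"] for d in datasets.values())
    return conversations, turns


def avg_turns(entry):
    """Average turns per conversation, rounded to one decimal place."""
    return round(entry["turns"] / entry["conversations"], 1)


total_conversations, total_turns = totals(DATASETS)
print(total_conversations, total_turns)
for name, entry in DATASETS.items():
    print(name, avg_turns(entry))
```

The sums match the 849+ conversations and 6,694+ turns in the summary table, and the two illicit rows reproduce the listed averages of 9.8 and 6.7. The psychology row's lower-bound counts give 10.7 rather than the listed 10.3, which may simply reflect that the `+` figures are not exact.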
|
|
| 83 |
|
| 84 |
### Base Criteria
|
| 85 |
- Text-based objectives (no code execution templates)
|
|
|
|
| 86 |
- Verdict: `success` (harmful requests successfully fulfilled)
|
| 87 |
- Multi-turn conversations with prompt-response pairs
|
| 88 |
|
| 89 |
### Psychology-Specific Criteria
|
| 90 |
- Organic conversations (`organicity = true`)
|
|
|
|
| 91 |
- Successfully elicited harmful psychology-related content
|
| 92 |
|
| 93 |
### Illicit-Specific Criteria
|
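The base and psychology-specific criteria that remain in README.md after this commit can be read as a record filter. The following is a minimal sketch under stated assumptions: the field names `objective_type`, `verdict`, `turns`, `prompt`, `response`, and `organicity` are hypothetical stand-ins, since the diff does not show the actual dataset schema.

```python
from typing import Any

Record = dict[str, Any]


def meets_base_criteria(record: Record) -> bool:
    """Base criteria: text objective, `success` verdict, multi-turn pairs.

    All field names here are hypothetical; the real schema is not shown.
    """
    turns = record.get("turns", [])
    return (
        record.get("objective_type") == "text"   # no code execution templates
        and record.get("verdict") == "success"   # harmful request was fulfilled
        and len(turns) > 1                       # multi-turn
        and all("prompt" in t and "response" in t for t in turns)
    )


def meets_psychology_criteria(record: Record) -> bool:
    """Psychology subset: base criteria plus an organic conversation."""
    return meets_base_criteria(record) and record.get("organicity") is True


pair = {"prompt": "...", "response": "..."}
sample = [
    {"id": "a", "objective_type": "text", "verdict": "success",
     "organicity": True, "turns": [pair, pair]},
    {"id": "b", "objective_type": "text", "verdict": "failure",
     "organicity": True, "turns": [pair, pair]},
    {"id": "c", "objective_type": "text", "verdict": "success",
     "organicity": False, "turns": [pair, pair]},
    {"id": "d", "objective_type": "text", "verdict": "success",
     "organicity": True, "turns": [pair]},
]

kept = [r["id"] for r in sample if meets_psychology_criteria(r)]
print(kept)
```

Only record `a` passes: `b` fails the verdict check, `c` is not organic, and `d` has a single turn. The sketch omits the two criteria this commit removes (NeurIPS evaluation metadata and the no-disclaimer rule).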
index.html
CHANGED
|
@@ -243,12 +243,12 @@
|
|
| 243 |
<div class="stats-grid">
|
| 244 |
<div class="stat-card">
|
| 245 |
<h4>Total Conversations</h4>
|
| 246 |
-
<div class="number">
|
| 247 |
<div class="label">Across all datasets</div>
|
| 248 |
</div>
|
| 249 |
<div class="stat-card">
|
| 250 |
<h4>Total Turns</h4>
|
| 251 |
-
<div class="number">
|
| 252 |
<div class="label">Multi-turn interactions</div>
|
| 253 |
</div>
|
| 254 |
<div class="stat-card">
|
|
@@ -280,23 +280,23 @@
|
|
| 280 |
<tbody>
|
| 281 |
<tr>
|
| 282 |
<td><strong>Psychology multi-turn</strong></td>
|
| 283 |
-
<td>
|
| 284 |
-
<td>
|
| 285 |
<td>10.3</td>
|
| 286 |
<td>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.</td>
|
| 287 |
</tr>
|
| 288 |
<tr>
|
| 289 |
<td><strong>Illicit (bioweapon) multi-turn</strong></td>
|
| 290 |
-
<td>
|
| 291 |
-
<td>
|
| 292 |
-
<td>
|
| 293 |
<td>Bio-safety harmfulness such as bioweapons, pathogens, etc.</td>
|
| 294 |
</tr>
|
| 295 |
<tr>
|
| 296 |
<td><strong>Illicit (chemical, general) multi-turn</strong></td>
|
| 297 |
-
<td>
|
| 298 |
-
<td>
|
| 299 |
-
<td>6.
|
| 300 |
<td>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.</td>
|
| 301 |
</tr>
|
| 302 |
</tbody>
|
|
@@ -310,19 +310,19 @@
|
|
| 310 |
<div class="dataset-card">
|
| 311 |
<h4>Psychology Multi-turn Conversations</h4>
|
| 312 |
<p>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.<br>
|
| 313 |
-
<strong>Sample:</strong>
|
| 314 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations" class="btn" target="_blank">View Dataset →</a>
|
| 315 |
</div>
|
| 316 |
<div class="dataset-card">
|
| 317 |
<h4>Illicit (bioweapon) Multi-turn Conversations</h4>
|
| 318 |
<p>Bio-safety harmfulness such as bioweapons, pathogens, etc.<br>
|
| 319 |
-
<strong>Sample:</strong>
|
| 320 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank">View Dataset →</a>
|
| 321 |
</div>
|
| 322 |
<div class="dataset-card">
|
| 323 |
<h4>Illicit (chemical, general) Multi-turn Conversations</h4>
|
| 324 |
<p>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.<br>
|
| 325 |
-
<strong>Sample:</strong>
|
| 326 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank">View Dataset →</a>
|
| 327 |
</div>
|
| 328 |
</div>
|
|
@@ -363,7 +363,6 @@
|
|
| 363 |
<h3>Base Criteria</h3>
|
| 364 |
<ul>
|
| 365 |
<li>Text-based objectives (no code execution templates)</li>
|
| 366 |
-
<li>NeurIPS evaluation metadata present</li>
|
| 367 |
<li>Verdict: <code>success</code> (harmful requests successfully fulfilled)</li>
|
| 368 |
<li>Multi-turn conversations with prompt-response pairs</li>
|
| 369 |
</ul>
|
|
@@ -371,7 +370,6 @@
|
|
| 371 |
<h3>Psychology-Specific Criteria</h3>
|
| 372 |
<ul>
|
| 373 |
<li>Organic conversations (<code>organicity = true</code>)</li>
|
| 374 |
-
<li>No disclaimer in responses</li>
|
| 375 |
<li>Successfully elicited harmful psychology-related content</li>
|
| 376 |
</ul>
|
| 377 |
|
|
|
|
| 243 |
<div class="stats-grid">
|
| 244 |
<div class="stat-card">
|
| 245 |
<h4>Total Conversations</h4>
|
| 246 |
+
<div class="number">849+</div>
|
| 247 |
<div class="label">Across all datasets</div>
|
| 248 |
</div>
|
| 249 |
<div class="stat-card">
|
| 250 |
<h4>Total Turns</h4>
|
| 251 |
+
<div class="number">6694+</div>
|
| 252 |
<div class="label">Multi-turn interactions</div>
|
| 253 |
</div>
|
| 254 |
<div class="stat-card">
|
|
|
|
| 280 |
<tbody>
|
| 281 |
<tr>
|
| 282 |
<td><strong>Psychology multi-turn</strong></td>
|
| 283 |
+
<td>184+</td>
|
| 284 |
+
<td>1964+</td>
|
| 285 |
<td>10.3</td>
|
| 286 |
<td>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.</td>
|
| 287 |
</tr>
|
| 288 |
<tr>
|
| 289 |
<td><strong>Illicit (bioweapon) multi-turn</strong></td>
|
| 290 |
+
<td>84+</td>
|
| 291 |
+
<td>822+</td>
|
| 292 |
+
<td>9.8</td>
|
| 293 |
<td>Bio-safety harmfulness such as bioweapons, pathogens, etc.</td>
|
| 294 |
</tr>
|
| 295 |
<tr>
|
| 296 |
<td><strong>Illicit (chemical, general) multi-turn</strong></td>
|
| 297 |
+
<td>581+</td>
|
| 298 |
+
<td>3908+</td>
|
| 299 |
+
<td>6.7</td>
|
| 300 |
<td>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.</td>
|
| 301 |
</tr>
|
| 302 |
</tbody>
|
|
|
|
| 310 |
<div class="dataset-card">
|
| 311 |
<h4>Psychology Multi-turn Conversations</h4>
|
| 312 |
<p>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.<br>
|
| 313 |
+
<strong>Sample:</strong> 5 conversations</p>
|
| 314 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations" class="btn" target="_blank">View Dataset →</a>
|
| 315 |
</div>
|
| 316 |
<div class="dataset-card">
|
| 317 |
<h4>Illicit (bioweapon) Multi-turn Conversations</h4>
|
| 318 |
<p>Bio-safety harmfulness such as bioweapons, pathogens, etc.<br>
|
| 319 |
+
<strong>Sample:</strong> 5 conversations</p>
|
| 320 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank">View Dataset →</a>
|
| 321 |
</div>
|
| 322 |
<div class="dataset-card">
|
| 323 |
<h4>Illicit (chemical, general) Multi-turn Conversations</h4>
|
| 324 |
<p>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.<br>
|
| 325 |
+
<strong>Sample:</strong> 5 conversations</p>
|
| 326 |
<a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank">View Dataset →</a>
|
| 327 |
</div>
|
| 328 |
</div>
|
|
|
|
| 363 |
<h3>Base Criteria</h3>
|
| 364 |
<ul>
|
| 365 |
<li>Text-based objectives (no code execution templates)</li>
|
|
|
|
| 366 |
<li>Verdict: <code>success</code> (harmful requests successfully fulfilled)</li>
|
| 367 |
<li>Multi-turn conversations with prompt-response pairs</li>
|
| 368 |
</ul>
|
|
|
|
| 370 |
<h3>Psychology-Specific Criteria</h3>
|
| 371 |
<ul>
|
| 372 |
<li>Organic conversations (<code>organicity = true</code>)</li>
|
|
|
|
| 373 |
<li>Successfully elicited harmful psychology-related content</li>
|
| 374 |
</ul>
|
| 375 |
|